Deploying Prometheus on KubeSphere - 白眉大叔

Reference: PrometheusAlert + Prometheus + Alertmanager for various alert channels (WeCom, Feishu, and DingTalk alerts)

https://blog.csdn.net/W1124824402/article/details/128846493

Prometheus is stateful, because it has to persist its time-series database.

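1- Image

bitnami/prometheus  # the data directory could not be mounted, so this image is skipped

prom/prometheus:v2.34.0

With this image, the data path can be mounted at /prometheus.

Do not attach the storage volume or the ConfigMap to the workload yet; complete steps 2 and 3 first.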

2- Create the storage volume

prometheus-db
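
For readers who prefer a manifest over the KubeSphere console, the volume from this step could be declared roughly as follows. This is only a sketch: the name prometheus-db comes from the article, while the namespace, the size, and the storage class are assumptions that need to be adapted to the cluster.

```yaml
# Sketch of a PersistentVolumeClaim for the Prometheus data directory.
# Only the name "prometheus-db" is taken from the article; everything else is assumed.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-db
  namespace: monitoring          # assumed namespace (KubeSphere project)
spec:
  accessModes:
    - ReadWriteOnce              # a single Prometheus replica writes to the volume
  resources:
    requests:
      storage: 20Gi              # assumed size, adjust to the retention you need
  # storageClassName: local      # assumption: set this if there is no default StorageClass
```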

3- Create the ConfigMap (configuration dictionary)

prometheus-yml

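Note: because of how the image is laid out, I also put the other alert rule files into this same ConfigMap. That keeps everything in one place and is convenient in practice.

Contents of prometheus.yml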

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
          - 10.0.0.201:31007

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/etc/prometheus/*_rules.yml"

# Scrape configurations: Prometheus itself, followed by the exporters.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "mysql-exporter"
    static_configs:
      - targets: ["10.0.0.201:31004"] 
  - job_name: "node-exporter"
    static_configs:
      - targets: ["10.0.0.201:31003"] 
  - job_name: "nginx-exporter"
    static_configs:
      - targets: ["10.0.0.201:31005"] 

  - job_name: "tomcat-exporter"
    static_configs:
      - targets: ["10.0.0.1:8080"]

  - job_name: "es-exporter"
    static_configs:
      - targets: ["10.0.0.201:31006"]

  - job_name: "baimei-node-exporter"
    static_configs:
      - targets:
          - "10.0.0.205:9100"
          - "10.0.0.207:9100"
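
As a rough illustration of step 3, the ConfigMap could also be written as a manifest with one key per file. The name prometheus-yml and the two keys follow the article; the namespace is an assumption, and both file bodies are abridged here, so the full contents shown in this article would go in their place.

```yaml
# Sketch of the "prometheus-yml" ConfigMap holding the main config and the rule file.
# The namespace is assumed; the file bodies below are shortened for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-yml
  namespace: monitoring          # assumed namespace (KubeSphere project)
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    rule_files:
      - "/etc/prometheus/*_rules.yml"
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
  mysql_rules.yml: |
    groups:
      - name: MySQLStatsAlert
        rules:
          - alert: MySQL is down
            expr: mysql_up == 0
            for: 1m
            labels:
              severity: critical
```

Because rule_files uses the glob /etc/prometheus/*_rules.yml, any further key whose name ends in _rules.yml is picked up once it is mounted into that directory.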

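4- Mount the storage volume and the ConfigMap

(1) Mount prometheus.yml

Mount path: /etc/prometheus/prometheus.yml
ConfigMap key: prometheus.yml

(2) Mount the alert rule file

Mount path: /etc/prometheus/mysql_rules.yml
ConfigMap key: mysql_rules.yml

(3) Storage volume mount path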

/prometheus
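
The three mounts above might translate into a workload like the sketch below. The image, the mount paths, the ConfigMap name, and the volume name come from the article; the workload kind (a Deployment), its name, the namespace, and the labels are assumptions, and a StatefulSet would be mounted the same way.

```yaml
# Sketch of a workload that mounts the two ConfigMap keys and the data volume.
# Workload kind, names, namespace, and labels are assumptions, not the author's exact setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      # securityContext:         # assumption: may be needed if /prometheus is not writable
      #   fsGroup: 65534         # the prom/prometheus image runs as user "nobody"
      containers:
        - name: prometheus
          image: prom/prometheus:v2.34.0
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus/prometheus.yml
              subPath: prometheus.yml      # mount a single key as a file
            - name: config
              mountPath: /etc/prometheus/mysql_rules.yml
              subPath: mysql_rules.yml
            - name: data
              mountPath: /prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-yml
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-db
```

One consequence of mounting by subPath: files mounted this way are not refreshed when the ConfigMap changes, so the pod has to be restarted after the configuration or the rules are edited.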

Verification: open the alerts page and check that the rules have been loaded

http://10.0.0.201:31010/alerts
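
The port 31010 in this URL lies in the default NodePort range, which suggests Prometheus is exposed through a NodePort service on node 10.0.0.201. A sketch of such a service is shown below; the service name, the namespace, and the app: prometheus selector are assumptions and must match the labels of the actual workload.

```yaml
# Sketch of a NodePort Service exposing the Prometheus web UI on port 31010.
# Name, namespace, and selector are assumed for illustration.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: prometheus              # must match the pod labels of the workload
  ports:
    - name: web
      port: 9090
      targetPort: 9090           # default Prometheus listen port
      nodePort: 31010            # port used in the verification URL
```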

Contents of mysql_rules.yml

groups:
- name: MySQLStatsAlert
  rules:
  - alert: MySQL is down
    expr: mysql_up == 0
    for: 1m
    labels:
        severity: critical
    annotations:
        summary: "Instance {{ $labels.instance }} MySQL is down"
        description: "MySQL database is down. This requires immediate action!"

  - alert: Mysql_High_QPS
    expr: rate(mysql_global_status_questions[5m]) > 500 
    for: 2m
    labels:
        severity: warning
    annotations:
        summary: "{{$labels.instance}}: Mysql_High_QPS detected"
        description: "{{$labels.instance}}: MySQL operations exceed 500 per second (current value is: {{ $value }})"
  - alert: Mysql_Too_Many_Connections
    # threads_connected is a gauge, so compare it directly instead of using rate()
    expr: mysql_global_status_threads_connected > 200
    for: 2m
    labels:
        severity: warning
    annotations:
        summary: "{{$labels.instance}}: Mysql Too Many Connections detected"
        description: "{{$labels.instance}}: MySQL has more than 200 open connections (current value is: {{ $value }})"

  - alert: Mysql_Too_Many_slow_queries
    expr: rate(mysql_global_status_slow_queries[5m]) > 3
    for: 2m
    labels:
        severity: warning
    annotations:
        summary: "{{$labels.instance}}: Mysql_Too_Many_slow_queries detected"
        description: "{{$labels.instance}}: Mysql slow_queries is more than 3 per second ,(current value is: {{ $value }})"  

  - alert: SQL thread stopped
    expr: mysql_slave_status_slave_sql_running != 1
    for: 1m
    labels:
        severity: critical
    annotations:
        summary: "Instance {{ $labels.instance }} SQL thread stopped"
        description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
  - alert: Slave lagging behind Master
    expr: mysql_slave_status_seconds_behind_master > 30
    for: 1m
    labels:
        severity: warning 
    annotations:
        summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
        description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"
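
Before loading the rules into the cluster, they can be exercised offline with promtool's rule unit tests. The sketch below checks only the "MySQL is down" alert; the test file name mysql_rules_test.yml and the sample series labels are assumptions chosen to match the mysql-exporter target in prometheus.yml.

```yaml
# mysql_rules_test.yml (hypothetical file name), placed next to mysql_rules.yml.
# Feeds mysql_up == 0 for several minutes and expects the alert to fire.
rule_files:
  - mysql_rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Assumed labels, mirroring the mysql-exporter scrape target.
      - series: 'mysql_up{instance="10.0.0.201:31004", job="mysql-exporter"}'
        values: '0 0 0 0 0'
    alert_rule_test:
      - eval_time: 3m            # well past the rule's "for: 1m"
        alertname: MySQL is down
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: "10.0.0.201:31004"
              job: mysql-exporter
            exp_annotations:
              summary: "Instance 10.0.0.201:31004 MySQL is down"
              description: "MySQL database is down. This requires immediate action!"
```

The test is run with promtool test rules mysql_rules_test.yml, and promtool check rules mysql_rules.yml validates the syntax of the rule file on its own.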

About the author: 白眉大叔 (Linux and cloud computing)