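For deploying Prometheus and configuring monitoring alerts, see the previous article: "Deploying a Prometheus monitoring platform on k8s with monitoring alerts"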

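1. Overview

This article uses helm to install exporters for databases and middleware, then configures alertmanager and alerting rules to monitor the state of each component and send email alerts. The helm repository and chart packages used are listed below:

  • helm repository:
prometheus-community: https://prometheus-community.github.io/helm-charts
  • chart packages:

If a download appears to hang, retry it a few times.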

prometheus-community/prometheus-mysql-exporter
prometheus-community/prometheus-redis-exporter
prometheus-community/prometheus-kafka-exporter
prometheus-community/prometheus-rabbitmq-exporter

2. Monitoring MySQL

2.1. Deploying MySQL (single-instance example)

kubectl create ns test

vim mysql-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  namespace: test
spec:
  selector: 
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - image: mysql
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: root@mysql
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: test
spec:
  type: ClusterIP
  ports:
  - port: 3306
    targetPort: 3306
  selector:
    app: mysql
kubectl apply -f mysql-deploy.yaml
  • The resulting connection details are:
host: mysql.test
port: 3306
user: root
pass: root@mysql

2.2. Deploying mysql-exporter

2.2.1. Download and unpack the mysql-exporter chart
cd  ~/workspace/prometheus/
helm pull prometheus-community/prometheus-mysql-exporter
tar zxvf [xxx.tgz]
2.2.2. Edit values.yaml
cd  ~/workspace/prometheus/prometheus-mysql-exporter
vim values.yaml
2.2.3. Set the MySQL connection

Use the MySQL connection details from the previous section.

mysql:
  db: ""
  host: "mysql.test"
  param: ""
  pass: "root@mysql"
  port: 3306
  protocol: ""
  user: "root"

2.2.4. Deploy mysql-exporter
helm install prometheus-mysql-exporter -n prometheus .

To monitor multiple instances, deploy one exporter per instance (give each helm release a distinct NAME).
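
As a rough sketch of the multi-instance case, a second release could take its connection settings from a small override file rather than from another edited copy of values.yaml. The file name values-mysql-b.yaml, the host mysql-b.test, and the password below are hypothetical placeholders; the keys mirror the mysql block from 2.2.3.

```yaml
# values-mysql-b.yaml (hypothetical file name)
# Overrides only the connection block for a second MySQL instance.
mysql:
  host: "mysql-b.test"   # hypothetical second instance
  port: 3306
  user: "root"
  pass: "change-me"      # placeholder, not a real credential
```

It could then be installed under a different release name, for example: helm install prometheus-mysql-exporter-b -n prometheus . -f values-mysql-b.yaml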

  • Check the Target in the prometheus-server UI

  • Check the metrics collected by mysql-exporter

2.3. Configure the Grafana dashboard

2.4. Alerting rules

The alerting rules can be modeled on the panels of the Grafana dashboard. Examples:

2.4.1. MySQL status
mysql_up == 0
2.4.2. High number of open files
mysql_global_status_innodb_num_open_files / mysql_global_variables_open_files_limit > 0.75
2.4.3. Current connections exceed 75% of the maximum
max_over_time(mysql_global_status_threads_connected[5m]) / mysql_global_variables_max_connections > 0.75
2.4.4. Historical peak connections exceed 75% of the maximum
mysql_global_status_max_used_connections / mysql_global_variables_max_connections > 0.75
2.4.5. Too many slow queries
rate(mysql_global_status_slow_queries[5m])>3
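
The expressions above could be wrapped in a Prometheus rule group along the following lines. This is only a sketch: the group name, alert names, `for` durations, and severity labels are assumptions, and where the file is loaded depends on how Prometheus was deployed in the earlier article.

```yaml
# mysql-rules.yaml (hypothetical file name)
groups:
  - name: mysql-exporter            # assumed group name
    rules:
      - alert: MysqlDown            # alert names are illustrative
        expr: mysql_up == 0
        for: 1m                     # assumed hold time before firing
        labels:
          severity: critical        # assumed label for alertmanager routing
        annotations:
          summary: "MySQL is down on {{ $labels.instance }}"
      - alert: MysqlTooManyConnections
        expr: max_over_time(mysql_global_status_threads_connected[5m]) / mysql_global_variables_max_connections > 0.75
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL connections above 75% of max_connections on {{ $labels.instance }}"
      - alert: MysqlSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow query rate is high on {{ $labels.instance }}"
```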

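3. Monitoring Redis

3.1. Deploying Redis (single-instance example)

See the earlier article: "Installing single-node Redis on k8s with NFS persistence and data migration"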

vim redis-deploy.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis
  namespace: test
data:
  redis.conf: |+
    requirepass redis@passwd
    maxmemory 268435456
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: test
  labels:
    app: redis
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
      annotations:
        version/date: "20210814"
        version/author: "lc"
    spec:
      containers:
      - name: redis
        image: redis
        imagePullPolicy: Always
        command: ["redis-server","/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/redis.conf
          subPath: redis.conf
      volumes:
      - name: redis-config
        configMap:
          name: redis
          items:
          - key: redis.conf
            path: redis.conf
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  namespace: test
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
kubectl apply -f redis-deploy.yaml
  • The resulting connection details are:
redisAddress: redis.test:6379
redisPassword: redis@passwd

3.2. Deploying redis-exporter

3.2.1. Download and unpack the redis-exporter chart
cd  ~/workspace/prometheus/
helm pull prometheus-community/prometheus-redis-exporter
tar zxvf [xxx.tgz]
3.2.2. Edit values.yaml
cd  ~/workspace/prometheus/prometheus-redis-exporter
vim values.yaml
3.2.3. Set the Redis connection
  • Use the Redis connection details from the previous section, and enable password authentication (at the bottom of the file)
redisAddress: redis.test:6379
---
auth:
  # Use password authentication
  enabled: true
  # Use existing secret (ignores redisPassword)
  secret:
    name: ""
    key: ""
  # Redis password (when not stored in a secret)
  redisPassword: "redis@passwd"
3.2.4. Set the exporter's Target

This section differs considerably from the original template, so note the differences.

annotations: 
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9121"
  prometheus.io/scrape: "true"
labels: {}

3.2.5. Deploy redis-exporter
helm install prometheus-redis-exporter -n prometheus .
  • Check the Target in the prometheus-server UI

  • Check the metrics collected by redis-exporter

3.3. Configure the Grafana dashboard

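3.4. Alerting rules

The alerting rules can be modeled on the panels of the Grafana dashboard. Examples:

3.4.1. Redis status
redis_up == 0
3.4.2. Memory running low
redis_memory_used_bytes/redis_memory_max_bytes * 100 > 80
3.4.3. Too many connections
redis_connected_clients > 100
3.4.4. Too few connections
redis_connected_clients < 5
3.4.5. Connections rejected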
increase(redis_rejected_connections_total[1m]) > 0
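
A possible rule group built from these expressions is sketched below. The group name, alert names, `for` durations, and severity labels are assumptions rather than values from the original setup.

```yaml
# redis-rules.yaml (hypothetical file name)
groups:
  - name: redis-exporter            # assumed group name
    rules:
      - alert: RedisDown            # alert names are illustrative
        expr: redis_up == 0
        for: 1m                     # assumed hold time before firing
        labels:
          severity: critical        # assumed label for alertmanager routing
        annotations:
          summary: "Redis is down on {{ $labels.instance }}"
      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage above 80% on {{ $labels.instance }}"
      - alert: RedisRejectedConnections
        expr: increase(redis_rejected_connections_total[1m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Redis rejected connections on {{ $labels.instance }}"
```

The memory rule relies on maxmemory being set, as it is in the ConfigMap from 3.1; with no limit configured the ratio is not meaningful.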

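4. Monitoring Kafka

4.1. Deploying the Kafka cluster

Deploy it by following the earlier article: "Deploying a Kafka cluster on k8s with high-availability configuration"

  • Again deployed to the test namespace; the resulting connection details are: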
kafkaServer: kafka.test:9092

4.2. Deploying kafka-exporter

4.2.1. Download and unpack the kafka-exporter chart
cd  ~/workspace/prometheus/
helm pull prometheus-community/prometheus-kafka-exporter
tar zxvf [xxx.tgz]
4.2.2. Edit values.yaml
cd  ~/workspace/prometheus/prometheus-kafka-exporter
vim values.yaml
4.2.3. Set the Kafka connection
kafkaServer:
  - kafka.test:9092
4.2.4. Set the exporter's Target
  • Add the following under annotations:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9308"

4.2.5. Deploy kafka-exporter
helm install prometheus-kafka-exporter -n prometheus .
  • Check the Target in the prometheus-server UI

  • Check the metrics collected by kafka-exporter

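4.2.6. Collecting more data
  • At this point the exporter collects very little data, so the dashboard configured later would show nothing
  • Running a consumer with --consumer-property (here setting a group.id) activates the consumer configuration, which lets the exporter collect more data
# Enter the pod
kubectl exec -it -n test kafka-0 -- bash

# Create a topic
kafka-topics.sh --zookeeper zookeeper:2181 --topic test001  --create --partitions 3 --replication-factor 2

# Produce to the topic (once the prompt appears, type a few arbitrary lines)
kafka-console-producer.sh --broker-list kafka:9092 --topic test001

# Consume from the topic (with --consumer-property set)
kafka-console-consumer.sh --bootstrap-server kafka:9092 --from-beginning --topic test001 --consumer-property group.id=test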

4.3. Configure the Grafana dashboard

4.4. Alerting rules

The alerting rules can be modeled on the panels of the Grafana dashboard. Examples:

4.4.1. Kafka broker status
kafka_brokers < 3
4.4.2. Kafka message production volume
sum(round(delta(kafka_topic_partition_current_offset[5m])/5)) by (topic) > 100
4.4.3. Kafka message consumption volume
sum(round(delta(kafka_consumergroup_current_offset[5m])/5)) by (topic) > 100
4.4.4. Consumer lag
sum(kafka_consumergroup_lag) by (consumergroup, topic)
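
One way these expressions might be turned into a rule group is sketched below. The group name, alert names, `for` durations, and severity labels are assumptions. The broker rule assumes a three-broker cluster, and the lag threshold of 1000 is a hypothetical value, since the expression above specifies none.

```yaml
# kafka-rules.yaml (hypothetical file name)
groups:
  - name: kafka-exporter            # assumed group name
    rules:
      - alert: KafkaBrokerMissing   # alert names are illustrative
        expr: kafka_brokers < 3     # assumes a three-broker cluster
        for: 1m                     # assumed hold time before firing
        labels:
          severity: critical        # assumed label for alertmanager routing
        annotations:
          summary: "Fewer than 3 Kafka brokers seen by {{ $labels.instance }}"
      - alert: KafkaConsumerLagHigh
        # The threshold 1000 is hypothetical; tune it to the workload.
        expr: sum(kafka_consumergroup_lag) by (consumergroup, topic) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consumer group {{ $labels.consumergroup }} is lagging on topic {{ $labels.topic }}"
```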

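5. Monitoring RabbitMQ

5.1. Deploying the RabbitMQ cluster

Deploy it by following the earlier article: "Deploying a RabbitMQ cluster on K8S with mirrored mode for high availability"

  • Again deployed to the test namespace; the resulting connection details (rabbitmq-management) are: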
rabbitmq:
  url: http://rabbitmq.test:15672
  user: admin
  password: admin@mq

5.2. Deploying rabbitmq-exporter

5.2.1. Download and unpack the rabbitmq-exporter chart
cd  ~/workspace/prometheus/
helm pull prometheus-community/prometheus-rabbitmq-exporter
tar zxvf [xxx.tgz]
5.2.2. Edit values.yaml
cd  ~/workspace/prometheus/prometheus-rabbitmq-exporter
vim values.yaml
5.2.3. Set the RabbitMQ connection
rabbitmq:
  url: http://rabbitmq.test:15672
  user: admin
  password: admin@mq
5.2.4. Set the exporter's Target
  • Uncomment the entries under annotations (the port must be quoted):
annotations: 
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9419"

5.2.5. Deploy rabbitmq-exporter
helm install prometheus-rabbitmq-exporter -n prometheus .
  • Check the Target in the prometheus-server UI

  • Check the metrics collected by rabbitmq-exporter

5.3. Configure the Grafana dashboard

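5.4. Alerting rules

The alerting rules can be modeled on the panels of the Grafana dashboard. Examples:

5.4.1. Node status
sum by (node) (rabbitmq_running) == 0
5.4.2. Memory

Alert above 500 MB; pick the threshold based on historical usage.

sum by (node) (round(rabbitmq_node_mem_used /1024 /1024 )) > 500
5.4.3. File descriptors

Pick the threshold based on historical usage.

sum by (node) (rabbitmq_fd_used) > 100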
5.4.4. Network
rabbitmq_sockets_used == 0
5.4.5. No consumers on any queue
rabbitmq_consumersTotal == 0
5.4.6. Messages ready for consumption
increase (rabbitmq_queue_messages_ready_total[1m])
5.4.7. Unacknowledged messages
increase (rabbitmq_queue_messages_unacknowledged_total[1m])
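
The node-level expressions from 5.4.1 to 5.4.3 could be grouped into a rule file roughly as follows. The group name, alert names, `for` durations, and severity labels are assumptions; the thresholds are the ones listed above and should be tuned against historical values.

```yaml
# rabbitmq-rules.yaml (hypothetical file name)
groups:
  - name: rabbitmq-exporter         # assumed group name
    rules:
      - alert: RabbitmqNodeDown     # alert names are illustrative
        expr: sum by (node) (rabbitmq_running) == 0
        for: 1m                     # assumed hold time before firing
        labels:
          severity: critical        # assumed label for alertmanager routing
        annotations:
          summary: "RabbitMQ node {{ $labels.node }} is not running"
      - alert: RabbitmqMemoryHigh
        expr: sum by (node) (round(rabbitmq_node_mem_used / 1024 / 1024)) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ node {{ $labels.node }} uses more than 500 MB of memory"
      - alert: RabbitmqFileDescriptorsHigh
        expr: sum by (node) (rabbitmq_fd_used) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ node {{ $labels.node }} has more than 100 file descriptors open"
```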
