What Prometheus does:

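   It is open-source software built specifically for system monitoring and alerting. It joined the CNCF right after Kubernetes, the foundation's first project. It supports many exporters for collecting metrics, and it also supports Pushgateway for pushing metrics. Prometheus performs well enough to monitor clusters on the scale of ten thousand machines or more.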

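Grafana is a cross-platform, open-source tool for metric analytics and visualization. It queries the collected metrics and displays them as dashboards.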

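Metrics monitoring (Monitoring): Linux memory usage, CPU load, disk I/O, thread count

Distributed tracing (Tracing): business-level, follows a request that is handled across multiple systems

Log collection (Logging): integrates ELK so that logs are easy to browse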
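
To make the metrics pillar more concrete, here is a minimal sketch that reads a few of the values listed above (memory, load, thread count) from inside a JVM using only `java.lang.management`. The class name `HostMetricsSnapshot` and the map keys are made up for illustration; in the setup described in this post, Micrometer collects comparable values automatically, and disk I/O is not covered by this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

class HostMetricsSnapshot {

    /** Collects a few point-in-time readings; the keys are illustrative, not Prometheus metric names. */
    static Map<String, Double> collect() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        Map<String, Double> snapshot = new LinkedHashMap<>();
        snapshot.put("heap_used_bytes", (double) heap.getUsed());
        // getSystemLoadAverage() returns a negative value on platforms that cannot report it.
        snapshot.put("system_load_average_1m", os.getSystemLoadAverage());
        snapshot.put("available_processors", (double) os.getAvailableProcessors());
        snapshot.put("live_threads", (double) ManagementFactory.getThreadMXBean().getThreadCount());
        return snapshot;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A monitoring system turns readings like these into time series by sampling them at a fixed interval, which is the job Prometheus performs when it scrapes a target.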

[Figure: monitoring architecture for microservices]

Spring Boot side:

pom.xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <scope>runtime</scope>
</dependency>

application.yml

spring:
  application:
    name: springboot-prometheus

management:
  endpoint:
    prometheus:
      enabled: true
    health:
      show-details: always
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

Bean registration:

@Bean
MeterRegistryCustomizer<MeterRegistry> configurer(@Value("${spring.application.name}") String applicationName) {
    // Tag every metric with the application name so that series can be filtered per service.
    return registry -> registry.config().commonTags("application", applicationName);
}
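
To show what the common tag does to the scraped output, the following sketch renders one sample in the Prometheus text format, with the `application` label attached. It is plain Java with no Micrometer dependency; the class `ExpositionLine` and the sample value `24.0` are hypothetical, and the real endpoint output contains additional `# HELP` and `# TYPE` lines that are omitted here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ExpositionLine {

    /** Renders one sample as name{label="value",...} value, with labels in sorted order. */
    static String render(String metric, Map<String, String> labels, double value) {
        String labelPart = new TreeMap<String, String>(labels).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(","));
        return metric + (labelPart.isEmpty() ? "" : "{" + labelPart + "}") + " " + value;
    }

    /** Escapes backslash, double quote and newline, as the text format requires for label values. */
    static String escape(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Prints: jvm_threads_live_threads{application="springboot-prometheus"} 24.0
        System.out.println(render("jvm_threads_live_threads",
                Map.of("application", "springboot-prometheus"), 24.0));
    }
}
```

Because every series carries the same `application` label, a Grafana dashboard can use it as a variable to switch between services.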

Server side

mkdir /etc/prometheus

vi prometheus.yml

vi rule.yml   # rule engine configuration

vi alertmanager.yml  # alert delivery configuration (sends email)

vi docker-compose.yml

prometheus.yml

# Global configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # scrape_timeout is set to the global default (10s).
# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['#{application host IP}:9093']
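# Load the rules once and evaluate them periodically according to the global evaluation_interval.
# Without this entry the alert defined in rule.yml is never loaded.
rule_files:
  - "/etc/prometheus/rule.yml"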
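# Controls which resources Prometheus monitors.
# The default configuration has a job named prometheus that scrapes the time series exposed by the Prometheus server itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any time series scraped from this config.
  - job_name: 'springboot_prometheus'
    scrape_interval: 5s
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['#{application host IP}:8081']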

rule.yml

groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for more than 1 minute.
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
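
Before relying on the alert, it can help to run its expression by hand. The sketch below builds the URI for an instant query against the Prometheus HTTP API endpoint `/api/v1/query`. The class name `AlertExpressionCheck` and the address `http://localhost:9090` are assumptions for illustration; substitute the server that runs Prometheus, and pass the base URL without a trailing slash.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AlertExpressionCheck {

    /** Builds an instant-query URI for the given PromQL expression. */
    static URI instantQuery(String prometheusBaseUrl, String promQl) {
        // Form-style encoding: spaces become '+', '=' becomes %3D.
        String encoded = URLEncoder.encode(promQl, StandardCharsets.UTF_8);
        return URI.create(prometheusBaseUrl + "/api/v1/query?query=" + encoded);
    }

    public static void main(String[] args) {
        // Prints: http://localhost:9090/api/v1/query?query=up+%3D%3D+0
        System.out.println(instantQuery("http://localhost:9090", "up == 0"));
    }
}
```

Opening the printed URL in a browser or with an HTTP client returns JSON; an empty result list means no target is currently down, so the InstanceDown rule would not fire.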

alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost: 'xxx@xxx:587'
  smtp_from: 'zhaoysz@xxx'
  smtp_auth_username: 'xxx@xxx'
  smtp_auth_password: 'xxxx'
  smtp_require_tls: true
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'test-mails'
receivers:
- name: 'test-mails'
  email_configs:
  - to: 'scottcho@qq.com'

docker-compose.yml

version: '3.7'

networks:
 dispacher-network:
  name: dispacher-network
  external: true

services:
  prometheus:
   image: prom/prometheus
   volumes:
     - /etc/prometheus/:/etc/prometheus/
     - prometheus_data:/prometheus
   command:
     - '--config.file=/etc/prometheus/prometheus.yml'
     - '--storage.tsdb.path=/prometheus'
     - '--web.console.libraries=/usr/share/prometheus/console_libraries'
     - '--web.console.templates=/usr/share/prometheus/consoles'
     - '--web.external-url=http://#{application host IP}:9090/'
     - '--web.enable-lifecycle'
     - '--storage.tsdb.retention.time=15d'
   ports:
     - 9090:9090
   links:
     - alertmanager:alertmanager
   restart: always
   networks:
     - dispacher-network
  alertmanager:
   image: prom/alertmanager
   container_name: alertmanager_gpe
   hostname: alertmanager
   restart: always
   volumes:
       - /data/gpe/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
   ports:
       - "9093:9093"
   networks:
       - dispacher-network
  grafana:
   image: grafana/grafana
   ports:
     - 3000:3000
   volumes:
     - /etc/grafana/:/etc/grafana/provisioning/
     - grafana_data:/var/lib/grafana
   environment:
     - GF_INSTALL_PLUGINS=camptocamp-prometheus-alertmanager-datasource
   links:
     - prometheus:prometheus
     - alertmanager:alertmanager
   restart: always
   networks:
     - dispacher-network

volumes:
  prometheus_data: {}
  grafana_data: {}
  alertmanager_data: {}
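
Since the compose file starts Prometheus with `--web.enable-lifecycle`, configuration changes can be applied without restarting the container by sending an HTTP POST to `/-/reload`. The following is a minimal sketch of that call using the JDK's built-in HTTP client; the class name `PrometheusReload` and the default address `http://localhost:9090` are assumptions, not part of the original setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PrometheusReload {

    /** Builds the POST request that asks Prometheus to re-read its configuration and rule files. */
    static HttpRequest reloadRequest(String prometheusBaseUrl) {
        return HttpRequest.newBuilder(URI.create(prometheusBaseUrl + "/-/reload"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default; pass the real base URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9090";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(reloadRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```

This is convenient after editing prometheus.yml or rule.yml on the host, because the directory is mounted into the container.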

Prometheus: http://#{server IP}:9090/targets

Grafana: http://14.18.43.72:3000/ (log in with admin/admin, then set a new password)
