下载blackbox_exporter

同样,黑盒监控需要安装exporter,这回下载blackbox_exporter,下载地址:https://prometheus.io/download/#blackbox_exporter

编辑配置文件

vi blackbox.yml
modules:
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET

启动blackbox_exporter

./blackbox_exporter --config.file=blackbox.yml

prometheus添加监控项

  - job_name: blackbox-exporter
    params:
      module:
      - http_post_2xx
      target:
      - www.csdn.net
    metrics_path: /probe
    static_configs:
    - targets:
      - 10.10.10.103:9115

其中:www.csdn.net这个target代表需要监控的主机,10.10.10.103:9115这个target代表blackbox_exporter安装的主机地址和端口

alertmanager添加告警项

- name: Blackbox Expoter
  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe failed (instance {{ $labels.instance }})
      description: "Probe failed\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: BlackboxConfigurationReloadFailure
    expr: blackbox_exporter_config_last_reload_successful != 1
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Blackbox configuration reload failure (instance {{ $labels.instance }})
      description: "Blackbox configuration reload failure\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[1m]) > 2
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox slow probe (instance {{ $labels.instance }})
      description: "Blackbox probe took more than 1s to complete\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: BlackboxProbeHttpFailure
    expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe HTTP failure (instance {{ $labels.instance }})
      description: "HTTP status code is not 200-399\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: BlackboxSslCertificateWillExpireSoon
    expr: 3 <= round((last_over_time(probe_ssl_earliest_cert_expiry[10m]) - time()) / 86400, 0.1) < 20
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})
      description: "SSL certificate expires in less than 20 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: BlackboxSslCertificateWillExpireSoon
    expr: 0 <= round((last_over_time(probe_ssl_earliest_cert_expiry[10m]) - time()) / 86400, 0.1) < 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})
      description: "SSL certificate expires in less than 3 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # For probe_ssl_earliest_cert_expiry to be exposed after expiration, you
  # need to enable insecure_skip_verify. Note that this will disable
  # certificate validation.
  # See https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md#tls_config
  - alert: BlackboxSslCertificateExpired
    expr: round((last_over_time(probe_ssl_earliest_cert_expiry[10m]) - time()) / 86400, 0.1) < 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox SSL certificate expired (instance {{ $labels.instance }})
      description: "SSL certificate has expired already\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: BlackboxProbeSlowHttp
    expr: avg_over_time(probe_http_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox probe slow HTTP (instance {{ $labels.instance }})
      description: "HTTP request took more than 1s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: BlackboxProbeSlowPing
    expr: avg_over_time(probe_icmp_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox probe slow ping (instance {{ $labels.instance }})
      description: "Blackbox ping took more than 1s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

验证是否生效

target已添加 image.png rules已添加 image.png alerts已添加 image.png 将监控主机www.csdn.net换成一个不存在的主机试下 image.png 发现rules已出现告警 image.png alertmanager也出现了告警 image.png 钉钉告警机器人也发出了告警 image.png