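Using VictoriaMetrics vmalert
This post walks through using vmalert, mainly to test how the various components integrate.
Environment setup
Note that the environment also includes vmauth, vmagent, and several other VictoriaMetrics components, so it is essentially a fairly complete Prometheus-compatible stack.
- docker-compose file
Note: at the moment vmalert fails with an error when it queries through vmauth, which looks like an encoding problem. For that reason the compose file points vmalert directly at vmselect and keeps the vmauth datasource settings commented out.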
version: "3"
services:
  vmstorage:
    image: victoriametrics/vmstorage
    ports:
      - 8482:8482
      # 8400 accepts data from vminsert, 8401 serves queries from vmselect
      - 8400:8400
      - 8401:8401
    volumes:
      - ./strgdata:/storage
    command:
      - '--storageDataPath=/storage'
  vmagent:
    image: victoriametrics/vmagent
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - 8429:8429
    command:
      - -promscrape.config=/etc/prometheus/prometheus.yml
      - -remoteWrite.basicAuth.username=dalong-insert-account-1
      - -remoteWrite.basicAuth.password=dalong
      # vmauth appends the request path to url_prefix, so this may need to be
      # http://vmauth:8427/api/v1/write if writes are rejected
      - -remoteWrite.url=http://vmauth:8427
  alertmanager:
    image: prom/alertmanager:latest
    volumes:
      - "./alertmanager.yaml:/etc/alertmanager.yaml"
    command:
      - --config.file=/etc/alertmanager.yaml
      - --storage.path=/tmp/alertmanager1
    ports:
      - 9093:9093
  vmalert:
    image: victoriametrics/vmalert
    volumes:
      - "./alert.rules:/etc/victoriametrics/alert.rules"
    ports:
      - 8880:8880
    command:
      - -rule=/etc/victoriametrics/alert.rules
      - -datasource.url=http://vmselect:8481/select/1/prometheus
      # Querying through vmauth currently fails, so these stay commented out
      # - -datasource.url=http://vmauth:8427
      # - -datasource.basicAuth.password=dalong
      # - -datasource.basicAuth.username=dalong-select-account-1
      - -notifier.url=http://alertmanager:9093
  vmauth:
    image: victoriametrics/vmauth
    volumes:
      - "./config.yaml:/etc/victoriametrics/config.yaml"
    command:
      - '-auth.config=/etc/victoriametrics/config.yaml'
    ports:
      - 8427:8427
  vminsert:
    image: victoriametrics/vminsert
    command:
      - '--storageNode=vmstorage:8400'
    ports:
      - 8480:8480
  vmselect:
    image: victoriametrics/vmselect
    command:
      - '--storageNode=vmstorage:8401'
    ports:
      - 8481:8481
  grafana:
    image: grafana/grafana
    ports:
      - 3000:3000
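- Configuration
vmauth configuration (config.yaml). Each user is mapped to a backend URL prefix; one account is used for reads and one for writes:
users:
  - username: "dalong-select-account-1"
    password: "dalong"
    url_prefix: "http://vmselect:8481/select/1/prometheus"
  - username: "dalong-insert-account-1"
    password: "dalong"
    url_prefix: "http://vminsert:8480/insert/1/prometheus"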
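To make the mapping above concrete, here is a minimal Python sketch of how a proxy like vmauth could resolve a request: it reads the Basic auth credentials, looks up the user, and appends the request path to that user's url_prefix. This is a simplified model for illustration only, not vmauth's implementation; the function names basic_auth_header and resolve_backend are hypothetical.

```python
import base64

# Mirrors the two users from config.yaml above.
USERS = [
    {"username": "dalong-select-account-1", "password": "dalong",
     "url_prefix": "http://vmselect:8481/select/1/prometheus"},
    {"username": "dalong-insert-account-1", "password": "dalong",
     "url_prefix": "http://vminsert:8480/insert/1/prometheus"},
]


def basic_auth_header(username, password):
    """Build the Authorization header value an HTTP client would send."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def resolve_backend(auth_header, request_path):
    """Return the backend URL for a request, or None for unknown credentials.

    Hypothetical helper: models 'url_prefix + request path' routing only.
    """
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Basic":
        return None
    username, _, password = base64.b64decode(token).decode().partition(":")
    for user in USERS:
        if user["username"] == username and user["password"] == password:
            return user["url_prefix"] + request_path
    return None


header = basic_auth_header("dalong-select-account-1", "dalong")
print(resolve_backend(header, "/api/v1/query"))
```

Under this model, a query sent with the select account ends up at vmselect under tenant 1, while the insert account's requests go to vminsert.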
vmagent configuration (prometheus.yml, a regular Prometheus scrape configuration)
global:
  scrape_interval: 1s
  evaluation_interval: 1s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'vminsert'
    static_configs:
      - targets: ['vminsert:8480']
  - job_name: 'vmselect'
    static_configs:
      - targets: ['vmselect:8481']
  - job_name: 'vmstorage'
    static_configs:
      - targets: ['vmstorage:8482']
vmalert configuration (the alert.rules file, which is the main thing being tested)
groups:
  - name: groupGorSingleAlert
    rules:
      - alert: VMRows
        for: 10s
        expr: vm_rows > 0
        labels:
          label: bar
          host: "{{ $labels.instance }}"
        annotations:
          summary: "{{ $value|humanize }}"
          description: "{{$labels}}"
  - name: TestGroup
    rules:
      - alert: Conns
        expr: sum(vm_tcplistener_conns) by(instance) > 1
        annotations:
          summary: "Too high connection number for {{$labels.instance}}"
          description: "It is {{ $value }} connections for {{$labels.instance}}"
      - alert: ExampleAlertAlwaysFiring
        expr: sum by(job) (up == 1)
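The VMRows rule uses "for: 10s", while the other rules have no "for" clause. The sketch below models the usual Prometheus-style meaning of that field: an alert stays pending until its expression has been continuously true for the given duration, and only then fires. It is a simplified illustration, not vmalert's code; the function name alert_states and the sample timestamps are made up for the example.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each evaluation.

    samples: list of (timestamp_seconds, expr_is_true) in evaluation order.
    States are "inactive", "pending", or "firing".
    """
    states = []
    active_since = None  # time the expression first became true in this run
    for ts, is_true in samples:
        if not is_true:
            # The expression stopped matching, so the hold timer resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


evaluations = [(0, True), (5, True), (10, True), (15, False), (20, True)]
print(alert_states(evaluations, for_seconds=10))  # like VMRows
print(alert_states(evaluations, for_seconds=0))   # like rules without "for"
```

In this model a rule without "for" fires on the first evaluation where the expression matches, which is why ExampleAlertAlwaysFiring is convenient for testing the pipeline end to end.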
alertmanager configuration (alertmanager.yaml)
global:
  resolve_timeout: 30s
route:
  group_by: ["alertname"]
  group_wait: 5s
  group_interval: 10s
  repeat_interval: 999h
  receiver: "default"
  routes:
    - receiver: "default"
      group_by: []
      match_re:
        alertname: .*
      continue: true
    - receiver: "pagination"
      group_by: ["alertname", "instance"]
      match_re:
        alertname: Pagination Test
      continue: false
    - receiver: "by-cluster-service"
      group_by: ["alertname", "cluster", "service"]
      match_re:
        alertname: .*
      continue: true
    - receiver: "by-name"
      group_by: [alertname]
      match_re:
        alertname: .*
      continue: true
    - receiver: "by-cluster"
      group_by: [cluster]
      match_re:
        alertname: .*
      continue: true
inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    # Apply inhibition if the alertname and cluster is the same in both
    equal: ["alertname", "cluster"]
receivers:
  - name: "default"
  - name: "pagination"
  - name: "by-cluster-service"
  - name: "by-name"
  - name: "by-cluster"
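The interaction between match_re and continue in the routes above is easy to misread, so here is a small Python model of it. Child routes are checked in file order, match_re patterns are treated as fully anchored, and a matching route with "continue: false" stops the walk; if nothing matches, the root receiver is used. This is a sketch of the routing rules for this flat route list only, not Alertmanager's implementation, and receivers_for is a hypothetical helper name.

```python
import re

# (receiver, match_re pattern for alertname, continue) in file order.
CHILD_ROUTES = [
    ("default", ".*", True),
    ("pagination", "Pagination Test", False),
    ("by-cluster-service", ".*", True),
    ("by-name", ".*", True),
    ("by-cluster", ".*", True),
]
ROOT_RECEIVER = "default"


def receivers_for(alertname):
    """Return the receivers an alert with this alertname would be routed to."""
    matched = []
    for receiver, pattern, keep_going in CHILD_ROUTES:
        if re.fullmatch(pattern, alertname):
            matched.append(receiver)
            if not keep_going:
                break  # "continue: false" ends the walk at this route
    # With no matching child route, the alert falls back to the root route.
    return matched or [ROOT_RECEIVER]


print(receivers_for("VMRows"))
print(receivers_for("Pagination Test"))
```

Under this model the alerts defined in alert.rules reach every receiver except "pagination", while an alert literally named "Pagination Test" stops at the "pagination" route.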
- Supported command-line flags
vmalert-20200521-152717-tags-v1.35.6-cluster-0-gdcbdc009f
Usage of /vmalert-prod:
  -datasource.basicAuth.password string
        Optional basic auth password for -datasource.url
  -datasource.basicAuth.username string
        Optional basic auth username for -datasource.url
  -datasource.url string
        Victoria Metrics or VMSelect url. Required parameter. E.g. http://127.0.0.1:8428
  -enableTCP6
        Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
  -envflag.enable
        Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
  -envflag.prefix string
        Prefix for environment variables if -envflag.enable is set
  -evaluationInterval duration
        How often to evaluate the rules. Default 1m (default 1m0s)
  -external.url string
        External URL is used as alert's source for sent alerts to the notifier
  -http.disableResponseCompression
        Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
  -http.maxGracefulShutdownDuration duration
        The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
  -http.pathPrefix string
        An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
  -http.shutdownDelay duration
        Optional delay before http server shutdown. During this dealy the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
  -httpListenAddr string
        Address to listen for http connections (default ":8880")
  -loggerFormat string
        Format for logs. Possible values: default, json (default "default")
  -loggerLevel string
        Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
  -loggerOutput string
        Output for the logs. Supported values: stderr, stdout (default "stderr")
  -memory.allowedPercent float
        Allowed percent of system memory VictoriaMetrics caches may occupy. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
  -notifier.url string
        Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
  -remoteRead.basicAuth.password string
        Optional basic auth password for -remoteRead.url
  -remoteRead.basicAuth.username string
        Optional basic auth username for -remoteRead.url
  -remoteRead.lookback duration
        Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
  -remoteRead.url vmalert
        Optional URL to Victoria Metrics or VMSelect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428
  -remoteWrite.basicAuth.password string
        Optional basic auth password for -remoteWrite.url
  -remoteWrite.basicAuth.username string
        Optional basic auth username for -remoteWrite.url
  -remoteWrite.maxQueueSize int
        Defines the max number of pending datapoints to remote write endpoint (default 10000)
  -remoteWrite.url string
        Optional URL to Victoria Metrics or VMInsert where to persist alerts state in form of timeseries. E.g. http://127.0.0.1:8428
  -rule value
        Path to the file with alert rules.
        Supports patterns. Flag can be specified multiple times.
        Examples:
         -rule /path/to/file. Path to a single file with alerting rules
         -rule dir/*.yaml -rule /*.yaml. Relative path to all .yaml files in "dir" folder,
        absolute path to all .yaml files in root.
  -rule.validateTemplates
        Indicates to validate annotation and label templates (default true)
  -version
        Show VictoriaMetrics version
- Start the stack
docker-compose up -d
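Once the containers are up, one way to check that data is flowing is to run the same kind of instant query that vmalert issues against its datasource. The Python sketch below builds the /api/v1/query URL and fetches the JSON result using only the standard library. The helper names build_query_url and instant_query are hypothetical, and the localhost address in the usage note assumes the port mappings from the compose file above.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, expr):
    """Build a Prometheus-compatible instant query URL for the expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(base_url, expr, timeout=5):
    """Run an instant query and return the decoded JSON response."""
    url = build_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Shows the encoded form of one of the rule expressions from alert.rules.
print(build_query_url("http://vmselect:8481/select/1/prometheus",
                      "sum by(job) (up == 1)"))
```

From the Docker host, a call such as instant_query("http://localhost:8481/select/1/prometheus", "up") should return a JSON document with a "status" field if vmselect is reachable and vmagent has written data for tenant 1.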
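Integration result
Notes
Error message when vmalert queries through vmauth (an encoding problem). The byte '\x1f' in the message is the first byte of the gzip magic number, which suggests that the response body was gzip-compressed and was handed to the JSON parser without being decompressed: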
error VictoriaMetrics/app/vmalert/group.go:148 failed to execute rule "TestGroup"."ExampleAlertAlwaysFiring": failed to execute query "sum by(job) (up == 1)": error parsing metrics for http://vmauth:8427/api/v1/query?query=sum+by%28job%29+%28up+%3D%3D+1%29:invalid character '\x1f' looking for beginning of value
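The following Python sketch reproduces the same class of failure outside the stack: a gzip-compressed JSON body starts with the bytes 0x1f 0x8b, so a JSON parser rejects it until it is decompressed. It only illustrates the likely cause inferred from the error text and is not taken from vmalert or vmauth; the sample body and the helper name parse_response are assumptions made for the example.

```python
import gzip
import json

# A made-up body shaped like a Prometheus API response, for illustration.
body = json.dumps(
    {"status": "success", "data": {"resultType": "vector", "result": []}}
).encode()
compressed = gzip.compress(body)

# The first byte of any gzip stream is 0x1f, the byte named in the error.
print(compressed[:2])


def parse_response(raw):
    """Decode a JSON body, transparently handling gzip-compressed input."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return json.loads(raw)


try:
    json.loads(compressed)  # parsing the compressed bytes directly fails
except ValueError as err:
    print("direct parse failed:", type(err).__name__)

print(parse_response(compressed)["status"])
```

If this is indeed the cause, pointing vmalert straight at vmselect, as the compose file does, sidesteps the problem because the proxy hop is no longer involved.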
References
https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/vmalert
https://github.com/prometheus/alertmanager