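I won't cover the features and usage of victoriametrics in detail here; the official documentation is comprehensive.

Here is my topology and architecture.

  1. prometheus is deployed with the kube-prometheus operator.
  2. victoriametrics is deployed as a StatefulSet (sts).
  3. prometheus writes its data to victoriametrics via remote_write. victoriametrics has a high compression ratio, so it can easily store several months of historical data.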


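Why didn't I adopt the full victoriametrics stack?

  1. The existing prometheus + alertmanager setup is already deployed and wired into our internal alerting system, and I did not want to rework it.
  2. For us, victoriametrics serves as historical data storage, not as the core monitoring database.
  3. The full victoriametrics stack has many components; bringing them all in is too much for the people we have.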



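Modifying the prometheus CRD

Fetch the current object with kubectl get Prometheus -n monitoring k8s -oyaml. Below is my configuration after the changes:

The main change is the added remoteWrite entry; I also switched the storage to an NFS disk.

Configure the storage to suit your own needs.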

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  annotations:
  generation: 27
  labels:
    app: prometheus
    prometheus: k8s
  name: k8s
  namespace: monitoring
  resourceVersion: "3757019465"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
  uid: 8c7be613-1a60-11ea-a1d8-72c40774f54f
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  remoteWrite:
  - url: http://victoriametrics.monitoring.svc.cluster.local:8428/api/v1/write
  replicas: 2
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 500Gi
        storageClassName: alicloud-nas-prometheus
  version: v2.25.0
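
For readers who already run kube-prometheus, the changes described above come down to two stanzas in the Prometheus spec. The fragment below is a minimal sketch that isolates them, reusing the URL and storage class from the manifest above. The storage class name is specific to my cluster and is an assumption for anyone else; the rest of the object is left as it was.

```yaml
# Fragment of the Prometheus object (monitoring.coreos.com/v1), not a complete manifest.
spec:
  # Forward scraped samples to the Prometheus-compatible write endpoint of victoriametrics.
  remoteWrite:
  - url: http://victoriametrics.monitoring.svc.cluster.local:8428/api/v1/write
  # Persistent storage for Prometheus itself.
  # Replace the class with one that exists in your cluster.
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 500Gi
        storageClassName: alicloud-nas-prometheus
```

Editing the object in place with kubectl edit prometheus k8s -n monitoring should be enough; the operator is expected to regenerate the Prometheus configuration with a matching remote_write section.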


The two victoriametrics manifests

victoriametrics.svc.yaml contains the following:

apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 8428
    protocol: TCP
    targetPort: 8428
  selector:
    app: victoriametrics
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  annotations:
  creationTimestamp: null
  labels:
    app: victoriametrics
  name: victoriametrics-nodeport
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    port: 8428
    protocol: TCP
    targetPort: 8428
  selector:
    app: victoriametrics
  sessionAffinity: None
  type: NodePort



victoriametrics.sts.yaml contains the following:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
  creationTimestamp: null
  generation: 1
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: monitoring
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: victoriametrics
  serviceName: victoriametrics
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: victoriametrics
    spec:
      containers:
      - args:
        - --storageDataPath=/storage
        - --httpListenAddr=:8428
        - --retentionPeriod=1
        image: victoriametrics/victoria-metrics
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8428
            scheme: HTTP
          initialDelaySeconds: 120
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: victoriametrics
        ports:
        - containerPort: 8428
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8428
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            cpu: "4"
            memory: 8000Mi
          requests:
            cpu: "4"
            memory: 8000Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /storage
          name: victormetrics-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: victormetrics-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 300Gi
      storageClassName: alicloud-nas-prometheus
      volumeMode: Filesystem
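
One detail worth calling out: the manifest above sets --retentionPeriod=1, and victoriametrics counts a bare number as months, so this keeps one month of data. To actually hold several months of history, the args could be adjusted roughly as in the sketch below. The value 6 is purely illustrative, not what I run, and the 300Gi volume may need resizing to match.

```yaml
# Fragment of the victoriametrics container spec.
# Only the retention value differs from the manifest above.
args:
- --storageDataPath=/storage
- --httpListenAddr=:8428
- --retentionPeriod=6   # illustrative: six months (no unit suffix means months)
```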

Apply these manifests with kubectl apply, then add a victoriametrics data source in grafana, and you can start building dashboards.
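
The data source can be added through the grafana UI, or provisioned from a file. The following is a minimal sketch of the file-based route, assuming grafana runs in the same cluster and can resolve the Service name from the manifests above. The data source name and the file location are hypothetical choices for illustration.

```yaml
# Hypothetical provisioning file, e.g. under /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
- name: VictoriaMetrics   # display name, chosen for illustration
  type: prometheus        # victoriametrics is queried via the Prometheus data source type
  access: proxy           # the grafana backend issues the queries
  url: http://victoriametrics.monitoring.svc.cluster.local:8428
  isDefault: false        # keep the existing prometheus data source as the default
```

Using the prometheus type works because victoriametrics exposes a Prometheus-compatible query API on the same port (8428) that receives the remote_write traffic.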


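Disk usage comparison: over the same time window, victoriametrics takes up only 25% of the space prometheus does. Since victoriametrics here only serves as historical data storage plus disaster recovery, the performance requirements are low, and the resources given to it in the sts are not very high either. Adjust the CPU and memory quotas to fit your own situation.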
