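prometheus-operator simplifies deploying monitoring on k8s, but it has some problems. It bakes part of the Prometheus configuration into CRDs, which makes it hard for beginners: some configuration-file parameters cannot be changed through a ConfigMap, only through the CRD-defined resources, and those are fixed in the deployed configuration, so in practice they cannot be changed directly.
For now, the most practical way to change the configuration is through Helm. The chart exposes the most complete set of configurable parameters, although it has pitfalls of its own. values.yaml lets you flexibly configure the various Prometheus installation parameters. The installation process is as follows:
Note: from inside China, downloads from the various Helm sources often hang. Setting DNS to Alibaba's public DNS first makes access to overseas sites faster: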

vim /etc/resolv.conf

nameserver 223.5.5.5
nameserver 223.6.6.6

Install Helm

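wget https://storage.googleapis.com/kubernetes-helm/helm-v2.14.0-linux-amd64.tar.gz
Extract (the file name must match the version just downloaded): tar -zxvf helm-v2.14.0-linux-amd64.tar.gz
Keep the client and Tiller on the same Helm version: a Helm 2 client refuses to work with a Tiller of an older minor version. The Tiller image used below is v2.10.0, so either download helm-v2.10.0 here or use a matching v2.14.0 Tiller image.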

mv linux-amd64/helm /usr/local/bin/helm

helm version
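The download and extraction steps above can be tied to a single version string so the tarball name cannot drift between commands. A minimal sketch, assuming bash on a Linux amd64 host and the storage.googleapis.com URL pattern used above; the function names helm_tarball_url and install_helm_client are hypothetical.

```shell
# Hypothetical helper: build the Helm 2 client download URL for one version,
# using the same URL pattern as the wget command above.
helm_tarball_url() {
  local version="$1"   # e.g. v2.10.0; keep it equal to the Tiller image tag
  echo "https://storage.googleapis.com/kubernetes-helm/helm-${version}-linux-amd64.tar.gz"
}

# Hypothetical helper: download, extract and install exactly that version.
install_helm_client() {
  local version="$1"
  local tarball="helm-${version}-linux-amd64.tar.gz"
  wget -O "$tarball" "$(helm_tarball_url "$version")"
  tar -zxvf "$tarball"
  mv linux-amd64/helm /usr/local/bin/helm
}
```

For example, run install_helm_client v2.10.0 and then helm version to confirm the client version.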

Switch to the Alibaba Cloud chart repositories

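Install socat on all nodes (the Helm 2 client reaches Tiller through a port-forward, which needs socat on the nodes): yum install socat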
rm -rf /root/.helm/*
helm init --client-only --stable-repo-url https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts/

helm init --client-only --stable-repo-url https://kubernetes-charts-incubator.storage.googleapis.com/charts/
helm repo add incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm repo update


Install the Tiller server (if the image cannot be pulled, use a mirror inside China)
1. docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.10.0

2. Configure RBAC permissions; otherwise you will get errors such as: no release found
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}' (this may not take effect on newer versions)

3. helm init -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.10.0
or
helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.10.0 --skip-refresh

4. Expose Tiller on a fixed port on every node

Change the tiller-deploy Service to NodePort:
  ports:
  - nodePort: 44134
    port: 44134
    protocol: TCP
  type: NodePort


Point the Helm client at Tiller's NodePort address:
export HELM_HOST=172.20.0.101:44134
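Steps 2 and 4 can also be expressed declaratively, which makes them safe to re-run. A minimal sketch in bash: tiller_rbac_manifest prints a manifest equivalent to the two kubectl create commands in step 2, and tiller_nodeport_patch builds a patch for the tiller-deploy Service from step 4. Both function names are hypothetical. Note that Kubernetes only accepts nodePort values inside the API server's --service-node-port-range, which defaults to 30000-32767; 44134 is outside that default, so either widen the range on the API server or pick a port inside it and use that port in HELM_HOST.

```shell
# Hypothetical helper: ServiceAccount + ClusterRoleBinding equivalent to the
# "kubectl create serviceaccount" and "kubectl create clusterrolebinding" commands.
tiller_rbac_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller-cluster-rule
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
EOF
}

# Hypothetical helper: strategic-merge patch that turns the tiller-deploy
# Service into a NodePort Service with a fixed node port.
tiller_nodeport_patch() {
  local node_port="$1"
  printf '{"spec":{"type":"NodePort","ports":[{"port":44134,"nodePort":%s,"protocol":"TCP"}]}}' "$node_port"
}
```

Usage: tiller_rbac_manifest | kubectl apply -f - for the RBAC objects, and kubectl -n kube-system patch svc tiller-deploy -p "$(tiller_nodeport_patch 44134)" for the Service. Using kubectl apply means re-running the RBAC step updates the objects instead of failing because they already exist.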


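Common helm init options:
--canary-image: install the canary build
--tiller-image: install the specified image
--kube-context: install into the specified Kubernetes cluster (kubeconfig context)
--tiller-namespace: install into the specified namespace
--upgrade: if the Tiller server is already installed, upgrade it to the new image
--service-account: the service account the Tiller server runs as; it must be created in the cluster beforehand and granted the appropriate RBAC permissions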

helm install . --dry-run --debug    # render and print the YAML without installing

# To customize Tiller, output its manifest as YAML:

helm init --output yaml
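The options above can be combined with --output to render Tiller's manifest for hand-editing instead of installing it directly. A minimal sketch in bash, assuming the tiller service account from step 2 and the Aliyun Tiller image used earlier; render_tiller_manifest is a hypothetical name.

```shell
# Hypothetical helper: print the Tiller Deployment/Service manifest that
# "helm init" would install, without installing it (--output yaml).
render_tiller_manifest() {
  local image="$1"
  helm init \
    --service-account tiller \
    --tiller-namespace kube-system \
    --tiller-image "$image" \
    --output yaml
}
```

For example: render_tiller_manifest registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.10.0 > tiller.yaml, edit tiller.yaml (for example the Service type), then kubectl apply -f tiller.yaml.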

Download the charts and find the version numbers

wget https://github.com/helm/charts/archive/master.zip
unzip master.zip
cd charts-master/stable/prometheus-operator

Running the helm install below fails because grafana, kube-state-metrics and prometheus-node-exporter cannot be found:
[root@k8s-test-master-1 prometheus-operator]# helm install . -n test
Error: found in requirements.yaml, but missing in charts/ directory: kube-state-metrics, prometheus-node-exporter, grafana
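This error means the dependencies listed in requirements.yaml have not been downloaded into charts/. When the chart repositories are reachable, Helm can fetch them itself; the manual steps that follow are the fallback when the googleapis repository cannot be reached. A minimal sketch in bash; fetch_chart_dependencies is a hypothetical name, while helm dependency update and helm dependency list are standard Helm 2 commands.

```shell
# Hypothetical helper: let Helm download the charts listed in
# requirements.yaml into charts/ and then show their status.
fetch_chart_dependencies() {
  local chart_dir="${1:-.}"
  helm dependency update "$chart_dir"   # resolves versions and downloads the .tgz files
  helm dependency list "$chart_dir"     # each dependency should report status "ok"
}
```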

First, check the dependency versions: open requirements.yaml, find grafana, kube-state-metrics and prometheus-node-exporter, and note their versions.
vim requirements.yaml

dependencies:

  - name: kube-state-metrics
    version: 2.0.*
    repository: https://kubernetes-charts.storage.googleapis.com/
    condition: kubeStateMetrics.enabled

  - name: prometheus-node-exporter
    version: 1.5.*
    repository: https://kubernetes-charts.storage.googleapis.com/
    condition: nodeExporter.enabled

  - name: grafana
    version: 3.7.*
    repository: https://kubernetes-charts.storage.googleapis.com/
    condition: grafana.enabled

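The version of each dependency corresponds to the version in its chart .tgz file name.

Download the chart repository's index file and look up the .tgz download URLs: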

wget  https://kubernetes-charts.storage.googleapis.com/index.yaml

[root@k8s-test-master-1 tmp]# grep grafana-3.7*.tgz index.yaml.1 
    - https://kubernetes-charts.storage.googleapis.com/grafana-3.7.2.tgz

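This finds the grafana 3.7.* .tgz (index.yaml.1 is just the name wget gives a second download of index.yaml; quoting the pattern, e.g. grep 'grafana-3\.7\.' index.yaml, stops the shell from expanding it). Do the same for the other packages.

Download the .tgz packages:
wget https://kubernetes-charts.storage.googleapis.com/grafana-3.7.2.tgz
wget https://kubernetes-charts.storage.googleapis.com/prometheus-node-exporter-1.5.2.tgz
wget https://kubernetes-charts.storage.googleapis.com/kube-state-metrics-2.0.0.tgz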
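The manual grep can be turned into a lookup that picks the highest patch release matching a 3.7.* style constraint. A minimal sketch in bash, assuming GNU grep and GNU sort (for sort -V) and an index.yaml in which chart URLs appear as full https URLs, as in the grep output above; find_chart_url is a hypothetical name.

```shell
# Hypothetical helper: print the newest .tgz URL in a Helm index.yaml for a
# chart name and a "major.minor" version prefix (the 3.7.* style used above).
find_chart_url() {
  local index="$1" chart="$2" prefix="$3"
  local escaped
  escaped="$(printf '%s' "$prefix" | sed 's/\./\\./g')"   # match the dots literally
  grep -oE "https://[^ ]*/${chart}-${escaped}\.[0-9]+\.tgz" "$index" \
    | sort -V \
    | tail -n 1
}
```

For example, find_chart_url index.yaml grafana 3.7, find_chart_url index.yaml prometheus-node-exporter 1.5 and find_chart_url index.yaml kube-state-metrics 2.0 print the URLs to pass to wget.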

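The charts inside the .tgz packages can be edited to use local images or to change other parameters.

Harbor now supports chart repositories, so the .tgz packages can be uploaded straight to Harbor.
When installing Harbor, remember to enable chart support with --with-chartmuseum: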

./install.sh --with-chartmuseum

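Create a project named helm in Harbor to hold the charts (its chart repository is then served at http://harbor-test.ai.com/chartrepo/helm)

Upload the .tgz packages through the web UI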
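The upload can also be scripted. A minimal sketch in bash, assuming Harbor 1.8 or later with ChartMuseum enabled, whose chart API accepts a multipart POST to /api/chartrepo/<project>/charts; check the API reference of your Harbor version before relying on it. upload_chart_to_harbor is a hypothetical name.

```shell
# Hypothetical helper: upload one chart package to a Harbor project's chart
# repository through Harbor's chartrepo API (multipart form field "chart").
upload_chart_to_harbor() {
  local harbor_url="$1" project="$2" user="$3" password="$4" tgz="$5"
  curl --fail -u "${user}:${password}" \
    -F "chart=@${tgz}" \
    "${harbor_url}/api/chartrepo/${project}/charts"
}
```

For example: upload_chart_to_harbor http://harbor-test.ai.com helm admin 'your-password' grafana-3.7.2.tgz, repeated for the other two packages.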


Add the Harbor repository to Helm:

helm repo add --username=admin --password=Admin_dsfdsd  harbor-repo   http://harbor-test.ai.com/chartrepo/helm	

Check that the repository was added:
[root@k8s-test-master-1 tmp]# helm repo list
NAME       	URL                                             
stable     	https://kubernetes-charts.storage.googleapis.com
local      	http://127.0.0.1:8879/charts                    
apphub     	https://apphub.aliyuncs.com                     
harbor-repo	http://harbor-test.ai.com/chartrepo/helm

In prometheus-operator's requirements.yaml, change the repository addresses to the local Harbor repository:

dependencies:

  - name: kube-state-metrics
    version: 2.0.*
    #repository: https://kubernetes-charts.storage.googleapis.com/
    repository: http://harbor-test.ai.com/chartrepo/helm/
    condition: kubeStateMetrics.enabled

  - name: prometheus-node-exporter
    version: 1.5.*
    repository: http://harbor-test.ai.com/chartrepo/helm/
    condition: nodeExporter.enabled

  - name: grafana
    version: 3.7.*
    repository: http://harbor-test.ai.com/chartrepo/helm/
    condition: grafana.enabled
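The repository edit can be made with one sed command instead of by hand; afterwards helm dependency update pulls the three charts from Harbor into charts/. Helm 2 expects dependency repositories to be registered with helm repo add, which is what the harbor-repo step above does. A minimal sketch, assuming GNU sed (for sed -i); use_local_chart_repo is a hypothetical name.

```shell
# Hypothetical helper: point every dependency in requirements.yaml at the
# Harbor chart repository instead of kubernetes-charts.storage.googleapis.com.
use_local_chart_repo() {
  local file="$1" repo_url="$2"
  sed -i "s#repository: https://kubernetes-charts.storage.googleapis.com/#repository: ${repo_url}#" "$file"
}
```

For example: use_local_chart_repo requirements.yaml http://harbor-test.ai.com/chartrepo/helm/ followed by helm dependency update . in the chart directory.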

Customize values.yaml to parameterize the Prometheus deployment

vim values.yaml

# Custom Prometheus image repository
    image:
      #repository: quay.io/prometheus/prometheus
      repository: harbor-test.ai.com/public/prometheus
      tag: v2.10.0

# Custom remote storage (remote write endpoint)

    #remoteWrite: []
    remoteWrite:
      - url: http://172.20.0.104:9201/write

Deploy

kubectl create  namespace monitoring
helm install --name pro . --namespace monitoring    -f values.yaml

Check:
[root@k8s-test-master-1 prometheus-operator]# helm list --all
NAME	REVISION	UPDATED                 	STATUS  	CHART                    	APP VERSION	NAMESPACE 
pro 	1       	Mon Aug  5 14:19:36 2019	DEPLOYED	prometheus-operator-6.4.0	0.31.1     	monitoring

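Check whether the deployment parameters took effect: logging into the Prometheus container through the Kubernetes dashboard shows that the configured remote storage is active.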
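The same check can be done from the command line by reading the Prometheus custom resource the chart created, since the operator generates the Prometheus configuration from its spec. A minimal sketch in bash; show_remote_write is a hypothetical name, and the monitoring namespace comes from the install command above.

```shell
# Hypothetical helper: print the remoteWrite section of the Prometheus
# custom resource(s) in a namespace, without opening a shell in the pod.
show_remote_write() {
  local ns="${1:-monitoring}"
  kubectl -n "$ns" get prometheus -o jsonpath='{.items[*].spec.remoteWrite}'
}
```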


Access Grafana

[root@k8s-test-master-1 prometheus-operator]# kubectl get pod -n monitoring  -o wide |grep grafana
pro-grafana-548cbf4699-5w6bh                          2/2     Running   8          6h6m    10.12.38.9

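Open http://10.12.38.9:3000 (a pod IP is only reachable from machines that can route to the pod network, such as the cluster nodes)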
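From a workstation that cannot route to pod IPs, kubectl port-forward through the Grafana Service is an alternative, and the admin password can be read from the Secret the Grafana chart creates. A minimal sketch in bash, assuming the Service and Secret are both named <release>-grafana (pro-grafana here), the Service listens on port 80, and the Secret stores the password under admin-password; verify these names with kubectl -n monitoring get svc,secret. The function names are hypothetical.

```shell
# Hypothetical helper: forward local port 3000 to the Grafana Service.
grafana_port_forward() {
  local release="${1:-pro}" ns="${2:-monitoring}"
  kubectl -n "$ns" port-forward "svc/${release}-grafana" 3000:80
}

# Hypothetical helper: decode the Grafana admin password from its Secret.
grafana_admin_password() {
  local release="${1:-pro}" ns="${2:-monitoring}"
  kubectl -n "$ns" get secret "${release}-grafana" \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}
```

With the port-forward running, Grafana is at http://127.0.0.1:3000 on the workstation.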



Note: the network graphs on the Pods dashboard show no data. Changing the rate window from 1m to 5m makes them display.

Receive, original query:
sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{job="kubelet", cluster="$cluster", namespace="$namespace", pod_name="$pod"}[1m])))

Receive, changed to:
sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{job="kubelet", cluster="$cluster", namespace="$namespace", pod_name="$pod"}[5m])))

Transmit, original query:
sort_desc(sum by (pod_name) (rate(container_network_transmit_bytes_total{job="kubelet", cluster="$cluster", namespace="$namespace", pod_name="$pod"}[1m])))

Transmit, changed to:
sort_desc(sum by (pod_name) (rate(container_network_transmit_bytes_total{job="kubelet", cluster="$cluster", namespace="$namespace", pod_name="$pod"}[5m])))
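Why the 1m window can come up empty: rate() needs at least two samples inside its range, so a window only one or two scrape intervals wide returns nothing whenever a scrape is late or missed. A common rule of thumb is a range of at least four times the scrape interval. Dashboards provisioned by the chart from ConfigMaps may not accept edits saved in the UI, so one option is to export the dashboard JSON, widen the windows, and import it as a new dashboard. A minimal sketch in bash; widen_rate_windows is a hypothetical name.

```shell
# Hypothetical helper: widen every [1m] range selector in an exported
# Grafana dashboard JSON to [5m], writing the result to a new file.
widen_rate_windows() {
  local in="$1" out="$2"
  sed 's/\[1m\]/[5m]/g' "$in" > "$out"
}
```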


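Resources deployed through Helm are watched by the operator through the CRDs and keep being recreated, so to remove the Helm-deployed Prometheus completely, do the following: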

[root@k8s-test-master-1 tmp]# helm list 
NAME	REVISION	UPDATED                 	STATUS  	CHART                    	APP VERSION	NAMESPACE 
pro 	1       	Mon Aug  5 14:19:36 2019	DEPLOYED	prometheus-operator-6.4.0	0.31.1     	monitoring

pro is the name of the deployed release.
Delete the Helm release:
helm delete pro --purge

Delete the CRDs:
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
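The five deletions can be run as one loop. In the Helm 2 era versions of this chart the CRDs were created through a crd-install hook, which helm delete --purge does not remove, hence the separate step. Keep in mind that deleting a CRD also deletes every custom resource of that kind in the whole cluster, not only those from this release. A minimal sketch in bash; delete_prometheus_operator_crds is a hypothetical name.

```shell
# Hypothetical helper: delete the five prometheus-operator CRDs listed above;
# --ignore-not-found lets the loop continue if some were already removed.
delete_prometheus_operator_crds() {
  local crd
  for crd in prometheuses prometheusrules servicemonitors podmonitors alertmanagers; do
    kubectl delete crd "${crd}.monitoring.coreos.com" --ignore-not-found
  done
}
```

Afterwards, kubectl get crd | grep monitoring.coreos.com should print nothing.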