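1. Introduction to Prometheus
Prometheus Operator is a Prometheus-based monitoring solution for Kubernetes developed by CoreOS, and is probably the most full-featured open-source option available today. It presents monitoring data through Grafana and ships with a set of predefined dashboards.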
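1.1 Prometheus Architecture
Prometheus is an excellent monitoring tool or, more precisely, a monitoring solution: it covers data collection, storage, processing, visualization, and alerting. Its architecture is shown in the figure below: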
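Prometheus Server: pulls monitoring data from Exporters, stores it, and provides a flexible query language (PromQL) for users.
Exporter: collects performance data from the monitored targets (hosts, containers, ...) and exposes it over an HTTP endpoint for Prometheus Server to scrape.
Visualization: visualizing monitoring data is essential to any monitoring solution. Prometheus used to develop its own tool for this but later abandoned it, because the open-source community produced a better product, Grafana. Grafana integrates seamlessly with Prometheus and handles data presentation well.
Alertmanager: users define alerting rules based on monitoring data, and those rules trigger alerts. When Alertmanager receives an alert, it sends a notification through a predefined channel. Supported channels include Email, PagerDuty, and Webhook.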
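To make the Exporter and Prometheus Server roles above concrete, here is a minimal sketch of what a scrape deals with. The sample lines imitate the plain-text format an Exporter serves over HTTP; the metric names and numbers are made up for illustration, and awk merely stands in for the parsing that Prometheus Server does itself.

```shell
# Sample of the text format an Exporter exposes over HTTP.
# Metric names and values are illustrative, not real measurements.
sample_metrics() {
  cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
http_requests_total{code="500"} 3
EOF
}

# Print the value of one series, skipping comment lines.
# $1 is the full series name, including labels if any.
metric_value() {
  awk -v series="$1" '$0 !~ /^#/ && $1 == series { print $2 }'
}

sample_metrics | metric_value 'node_load1'
sample_metrics | metric_value 'http_requests_total{code="200"}'
```

Against a real Exporter the input would come from its metrics endpoint instead of the sample function, for example a node exporter listening on port 9100.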
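1.2 Prometheus Operator Architecture
Prometheus Operator is arguably the most full-featured open-source monitoring solution at present. It can monitor Node Port and supports the cluster's management components, such as the API Server, Scheduler, and Controller Manager.
The goal of Prometheus Operator is to make deploying and maintaining Prometheus on Kubernetes as simple as possible. Its architecture is shown in the figure below:
Every object in the figure is a resource running in Kubernetes.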
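Operator: the Prometheus Operator itself, which runs in Kubernetes as a Deployment. It deploys and manages Prometheus Server and dynamically updates Prometheus Server's monitoring targets based on ServiceMonitor objects.
Prometheus Server: deployed into the cluster as a Kubernetes application. To manage Prometheus more easily in Kubernetes, the CoreOS developers defined a Kubernetes custom resource type named Prometheus. You can think of a Prometheus resource as a special kind of Deployment whose only purpose is to deploy Prometheus Server.
Service: the ordinary Service resource in the cluster, and also the object Prometheus monitors, which Prometheus calls a Target. Every monitored object has a corresponding Service. For example, to monitor the Kubernetes Scheduler there must be a Service that corresponds to the Scheduler. A Kubernetes cluster does not have this Service by default; Prometheus Operator is responsible for creating it.
ServiceMonitor: the Operator can dynamically update Prometheus's Target list, and ServiceMonitor is the abstraction of a Target. For example, to monitor the Kubernetes Scheduler, a user creates a ServiceMonitor object that maps to the Scheduler Service. The Operator discovers the new ServiceMonitor and adds the Scheduler Target to Prometheus's monitoring list.
ServiceMonitor is also a Kubernetes custom resource type defined by Prometheus Operator.
Alertmanager: besides Prometheus and ServiceMonitor, Alertmanager is the third Kubernetes custom resource defined by the Operator. You can think of an Alertmanager resource as a special kind of Deployment whose only purpose is to deploy the Alertmanager component.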
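As an illustration of the ServiceMonitor resource described above, the following sketch writes a small manifest to a file. Every name and label in it (example-app, the web port, the app labels, the default namespace) is a hypothetical placeholder, not something taken from this deployment.

```shell
# Write a ServiceMonitor manifest to a file. All names and labels below
# (example-app, the "web" port, the app labels) are hypothetical.
cat > example-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    app: prometheus        # label a Prometheus resource could select on
spec:
  selector:
    matchLabels:
      app: example-app     # must match the labels of the target Service
  namespaceSelector:
    matchNames:
    - default              # namespace where the target Service lives
  endpoints:
  - port: web              # name of the Service port that serves metrics
    interval: 30s
EOF

# To create it in a cluster (not run here):
#   kubectl apply -f example-servicemonitor.yaml
```

The manifest only points at a Service by label; the Operator is what turns it into scrape configuration for Prometheus Server.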
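2. Installing and Deploying Helm
Helm has two key concepts: chart and release. A chart is the collection of information needed to create an application, including configuration templates for Kubernetes objects, parameter definitions, dependencies, and documentation. A chart is a self-contained logical unit of application deployment; you can think of it as the equivalent of a software package in apt or yum. A release is a running instance of a chart and represents a running application. Installing a chart into a Kubernetes cluster creates a release. A chart can be installed into the same cluster multiple times, and each installation is a separate release.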
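The pieces of a chart listed above can be sketched as a minimal directory skeleton. The chart name demo-chart, the nginx image, and the release names in the trailing comments are all hypothetical; the layout follows the Helm v2 chart format used in this walkthrough.

```shell
# Create a minimal chart skeleton; the chart name "demo-chart" and its
# contents are hypothetical.
mkdir -p demo-chart/templates

# Chart metadata (Helm v2 chart format).
cat > demo-chart/Chart.yaml <<'EOF'
apiVersion: v1
name: demo-chart
version: 0.1.0
description: A minimal example chart
EOF

# Default parameter values, overridable with --set at install time.
cat > demo-chart/values.yaml <<'EOF'
replicas: 1
image: nginx:1.15
EOF

# A template for one Kubernetes object; Helm fills in the {{ }} fields.
cat > demo-chart/templates/deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
      - name: web
        image: {{ .Values.image }}
EOF

# Installing the same chart twice yields two releases (not run here):
#   helm install --name demo-a ./demo-chart
#   helm install --name demo-b ./demo-chart --set replicas=2
```

Because the object names are derived from the release name, the two installations do not collide, which is what allows one chart to back several releases in the same cluster.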
2.1 Installing the Helm Client
Download a release from https://github.com/helm/helm/releases; this walkthrough uses v2.12.1.
[root@master ~]# tar xf helm-v2.12.1-linux-amd64.tar.gz
[root@master ~]# cp linux-amd64/helm /usr/local/bin/
[root@master ~]# helm version
Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
Error: could not find tiller
At this point only the client version is reported, because the server side (Tiller) has not been installed yet.
2.2 Installing the Tiller Server
On a cluster with RBAC enabled, first create the authorization objects. Following https://github.com/helm/helm/blob/master/docs/rbac.md, create rbac-config.yaml with the following content:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
Then apply it.
[root@master ~]# kubectl apply -f rbac-config.yaml
serviceaccount/tiller created
clusterrolebinding.rbac.authorization.k8s.io/tiller created
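The same two objects can also be created imperatively instead of from a manifest. This is a sketch rather than the method used in this walkthrough; the function name and the KUBECTL override are illustrative conveniences, while the object names mirror rbac-config.yaml.

```shell
# Imperative equivalent of rbac-config.yaml. KUBECTL can be overridden,
# e.g. KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

create_tiller_rbac() {
  # 1) ServiceAccount "tiller" in kube-system.
  $KUBECTL --namespace kube-system create serviceaccount tiller || return 1
  # 2) Bind it to the built-in cluster-admin ClusterRole.
  $KUBECTL create clusterrolebinding tiller \
    --clusterrole=cluster-admin \
    --serviceaccount=kube-system:tiller
}

# Usage (not run here):
#   create_tiller_rbac
```

Either way, binding to cluster-admin gives Tiller full control of the cluster, which is convenient for a test environment like this one.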
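Installing the Tiller server is simple: just run helm init. Keep the Tiller version the same as the client version where possible. Because Google's image registry cannot be reached from mainland China, we use the Alibaba Cloud mirror instead: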
[root@master ~]# helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.12.1 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
Creating /root/.helm/repository
Creating /root/.helm/repository/cache
Creating /root/.helm/repository/local
Creating /root/.helm/plugins
Creating /root/.helm/starters
Creating /root/.helm/cache/archive
Creating /root/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /root/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Happy Helming!
Check the result.
[root@master ~]# helm version
Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
[root@master ~]# helm repo list
NAME URL
stable https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
local http://127.0.0.1:8879/charts
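Since the client and Tiller versions should match, the comparison can be scripted. The sketch below parses the two lines that helm version printed above; the helper name helm_semver is made up, and the sample text is copied from this run so the snippet works without a cluster.

```shell
# Extract the SemVer from the Client or Server line of `helm version` output.
# $1 is "Client" or "Server"; the output is read from stdin.
helm_semver() {
  sed -n "s/^$1: .*SemVer:\"\([^\"]*\)\".*/\1/p"
}

# Sample output copied from the run above; in practice: out="$(helm version)"
out='Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}'

client="$(printf '%s\n' "$out" | helm_semver Client)"
server="$(printf '%s\n' "$out" | helm_semver Server)"

if [ -n "$client" ] && [ "$client" = "$server" ]; then
  echo "client and Tiller match: $client"
else
  echo "version mismatch: client=$client server=$server"
fi
```

If Tiller is missing, the Server line is absent, server stays empty, and the mismatch branch is taken.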
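3. Deploying Prometheus Operator
This section was written against Prometheus Operator v0.26.0. The project iterates quickly, so the deployment procedure may have changed; consult the official documentation when necessary.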
[root@master ~]# git clone https://github.com/coreos/prometheus-operator.git
[root@master ~]# cd prometheus-operator/
For easier management, create a separate Namespace named monitoring; all Prometheus Operator components will be deployed into it.
[root@master prometheus-operator]# kubectl create namespace monitoring
namespace/monitoring created
3.1 Installing the Prometheus Operator Deployment
First, update the repo index.
helm repo update
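Then install with helm. The images total several hundred megabytes, so this can be slow. You can also pull the images in advance; pulling them from the Alibaba Cloud mirror and then retagging them is recommended because it is faster.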
[root@master prometheus-operator]# helm install --name prometheus-operator --set rbacEnable=true --namespace=monitoring helm/prometheus-operator
NAME: prometheus-operator
LAST DEPLOYED: Tue Dec 25 22:09:31 2018
NAMESPACE: monitoring
STATUS: DEPLOYED
RESOURCES:
==> v1beta1/PodSecurityPolicy
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
prometheus-operator false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
==> v1/ConfigMap
NAME DATA AGE
prometheus-operator 1 3s
==> v1/ServiceAccount
NAME SECRETS AGE
prometheus-operator 1 3s
==> v1beta1/ClusterRole
NAME AGE
prometheus-operator 3s
psp-prometheus-operator 3s
==> v1beta1/ClusterRoleBinding
NAME AGE
prometheus-operator 3s
psp-prometheus-operator 3s
==> v1beta1/Deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
prometheus-operator 1 1 1 1 3s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
prometheus-operator-867bbfddbd-vpsjd 1/1 Running 0 3s
NOTES:
The Prometheus Operator has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "app=prometheus-operator,release=prometheus-operator"
Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.
Check the resources that were created.
[root@master prometheus-operator]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
prometheus-operator-867bbfddbd-vpsjd 1/1 Running 0 95s
[root@master prometheus-operator]# kubectl get deploy -n monitoring
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus-operator 1/1 1 1 103s
[root@master prometheus-operator]# helm list
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
prometheus-operator 1 Tue Dec 25 22:09:31 2018 DEPLOYED prometheus-operator-0.0.29 0.20.0 monitoring
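The pull-and-retag approach suggested for slow image downloads could be wrapped in a small helper like the one below. It is only a sketch: the function name, the DOCKER override, and both image references in the example are placeholders, since the exact mirror paths depend on which images your chart version needs.

```shell
# Pull an image from a mirror registry and retag it under the name the
# chart expects. DOCKER can be overridden (e.g. DOCKER=echo for a dry run).
DOCKER="${DOCKER:-docker}"

pull_and_retag() {
  mirror_image="$1"   # image reference on the mirror registry
  wanted_image="$2"   # image reference used in the Pod spec
  $DOCKER pull "$mirror_image" || return 1
  $DOCKER tag "$mirror_image" "$wanted_image"
}

# Hypothetical usage; both references are placeholders (not run here):
#   pull_and_retag registry.example.com/mirror/some-image:v1 upstream.example.com/some-image:v1
```

The retagged image has to exist on every node where the Pod may be scheduled, so the helper would need to run on each of them.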
3.2 Installing Prometheus
[root@master prometheus-operator]# helm install --name prometheus --set serviceMonitorsSelector.app=prometheus --set ruleSelector.app=prometheus --namespace=monitoring helm/prometheus
NAME: prometheus
LAST DEPLOYED: Tue Dec 25 22:17:06 2018
NAMESPACE: monitoring
STATUS: DEPLOYED
RESOURCES:
==> v1/Prometheus
NAME AGE
prometheus 0s
==> v1/PrometheusRule
NAME AGE
prometheus-rules 0s
==> v1/ServiceMonitor
NAME AGE
prometheus 0s
==> v1beta1/PodSecurityPolicy
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
prometheus false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
==> v1/ServiceAccount
NAME SECRETS AGE
prometheus 1 0s
==> v1beta1/ClusterRole
NAME AGE
prometheus 0s
psp-prometheus 0s
==> v1beta1/ClusterRoleBinding
NAME AGE
prometheus 0s
psp-prometheus 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus ClusterIP 10.109.224.217 <none> 9090/TCP 0s
NOTES:
A new Prometheus instance has been created.
DEPRECATION NOTICE:
- additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
Check what was created.
[root@master prometheus-operator]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
prometheus-operator-867bbfddbd-vpsjd 1/1 Running 0 16m
prometheus-prometheus-0 3/3 Running 1 9m10s
[root@master prometheus-operator]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus ClusterIP 10.111.72.32 <none> 9090/TCP 22m
prometheus-operated ClusterIP None <none> 9090/TCP 19m
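The prometheus Service listed above is of type ClusterIP, so it is only reachable from inside the cluster. One way to try a PromQL query from a workstation is to forward the port and call the HTTP API. This is a sketch under assumptions: the local URL presumes a port-forward is already running, and the function name and the CURL override are illustrative.

```shell
# Query the Prometheus HTTP API with an instant PromQL query.
# PROM_URL is an assumption; it presumes something like
#   kubectl port-forward -n monitoring svc/prometheus 9090:9090
# is running in another terminal.
PROM_URL="${PROM_URL:-http://localhost:9090}"
CURL="${CURL:-curl}"

prom_query() {
  # -G sends the url-encoded data as a query string on a GET request.
  $CURL -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example (not run here): show which scrape targets are up.
#   prom_query 'up'
```

The response is JSON, so piping it through a JSON formatter makes the result easier to read.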
3.3 Installing Alertmanager
[root@master prometheus-operator]# helm install --name alertmanager --namespace=monitoring helm/alertmanager
NAME: alertmanager
LAST DEPLOYED: Tue Dec 25 22:30:11 2018
NAMESPACE: monitoring
STATUS: DEPLOYED
RESOURCES:
==> v1/PrometheusRule
NAME AGE
alertmanager 0s
==> v1/ServiceMonitor
NAME AGE
alertmanager 0s
==> v1beta1/PodSecurityPolicy
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
alertmanager false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
==> v1/Secret
NAME TYPE DATA AGE
alertmanager-alertmanager Opaque 1 0s
==> v1beta1/ClusterRole
NAME AGE
psp-alertmanager 0s
==> v1beta1/ClusterRoleBinding
NAME AGE
psp-alertmanager 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager ClusterIP 10.102.68.166 <none> 9093/TCP 0s
==> v1/Alertmanager
NAME AGE
alertmanager 0s
NOTES:
A new Alertmanager instance has been created.
DEPRECATION NOTICE:
- additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
Check the result.
[root@master prometheus-operator]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-alertmanager-0 2/2 Running 0 13m
prometheus-operator-867bbfddbd-vpsjd 1/1 Running 0 33m
prometheus-prometheus-0 3/3 Running 1 26m
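With Prometheus and Alertmanager both running, alerting rules can be supplied as PrometheusRule resources. The sketch below writes one to a file. The rule name, threshold, and severity are hypothetical, and the app: prometheus label assumes that the ruleSelector.app=prometheus setting from section 3.2 selects rules by that label.

```shell
# Write an example PrometheusRule manifest. The alert itself is hypothetical;
# the app label is an assumption about how the rule selector is configured.
cat > example-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    app: prometheus
spec:
  groups:
  - name: example.rules
    rules:
    - alert: TargetDown
      expr: up == 0          # fires for any scrape target that is down
      for: 5m                # condition must hold for 5 minutes
      labels:
        severity: warning
      annotations:
        summary: "Target {{ $labels.instance }} has been down for 5 minutes"
EOF

# To create it in a cluster (not run here):
#   kubectl apply -f example-rule.yaml
```

When the expression stays true for the full duration, Prometheus sends the alert to Alertmanager, which handles the notification.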
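3.4 Installing kube-prometheus
kube-prometheus is a Helm chart that bundles all the Exporters and ServiceMonitors needed to monitor Kubernetes, and it creates several Services. According to https://github.com/coreos/prometheus-operator/blob/master/helm/README.md, the installation method has changed; the procedure is as follows: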
[root@master prometheus-operator]# mkdir -p helm/kube-prometheus/charts
[root@master prometheus-operator]# helm package -d helm/kube-prometheus/charts helm/alertmanager helm/grafana helm/prometheus helm/exporter-kube-dns \
> helm/exporter-kube-scheduler helm/exporter-kubelets helm/exporter-node helm/exporter-kube-controller-manager \
> helm/exporter-kube-etcd helm/exporter-kube-state helm/exporter-coredns helm/exporter-kubernetes
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/alertmanager-0.1.7.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/grafana-0.0.37.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/prometheus-0.0.51.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-kube-dns-0.1.7.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-kube-scheduler-0.1.9.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-kubelets-0.2.11.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-node-0.4.6.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-kube-controller-manager-0.1.10.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-kube-etcd-0.1.15.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-kube-state-0.2.6.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-coredns-0.0.3.tgz
Successfully packaged chart and saved it to: helm/kube-prometheus/charts/exporter-kubernetes-0.1.10.tgz
[root@master prometheus-operator]# helm install helm/kube-prometheus --name kube-prometheus --namespace monitoring
NAME: kube-prometheus
LAST DEPLOYED: Tue Dec 25 23:02:25 2018
NAMESPACE: monitoring
STATUS: DEPLOYED
RESOURCES:
==> v1/Secret
NAME TYPE DATA AGE
alertmanager-kube-prometheus Opaque 1 1s
kube-prometheus-grafana Opaque 2 1s
==> v1beta1/RoleBinding
NAME AGE
kube-prometheus-exporter-kube-state 1s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-prometheus-alertmanager ClusterIP 10.109.99.33 <none> 9093/TCP 1s
kube-prometheus-exporter-kube-controller-manager ClusterIP None <none> 10252/TCP 1s
kube-prometheus-exporter-kube-dns ClusterIP None <none> 10054/TCP,10055/TCP 1s
kube-prometheus-exporter-kube-etcd ClusterIP None <none> 4001/TCP 1s
kube-prometheus-exporter-kube-scheduler ClusterIP None <none> 10251/TCP 1s
kube-prometheus-exporter-kube-state ClusterIP 10.106.111.57 <none> 80/TCP 1s
kube-prometheus-exporter-node ClusterIP 10.107.178.109 <none> 9100/TCP 1s
kube-prometheus-grafana ClusterIP 10.110.171.226 <none> 80/TCP 1s
kube-prometheus ClusterIP 10.102.19.97 <none> 9090/TCP 1s
==> v1beta1/Deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-prometheus-exporter-kube-state 1 1 1 0 1s
kube-prometheus-grafana 1 1 1 0 1s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
kube-prometheus-exporter-node-8cclq 0/1 ContainerCreating 0 1s
kube-prometheus-exporter-node-xsqvj 0/1 ContainerCreating 0 1s
kube-prometheus-exporter-node-zcjfj 0/1 ContainerCreating 0 1s
kube-prometheus-exporter-kube-state-7bb8cf75d9-czp24 0/2 ContainerCreating 0 1s
kube-prometheus-grafana-6f4bb75c95-jvfzn 0/2 ContainerCreating 0 1s
==> v1beta1/ClusterRole
NAME AGE
psp-kube-prometheus-alertmanager 1s
kube-prometheus-exporter-kube-state 1s
psp-kube-prometheus-exporter-kube-state 1s
psp-kube-prometheus-exporter-node 1s
psp-kube-prometheus-grafana 1s
kube-prometheus 1s
psp-kube-prometheus 1s
==> v1beta1/DaemonSet
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-prometheus-exporter-node 3 3 0 3 0 <none> 1s
==> v1beta1/PodSecurityPolicy
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
kube-prometheus-alertmanager false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
kube-prometheus-exporter-kube-state false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
kube-prometheus-exporter-node false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim,hostPath
kube-prometheus-grafana false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim,hostPath
kube-prometheus false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
==> v1/ServiceAccount
NAME SECRETS AGE
kube-prometheus-exporter-kube-state 1 1s
kube-prometheus-exporter-node 1 1s
kube-prometheus-grafana 1 1s
kube-prometheus 1 1s
==> v1beta1/ClusterRoleBinding
NAME AGE
psp-kube-prometheus-alertmanager 1s
kube-prometheus-exporter-kube-state 1s
psp-kube-prometheus-exporter-kube-state 1s
psp-kube-prometheus-exporter-node 1s
psp-kube-prometheus-grafana 1s
kube-prometheus 1s
psp-kube-prometheus 1s
==> v1beta1/Role
NAME AGE
kube-prometheus-exporter-kube-state 1s
==> v1/Prometheus
NAME AGE
kube-prometheus 1s
==> v1/ServiceMonitor
NAME AGE
kube-prometheus-alertmanager 0s
kube-prometheus-exporter-kube-controller-manager 0s
kube-prometheus-exporter-kube-dns 0s
kube-prometheus-exporter-kube-etcd 0s
kube-prometheus-exporter-kube-scheduler 0s
kube-prometheus-exporter-kube-state 0s
kube-prometheus-exporter-kubelets 0s
kube-prometheus-exporter-kubernetes 0s
kube-prometheus-exporter-node 0s
kube-prometheus-grafana 0s
kube-prometheus 0s
==> v1/ConfigMap
NAME DATA AGE
kube-prometheus-grafana 10 1s
==> v1/Alertmanager
NAME AGE
kube-prometheus 1s
==> v1/PrometheusRule
NAME AGE
kube-prometheus-alertmanager 1s
kube-prometheus-exporter-kube-controller-manager 1s
kube-prometheus-exporter-kube-etcd 1s
kube-prometheus-exporter-kube-scheduler 1s
kube-prometheus-exporter-kube-state 1s
kube-prometheus-exporter-kubelets 1s
kube-prometheus-exporter-kubernetes 1s
kube-prometheus-exporter-node 1s
kube-prometheus-rules 1s
kube-prometheus 0s
NOTES:
DEPRECATION NOTICE:
- alertmanager.ingress.fqdn is not used anymore, use alertmanager.ingress.hosts []
- prometheus.ingress.fqdn is not used anymore, use prometheus.ingress.hosts []
- grafana.ingress.fqdn is not used anymore, use prometheus.grafana.hosts []
- additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
- prometheus.additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
- alertmanager.additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
- exporter-kube-controller-manager.additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
- exporter-kube-etcd.additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
- exporter-kube-scheduler.additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
- exporter-kubelets.additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
- exporter-kubernetes.additionalRulesConfigMapLabels is not used anymore, use additionalRulesLabels
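Wait for the required images to finish downloading. Some of them cannot be reached from mainland China, so pulling them through the Alibaba Cloud mirror is recommended. Then check the result. Each Exporter has a corresponding Service, which provides Prometheus with monitoring data about the Kubernetes cluster.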
[root@master prometheus-operator]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager ClusterIP 10.102.68.166 <none> 9093/TCP 52m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 52m
kube-prometheus ClusterIP 10.109.193.133 <none> 9090/TCP 3m42s
kube-prometheus-alertmanager ClusterIP 10.110.67.174 <none> 9093/TCP 3m42s
kube-prometheus-exporter-kube-state ClusterIP 10.101.225.77 <none> 80/TCP 3m42s
kube-prometheus-exporter-node ClusterIP 10.103.162.196 <none> 9100/TCP 3m42s
kube-prometheus-grafana ClusterIP 10.101.81.167 <none> 80/TCP 3m42s
prometheus ClusterIP 10.109.224.217 <none> 9090/TCP 65m
prometheus-operated ClusterIP None <none> 9090/TCP 65m
Each Service has a corresponding ServiceMonitor; together they make up Prometheus's Target list.
[root@master prometheus-operator]# kubectl get servicemonitor -n monitoring
NAME AGE
alertmanager 14m
kube-prometheus 22m
kube-prometheus-alertmanager 22m
kube-prometheus-exporter-kube-controller-manager 22m
kube-prometheus-exporter-kube-dns 22m
kube-prometheus-exporter-kube-etcd 22m
kube-prometheus-exporter-kube-scheduler 22m
kube-prometheus-exporter-kube-state 22m
kube-prometheus-exporter-kubelets 22m
kube-prometheus-exporter-kubernetes 22m
kube-prometheus-exporter-node 22m
kube-prometheus-grafana 22m
prometheus 16m
prometheus-operator 2h
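Below are all the Pods related to Prometheus Operator. Note that some Exporters have no running Pod. This is because internal Kubernetes components such as the API Server, Scheduler, and Kubelet support Prometheus natively, so defining a Service is enough to collect metrics directly from their predefined ports.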
[root@master prometheus-operator]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-alertmanager-0 2/2 Running 0 15m
alertmanager-kube-prometheus-0 2/2 Running 0 15m
kube-prometheus-exporter-kube-state-dc6966bb5-r5kng 2/2 Running 0 25m
kube-prometheus-exporter-node-grtvz 1/1 Running 0 25m
kube-prometheus-exporter-node-jfq79 1/1 Running 0 25m
kube-prometheus-exporter-node-n79vq 1/1 Running 0 25m
kube-prometheus-grafana-6f4bb75c95-bw72r 2/2 Running 0 25m
prometheus-kube-prometheus-0 3/3 Running 1 15m
prometheus-operator-867bbfddbd-rxj6s 1/1 Running 0 15m
prometheus-prometheus-0 3/3 Running 1 15m
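For components that expose metrics natively, the only object needed is a Service pointing at the existing Pods. The sketch below writes such a Service for the Scheduler. Port 10251 and the headless type follow the Service listing shown earlier; the Service name and the component: kube-scheduler selector are assumptions that depend on how your control-plane Pods are labeled.

```shell
# Write a headless Service that exposes the Scheduler's metrics port.
# The name and the selector label are assumptions for illustration.
cat > example-scheduler-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: example-kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  clusterIP: None              # headless: endpoints only, no virtual IP
  selector:
    component: kube-scheduler  # must match the Scheduler Pod's labels
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
EOF

# To create it in a cluster (not run here):
#   kubectl apply -f example-scheduler-svc.yaml
```

No Exporter Pod is involved: the Service endpoints resolve straight to the Scheduler Pods, and a ServiceMonitor then selects the Service by its labels.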
To make kube-prometheus-grafana and kube-prometheus easier to reach, change their Service type to NodePort.
[root@master prometheus-operator]# kubectl patch svc kube-prometheus-grafana -p '{"spec":{"type":"NodePort"}}' -n monitoring
service/kube-prometheus-grafana patched
[root@master prometheus-operator]# kubectl patch svc kube-prometheus -p '{"spec":{"type":"NodePort"}}' -n monitoring
service/kube-prometheus patched
[root@master prometheus-operator]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager ClusterIP 10.110.11.161 <none> 9093/TCP 21m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 19m
kube-prometheus NodePort 10.102.23.109 <none> 9090:31679/TCP 29m
kube-prometheus-alertmanager NodePort 10.97.58.177 <none> 9093:31627/TCP 29m
kube-prometheus-exporter-kube-state ClusterIP 10.110.185.195 <none> 80/TCP 29m
kube-prometheus-exporter-node ClusterIP 10.111.98.237 <none> 9100/TCP 29m
kube-prometheus-grafana NodePort 10.105.188.204 <none> 80:30357/TCP 29m
prometheus ClusterIP 10.111.72.32 <none> 9090/TCP 22m
prometheus-operated ClusterIP None <none> 9090/TCP 19m
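The NodePort numbers are assigned by the cluster, so it helps to read them from the listing rather than retype them. The sketch below extracts the NodePort from table rows like the ones above; the function name is made up, and the two sample rows are copied from this listing so the snippet runs without a cluster.

```shell
# Print the NodePort of a Service from `kubectl get svc` table output.
# $1 is the Service name; the table is read from stdin.
node_port_of() {
  awk -v svc="$1" '$1 == svc && $2 == "NodePort" {
    split($5, parts, "[:/]")   # "80:30357/TCP" -> "80", "30357", "TCP"
    print parts[2]
  }'
}

# Two rows copied from the listing above.
table='kube-prometheus           NodePort   10.102.23.109    <none>   9090:31679/TCP   29m
kube-prometheus-grafana   NodePort   10.105.188.204   <none>   80:30357/TCP     29m'

printf '%s\n' "$table" | node_port_of kube-prometheus
printf '%s\n' "$table" | node_port_of kube-prometheus-grafana
```

On a live cluster the same value is available without text parsing through kubectl get svc kube-prometheus-grafana -n monitoring -o jsonpath='{.spec.ports[0].nodePort}'.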
4. Viewing the Results
4.1 Viewing kube-prometheus
Open MASTER_IP:31679, as shown below:
4.2 Viewing kube-prometheus-alertmanager
Open MASTER_IP:31627, as shown below:
4.3 Viewing kube-prometheus-grafana
Open MASTER_IP:30357/login and log in; the username and password are both admin.
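Overall health of the Kubernetes cluster:
Resource usage across the whole cluster:
Status of the Kubernetes management components:
Resource usage per node:
Running status of Deployments:
Running status of Pods:
Running status of StatefulSets:
These dashboards show the state of everything from the cluster down to individual Pods and help users operate Kubernetes more effectively. Prometheus Operator also iterates quickly and will likely keep gaining features, so it is worth the time to learn and practice.
Official documentation: https://github.com/coreos/prometheus-operator/tree/master/helm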