Contents

I. Common Kubernetes Monitoring Solutions

1. Common Kubernetes monitoring options

2. Heapster

3. Weave Scope

4. Prometheus

II. Prometheus Overview

1. Prometheus architecture

2. Main Prometheus modules

3. Metrics in Prometheus

1) Host (node) monitoring data

2) Kubernetes component metrics

3) Kubernetes core metrics (core metrics)

4. Prometheus workflow

5. Installing Prometheus

1) Related resources

2) Clone the project

3) Switch to the matching branch

4) Install Prometheus

5) Extra steps for restricted networks in China (pull the images from a mirror and retag them to the names the manifests expect)

6) Change the grafana Service to NodePort for browser access

7) Change the prometheus Service to NodePort for browser access

8) Determine the service port numbers

9) Determine the Node each pod runs on

Note: if all pods are Running but the web UI still cannot be reached, check whether IP forwarding is enabled -- this must be configured on ALL Nodes

6. Configuring Prometheus

1) Open a browser

2) Check that the Alertmanager targets show state "up"

3) Configure Grafana


I. Common Kubernetes Monitoring Solutions

1. Common Kubernetes monitoring options


2. Heapster

1) Kubernetes' native cluster monitoring solution (supported by default; usable as soon as it is installed)

2) Kubernetes has a well-known monitoring agent, cAdvisor, which collects CPU, memory, network traffic, and similar per-container metrics

3) Heapster obtains the list of all Nodes in the cluster from the Kubernetes Master

4) It then pulls the useful data from the kubelet on each of those Nodes; the kubelet itself gets its data from cAdvisor

5) All collected data is pushed to the storage backend configured for Heapster, and data visualization is also supported

3. Weave Scope

A tool for monitoring and visualization:

        ① Topology mapping

        ② Real-time application and container metrics

        ③ Online troubleshooting and container management

        ④ Powerful search


4. Prometheus

1) Heapster and Weave Scope focus on monitoring Nodes and Pods; they cannot monitor the health of the cluster itself, e.g. whether control-plane components such as the API Server, Scheduler, and Controller Manager are working properly and how heavily they are loaded

2) Prometheus is an open-source monitoring and alerting solution originally developed at SoundCloud

II. Prometheus Overview

1. Prometheus architecture

(architecture diagram)



2. Main Prometheus modules

① Prometheus Server: collects and stores time-series data (TSDB)

② Exporter: collects performance data from target objects (hosts, containers, ...)

③ Grafana: the visualization component; it integrates seamlessly with Prometheus and provides excellent data display

④ Alertmanager: users define alerting rules on the monitoring data; when a rule matches, an alert is triggered

⑤ Push Gateway: mainly used for short-lived jobs
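How these modules fit together can be sketched in a minimal, hypothetical prometheus.yml (the scrape interval and the target hostnames below are placeholders, not values from this article):

```yaml
global:
  scrape_interval: 15s          # how often the server pulls metrics

alerting:
  alertmanagers:                # where evaluated alerts are sent
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: node              # an Exporter (node-exporter) endpoint
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: pushgateway       # short-lived jobs push their metrics here
    honor_labels: true
    static_configs:
      - targets: ["pushgateway:9091"]
```

Grafana does not appear in this file: it reads from the Prometheus HTTP API as a data source rather than being scraped.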


3. Metrics in Prometheus

1) Host (node) monitoring data:

node-level metrics such as CPU, memory, disk, and network (typically collected by node-exporter)

2) Kubernetes component metrics:

the core monitoring indicators of each component (API Server, Scheduler, kubelet, ...)

3) Kubernetes core metrics (core metrics):

Metrics for the main Kubernetes concepts: Pods, Nodes, containers, Services, etc.
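As a hedged illustration, one example query per category (the metric names assume the usual exporters of the kube-prometheus stack: node-exporter for host data, the API server for component data, and cAdvisor via the kubelet for core data):

```promql
# Host data: CPU busy fraction per node (node-exporter)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Component data: API server request rate
sum(rate(apiserver_request_total[5m]))

# Core data: memory working set per pod (cAdvisor via kubelet)
sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})
```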

4. Prometheus workflow

1) Periodically pull metrics from the configured jobs or exporters, or receive metrics pushed through the Pushgateway

2) The Prometheus server stores the collected metrics locally and evaluates the defined alert.rules, recording new time series or pushing alerts to Alertmanager

3) Alertmanager processes the received alerts according to its configuration file and sends out notifications

4) The collected data is visualized in a graphical interface
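Steps 2 and 3 of this workflow can be illustrated with a hedged sketch of an alert rule. With the Operator-based installation used below, rules are declared as PrometheusRule resources; the rule name and threshold here are illustrative only:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert-rules
  namespace: monitoring
  labels:
    prometheus: k8s          # labels the Operator's Prometheus selects on
    role: alert-rules
spec:
  groups:
    - name: example.rules
      rules:
        - alert: TargetDown
          expr: up == 0      # the scrape target stopped answering
          for: 5m            # must hold for 5 minutes before firing
          labels:
            severity: warning
          annotations:
            summary: "Target {{ $labels.instance }} has been down for 5 minutes"
```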

5. Installing Prometheus

1) Related resources

Installation, deployment, and related documentation:

User guide: Quick Start - Prometheus Operator (prometheus-operator.dev)

The following project is dedicated to deploying Prometheus and Grafana on Kubernetes: prometheus-operator · GitHub

2) Clone the project

$ git clone https://github.com/prometheus-operator/kube-prometheus.git    # download the project locally
Cloning into 'kube-prometheus'...
remote: Enumerating objects: 17107, done.
remote: Counting objects: 100% (230/230), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 17107 (delta 173), reused 188 (delta 143), pack-reused 16877
Receiving objects: 100% (17107/17107), 8.80 MiB | 2.13 MiB/s, done.
Resolving deltas: 100% (11170/11170), done.

3) Switch to the matching branch

~$ kubectl version    # check the K8s version; this cluster is current, so the main branch could be used directly -- pick the branch that matches your own version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:26:19Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:38:19Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
~$ cd kube-prometheus/
~/kube-prometheus$ git checkout release-0.11    # switch to the 0.11 release
Branch 'release-0.11' set up to track remote branch 'release-0.11' from 'origin'.
Switched to a new branch 'release-0.11'

Which kube-prometheus branch matches which Kubernetes version is listed in the compatibility matrix of the repository README (release-0.11, for instance, targets Kubernetes 1.23/1.24):

GitHub - prometheus-operator/kube-prometheus: Use Prometheus to monitor Kubernetes and applications running on Kubernetes

4) Install Prometheus

~/kube-prometheus$ kubectl apply -f manifests/setup
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
namespace/monitoring created
The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
'created means success; one CRD failed: it is too large for kubectl apply (the last-applied-configuration annotation would exceed the 262144-byte limit), so create it with kubectl create instead'

~/kube-prometheus$ grep -Rw "prometheuses.monitoring.coreos.com" manifests/setup/    # find the file that defines the CRD named in the error
manifests/setup/0prometheusCustomResourceDefinition.yaml:  name: prometheuses.monitoring.coreos.com

~/kube-prometheus$ kubectl create -f manifests/setup/0prometheusCustomResourceDefinition.yaml     # create it with kubectl create, which skips the last-applied annotation
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created

~/kube-prometheus$ kubectl apply -f manifests/    # apply everything under manifests/

$ k get namespaces     # the deployment creates a new namespace (k is an alias for kubectl)
NAME                   STATUS   AGE
monitoring             Active   32m
$ kubectl get pods -n monitoring     # list the pods in that namespace
NAME                                   READY   STATUS             RESTARTS   AGE
alertmanager-main-0                    2/2     Running            0          10m
alertmanager-main-1                    2/2     Running            0          10m
alertmanager-main-2                    2/2     Running            0          10m
blackbox-exporter-559db48fd-9w6cd      3/3     Running            0          15m
grafana-546559f668-srvh4               1/1     Running            0          15m
kube-state-metrics-576b75c6f7-wxt99    2/3     ImagePullBackOff   0          15m
node-exporter-456mp                    2/2     Running            0          15m
node-exporter-7ww97                    2/2     Running            0          15m
node-exporter-jq2pt                    2/2     Running            0          15m
prometheus-adapter-5f68766c85-6bkmx    0/1     ImagePullBackOff   0          15m
prometheus-adapter-5f68766c85-xtrxh    0/1     ImagePullBackOff   0          15m
prometheus-k8s-0                       2/2     Running            0          10m
prometheus-k8s-1                       2/2     Running            0          10m
prometheus-operator-79c5847fd8-75hbk   2/2     Running            0          15m

5) Extra steps for restricted networks in China (pull the images from a mirror and retag them to the names the manifests expect)

(host the mirrors yourself, e.g. with Bitnami images or an Alibaba Cloud registry)

$ kubectl get pods -n monitoring  | grep -v Running    # several images failed to pull
NAME                                   READY   STATUS             RESTARTS   AGE
kube-state-metrics-576b75c6f7-wxt99    2/3     ImagePullBackOff   0          15m
prometheus-adapter-5f68766c85-6bkmx    0/1     ImagePullBackOff   0          15m
prometheus-adapter-5f68766c85-xtrxh    0/1     ImagePullBackOff   0          15m
① Fix kube-state-metrics
'look up the required image and version'
~/kube-prometheus$ kubectl -n monitoring describe pod kube-state-metrics-576b75c6f7-wxt99
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   BackOff  58m (x494 over 179m)  kubelet  Back-off pulling image "k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.5.0"

'pull the image on all 3 nodes and retag it'
~/kube-prometheus$ for i in k8s-master k8s-worker1 k8s-worker2; do ssh $i "
    sudo docker pull registry.cn-hangzhou.aliyuncs.com/my-name1/kube-state-metrics:2.5.0
    sudo docker tag registry.cn-hangzhou.aliyuncs.com/my-name1/kube-state-metrics:2.5.0 k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.5.0
"; done

② Fix prometheus-adapter:v0.9.1
Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Warning  Failed   35m (x9 over 144m)    kubelet  Failed to pull image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1": rpc error: code = Unknown desc = failed to pull and unpack image "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1": failed to resolve reference "k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1": failed to do request: Head "https://k8s.gcr.io/v2/prometheus-adapter/prometheus-adapter/manifests/v0.9.1": dial tcp 142.250.157.82:443: connect: connection refused

'pull the image locally and retag it'
~/kube-prometheus$ for i in k8s-master k8s-worker1 k8s-worker2; do ssh $i "
     sudo docker pull registry.cn-hangzhou.aliyuncs.com/my-name1/prometheus-adapter:v0.9.1
    sudo docker tag registry.cn-hangzhou.aliyuncs.com/my-name1/prometheus-adapter:v0.9.1 k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1
"; done

③ Fix blackbox-exporter:v0.19.0
$ kubectl -n monitoring describe pod blackbox-exporter-6b79c4588b-m9255
  Normal   Pulling    8m51s (x2 over 12m)    kubelet            Pulling image "quay.io/prometheus/blackbox-exporter:v0.19.0"

'pull the image locally and retag it'
$ for i in k8s-master k8s-worker1 k8s-worker2; do ssh $i "
    sudo docker pull registry.cn-hangzhou.aliyuncs.com/k-cka/blackbox-exporter:v0.19.0
    sudo docker tag registry.cn-hangzhou.aliyuncs.com/k-cka/blackbox-exporter:v0.19.0 quay.io/prometheus/blackbox-exporter:v0.19.0
"; done

If retagging alone does not fix it (differences in image pull policy between versions can cause this), edit the YAML manifests and change the image address there instead.
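A sketch of that manifest-editing approach: rewrite the image repository with sed (the mirror name is the same placeholder used above, and the input line mimics a typical manifest line):

```shell
# Illustrative only: swap the unreachable registry for the mirror in one line
# of a manifest. In practice you would run sed -i against the manifest file
# (e.g. manifests/kubeStateMetrics-deployment.yaml; the name varies by
# release) and then kubectl apply -f it again.
line='        image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.5.0'
new=$(echo "$line" | sed 's|k8s.gcr.io/kube-state-metrics/kube-state-metrics|registry.cn-hangzhou.aliyuncs.com/my-name1/kube-state-metrics|')
echo "$new"
```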

'query the pods again -- everything is now Running'
~/kube-prometheus$ kubectl -n monitoring get pods
NAME                                   READY   STATUS    RESTARTS       AGE
alertmanager-main-0                    2/2     Running   0              5h6m
alertmanager-main-1                    2/2     Running   0              5h6m
alertmanager-main-2                    2/2     Running   0              5h6m
blackbox-exporter-559db48fd-9w6cd      3/3     Running   0              5h10m
grafana-546559f668-srvh4               1/1     Running   0              5h10m
kube-state-metrics-576b75c6f7-wxt99    3/3     Running   0              5h10m
node-exporter-456mp                    2/2     Running   2 (102m ago)   5h10m
node-exporter-7ww97                    2/2     Running   0              5h10m
node-exporter-jq2pt                    2/2     Running   0              5h10m
prometheus-adapter-5f68766c85-6bkmx    1/1     Running   0              5h10m
prometheus-adapter-5f68766c85-xtrxh    1/1     Running   0              5h10m
prometheus-k8s-0                       2/2     Running   0              5h6m
prometheus-k8s-1                       2/2     Running   0              5h6m
prometheus-operator-79c5847fd8-75hbk   2/2     Running   0              5h10m

6) Change the grafana Service to NodePort for browser access

$ kubectl -n monitoring patch svc grafana -p '{"spec":{"type":"NodePort"}}'

7) Change the prometheus Service to NodePort for browser access

$ kubectl -n monitoring patch svc prometheus-k8s -p '{"spec":{"type":"NodePort"}}'

8) Determine the service port numbers

$ kubectl -n monitoring get svc | egrep 'grafana|prometheus-k8s'
grafana          NodePort    10.97.119.8   <none>  3000:30795/TCP                  5d2h
prometheus-k8s   NodePort    10.97.64.250  <none>  9090:31495/TCP,8080:32315/TCP   5d2h
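If you want to script against this output, the NodePort can be cut out with sed; a small sketch using the grafana line above as sample input:

```shell
# Sample line copied from the `kubectl get svc` output above
svc_line='grafana          NodePort    10.97.119.8   <none>  3000:30795/TCP   5d2h'
# The NodePort is the number between ':' and '/TCP'
node_port=$(echo "$svc_line" | sed -E 's/.*:([0-9]+)\/TCP.*/\1/')
echo "$node_port"    # → 30795
```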

9) Determine the Node each pod runs on

$ kubectl -n monitoring get pod -owide | egrep 'grafana|prometheus-k8s'
grafana-546559f668-pdv4q   1/1  Running  1 (90m ago)  98m  172.16.194.99   k8s-worker1   <none>  <none>
prometheus-k8s-0           2/2  Running  4 (90m ago)  25h  172.16.194.103  k8s-worker1   <none>           <none>
prometheus-k8s-1           2/2  Running  4 (90m ago)  25h  172.16.126.42   k8s-worker2   <none>  <none>
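Combining the two lookups, the browser URL is just a reachable address of that Node plus the NodePort (the IP below is a hypothetical address for k8s-worker1; substitute your own):

```shell
node_ip="192.168.1.11"     # assumed IP of k8s-worker1 (replace with yours)
grafana_port="30795"       # NodePort taken from `kubectl get svc`
url="http://${node_ip}:${grafana_port}"
echo "$url"    # → http://192.168.1.11:30795
```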

Note: if all pods are Running but the web UI still cannot be reached, check whether IP forwarding is enabled. This must be configured on ALL Nodes.

$ cat /etc/sysctl.d/k8s.conf     # enable forwarding, bridge traffic inspection, etc.
net.ipv4.ip_forward=1
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
net.bridge.bridge-nf-call-ip6tables = 1    # inspect bridged traffic
net.bridge.bridge-nf-call-iptables = 1

$ sysctl -p /etc/sysctl.d/k8s.conf
$ iptables -F    # flush all iptables rules (use with care)

6. Configuring Prometheus

1) Open a browser and visit the Prometheus UI at the Node IP and NodePort determined above


2) On the Status → Targets page, check that the Alertmanager targets show state "up"
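The same check works without the UI through Prometheus' HTTP API; here is a sketch that greps the health field out of a hypothetical, heavily truncated /api/v1/targets response (on a live cluster the JSON would come from `curl -s http://<node-ip>:<prometheus-nodeport>/api/v1/targets`):

```shell
# Truncated sample response; real output lists every scrape target
sample='{"status":"success","data":{"activeTargets":[{"labels":{"job":"alertmanager-main"},"health":"up"}]}}'
health=$(echo "$sample" | grep -o '"health":"[a-z]*"')
echo "$health"    # → "health":"up"
```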


3) Configure Grafana

 


Browse the community dashboards at: Dashboards | Grafana Labs


Pick any dashboard, click into it, and copy its ID (it can then be imported in Grafana via Dashboards → Import).
