Installation layout (host | components | ports):
10.10.10.10 | Prometheus, Grafana | 9090, 3000 |
10.10.10.11 | node_exporter | 9100 |
10.10.10.12 | node_exporter | 9100 |
10.10.10.13 | node_exporter | 9100 |
1. Prometheus download from the official site (for a container deployment, see the GitHub repository techiescamp/kubernetes-prometheus: Kubernetes manifest files for setting up Prometheus monitoring on a Kubernetes cluster)
wget https://github.com/prometheus/prometheus/releases/download/v2.42.0/prometheus-2.42.0.linux-amd64.tar.gz
2. Grafana download from the official site
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.1.1.linux-amd64.tar.gz
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.1.1-1.x86_64.rpm
3. node_exporter download
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
I. Installing Prometheus
[root@localhost ~]# tar -zxvf prometheus-2.42.0.linux-amd64.tar.gz -C /usr/local/
[root@localhost ~]# cd /usr/local/
[root@localhost local]# mv prometheus-2.42.0.linux-amd64 prometheus
[root@localhost local]# cd prometheus/
[root@localhost prometheus]# ./prometheus --config.file=prometheus.yml &
[root@localhost prometheus]# ps -ef |grep prometheus
root 34154 33916 2 17:30 pts/0 00:00:00 ./prometheus --config.file=prometheus.yml
root 34164 33916 0 17:30 pts/0 00:00:00 grep --color=auto prometheus
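Open http://10.10.10.10:9090 (Prometheus listens on port 9090 by default).
To see the monitored machines, go to Status > Targets. At this point only the local machine is monitored.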
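A process started with a trailing & does not come back after a reboot and may stop when the login session ends. As a hedged alternative, the sketch below writes a minimal systemd unit for Prometheus. It assumes the /usr/local/prometheus layout used above and runs as root like the session shown; the helper name write_prometheus_unit and the data directory are illustrative choices, not part of the original setup.

```shell
# Write a minimal systemd unit for Prometheus into DIR (default: current directory).
# Assumption: Prometheus was unpacked to /usr/local/prometheus as in the steps above.
write_prometheus_unit() {
  local dir="${1:-.}"
  cat > "$dir/prometheus.service" <<'EOF'
[Unit]
Description=Prometheus
After=network-online.target

[Service]
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/usr/local/prometheus/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Typical use (as root):
#   write_prometheus_unit /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now prometheus
```

Keeping the unit generation in a function that takes a directory makes it easy to inspect the file before it is placed under /etc/systemd/system.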
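Adding machines to monitor
Open the configuration file prometheus.yml in the Prometheus installation directory and add the IP address and port of each agent machine: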
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  # Add the monitored machines
  - job_name: 'agent'
    static_configs:
      - targets: ['10.10.10.11:9100']
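The configuration above lists only the first agent, while the host layout names three node_exporter machines. As a small sketch, the hypothetical helper gen_agent_job below prints an agent job covering several hosts; the function name and the two-space indentation under scrape_configs are assumptions made for illustration.

```shell
# Print a static scrape job for node_exporter agents.
# Usage: gen_agent_job PORT HOST [HOST...]
gen_agent_job() {
  local port="$1"; shift
  local targets="" host
  for host in "$@"; do
    # Append each host as a quoted 'host:port' entry, comma-separated.
    targets="${targets:+$targets, }'${host}:${port}'"
  done
  printf '  - job_name: "agent"\n'
  printf '    static_configs:\n'
  printf '      - targets: [%s]\n' "$targets"
}

# The three agents from the host layout, on the node_exporter default port.
gen_agent_job 9100 10.10.10.11 10.10.10.12 10.10.10.13
```

The printed job can replace the existing agent entry under scrape_configs in prometheus.yml.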
Restart Prometheus.
Open http://10.10.10.10:9090 and check that the monitored agent machine is listed.
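The same check can be scripted, since Prometheus also exposes its target list over the HTTP API at /api/v1/targets. The sketch below counts targets reported as healthy; the helper name count_up_targets is hypothetical, and it assumes the response contains the compact form "health":"up" without spaces.

```shell
# Count targets whose last scrape succeeded, reading the API response from stdin.
# Assumption: the JSON is compact, so a healthy target appears as "health":"up".
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Typical use against the server from this guide (needs curl and network access):
#   curl -s http://10.10.10.10:9090/api/v1/targets | count_up_targets
```

This is plain text matching rather than JSON parsing, which keeps it dependency-free but makes it sensitive to formatting changes in the response.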
II. Installing node_exporter
[root@localhost ~]# ls
anaconda-ks.cfg node_exporter-1.5.0.linux-amd64.tar.gz
[root@localhost ~]# tar -zxvf node_exporter-1.5.0.linux-amd64.tar.gz -C /usr/local/
node_exporter-1.5.0.linux-amd64/
node_exporter-1.5.0.linux-amd64/LICENSE
node_exporter-1.5.0.linux-amd64/NOTICE
node_exporter-1.5.0.linux-amd64/node_exporter
[root@localhost ~]# cd /usr/local/
[root@localhost local]# ls
bin conf etc games include jdk1.8.0_333 lib lib64 libexec mysql nginx node_exporter-1.5.0.linux-amd64 sbin share src zabbix_agent
[root@localhost local]# mv node_exporter-1.5.0.linux-amd64/ node_exporter
[root@localhost local]# cd node_exporter/
[root@localhost node_exporter]# ls
LICENSE node_exporter NOTICE
[root@localhost node_exporter]# /usr/local/node_exporter/node_exporter --web.listen-address=0.0.0.0:9100 &
[root@localhost node_exporter]# ps -ef |grep node_exporter
root 11202 10682 0 17:54 pts/1 00:00:00 /usr/local/node_exporter/node_exporter --web.listen-address=0.0.0.0:9100
root 11213 10682 0 17:54 pts/1 00:00:00 grep --color=auto node_exporter
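1. node_exporter listens on port 9100 by default. The scrape job for 10.10.10.11 was already added to prometheus.yml above.
2. When started with "./node_exporter", node_exporter stops after running for a while.
Start it with: "/usr/local/node_exporter/node_exporter --web.listen-address=0.0.0.0:9100 &"
or with: "nohup ./node_exporter &"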
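The nohup variant can be wrapped so that the output goes to a log file and the process ID is recorded for later. This is only a sketch: the helper name start_detached and the log and PID file paths in the usage comment are assumptions, not part of the original setup.

```shell
# Start a command detached from the terminal.
# Usage: start_detached LOGFILE PIDFILE COMMAND [ARGS...]
start_detached() {
  local log="$1" pidfile="$2"; shift 2
  # nohup keeps the process alive after the terminal closes;
  # stdout and stderr are both sent to the log file.
  nohup "$@" >"$log" 2>&1 &
  echo $! >"$pidfile"
}

# Typical use with the paths from the steps above:
#   start_detached /var/log/node_exporter.log /var/run/node_exporter.pid \
#     /usr/local/node_exporter/node_exporter --web.listen-address=0.0.0.0:9100
# Stop it later with:
#   kill "$(cat /var/run/node_exporter.pid)"
```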
III. Installing Grafana
[root@localhost ~]# rpm -ivh grafana-enterprise-10.1.1-1.x86_64.rpm
warning: grafana-enterprise-10.1.1-1.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 24098cb6: NOKEY
error: Failed dependencies:
urw-fonts is needed by grafana-enterprise-10.1.1-1.x86_64.rpm
[root@localhost ~]# yum install -y urw-fonts
Or install while ignoring the dependency check:
[root@localhost ~]# rpm -ivh grafana-enterprise-10.1.1-1.x86_64.rpm --nodeps --force
warning: grafana-enterprise-10.1.1-1.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 24098cb6: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:grafana-enterprise-10.1.1-1 ################################# [100%]
### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable grafana-server.service
### You can start grafana-server by executing
sudo /bin/systemctl start grafana-server.service
POSTTRANS: Running script
Start the service:
[root@localhost ~]# systemctl start grafana-server
[root@localhost ~]# ps -ef |grep grafana
grafana 34191 1 5 18:00 ? 00:00:00 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning
root 34201 33916 0 18:01 pts/0 00:00:00 grep --color=auto grafana
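Open http://10.10.10.10:3000 (Grafana listens on port 3000 by default). The initial username and password are both admin.
If you forget the Grafana password, reset it from the command line: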
grafana-cli admin reset-admin-password admin233%#
Installing Grafana plugins:
Plugin page: DevOpsProdigy KubeGraf plugin for Grafana (Grafana Labs)
Default Grafana plugin path: /var/lib/grafana/plugins
[root@master1 ~]# cd /var/lib/grafana/
[root@master1 grafana]# ll
total 3708
drwxr-x--- 3 grafana grafana 15 Feb 6 2023 alerting
drwx------ 2 grafana grafana 6 Feb 6 2023 csv
-rw-r----- 1 grafana grafana 3796992 Jan 19 14:35 grafana.db
drwx------ 2 grafana grafana 6 Feb 6 2023 png
If the plugins directory does not exist, create it manually:
[root@master1 grafana]# mkdir plugins
[root@master1 grafana]# cd plugins/
The plugin installed here is DevOpsProdigy KubeGraf, which is used to monitor a Kubernetes cluster.
Install from the command line:
grafana-cli plugins install devopsprodigy-kubegraf-app
Or download the plugin zip package manually, upload it to the plugins directory, and unpack it:
[root@master1 plugins]# ls
devopsprodigy-kubegraf-app-1.5.2.zip
[root@master1 plugins]# unzip devopsprodigy-kubegraf-app-1.5.2.zip
[root@master1 plugins]# ls
devopsprodigy-kubegraf-app devopsprodigy-kubegraf-app-1.5.2.zip
After unpacking, restart Grafana:
[root@master1 plugins]# systemctl restart grafana-server
[root@master1 plugins]# ps -ef |grep grafana
grafana 41618 1 71 14:49 ? 00:00:04 /usr/share/grafana/bin/grafana server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning
root 41652 126139 0 14:49 pts/0 00:00:00 grep --color=auto grafana
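After the restart, the plugin is listed on the Grafana plugins page.
Note: DevOpsProdigy KubeGraf requires kube-state-metrics to be installed first.
Using the plugin:
Add a data source:
HTTP access settings for the Kubernetes API server:
- The Kubernetes master URL comes from the output of
kubectl cluster-info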
[root@master1 ~]# kubectl cluster-info
Kubernetes control plane is running at https://10.10.10.10:6443
CoreDNS is running at https://10.10.10.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
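Getting the credentials (the certificate data is stored in the kubeconfig file):
cat ~/.kube/config
or
cat /etc/kubernetes/admin.conf
Base64-decode the three certificate fields (certificate-authority-data, client-certificate-data, client-key-data):
echo "<certificate content>" | base64 -d
After configuring, click Save & Test.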
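Copying each value by hand into echo is error-prone, so the decoding step can be sketched as a small helper. The name kubeconfig_field is hypothetical, and the sketch assumes each field sits on a single "key: value" line and that only the first match in the file is wanted.

```shell
# Print the decoded value of one base64-encoded field from a kubeconfig file.
# Usage: kubeconfig_field FILE KEY
kubeconfig_field() {
  local file="$1" key="$2"
  # Take the value of the first "KEY: VALUE" line and decode it.
  awk -v k="$key:" '$1 == k { print $2; exit }' "$file" | base64 -d
}

# Typical use for the three fields mentioned above:
#   kubeconfig_field ~/.kube/config certificate-authority-data
#   kubeconfig_field ~/.kube/config client-certificate-data
#   kubeconfig_field ~/.kube/config client-key-data
```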
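Official installation reference:
Installation
- Go to the plugins directory in Grafana:
cd $GRAFANA_PATH/data/plugins
- Clone the repository:
git clone https://github.com/devopsprodigy/kubegraf /var/lib/grafana/plugins/devopsprodigy-kubegraf-app
and restart grafana-server,
or run grafana-cli plugins install devopsprodigy-kubegraf-app
and restart grafana-server.
- Create the namespace "kubegraf" and apply the Kubernetes manifests from the kubernetes/ directory to grant the required permissions to the user grafana-kubegraf: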
kubectl create ns kubegraf
kubectl apply -f https://raw.githubusercontent.com/devopsprodigy/kubegraf/master/kubernetes/serviceaccount.yaml
kubectl apply -f https://raw.githubusercontent.com/devopsprodigy/kubegraf/master/kubernetes/clusterrole.yaml
kubectl apply -f https://raw.githubusercontent.com/devopsprodigy/kubegraf/master/kubernetes/clusterrolebinding.yaml
kubectl apply -f https://raw.githubusercontent.com/devopsprodigy/kubegraf/master/kubernetes/secret.yaml
- Create a private key and certificate for the grafana-kubegraf user on one of the master nodes:
openssl genrsa -out ~/grafana-kubegraf.key 2048
openssl req -new -key ~/grafana-kubegraf.key -out ~/grafana-kubegraf.csr -subj "/CN=grafana-kubegraf/O=monitoring"
openssl x509 -req -in ~/grafana-kubegraf.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -out /etc/kubernetes/pki/grafana-kubegraf.crt -CAcreateserial
Copy /etc/kubernetes/pki/grafana-kubegraf.crt to all the other master nodes.
Or
get the token:
kubectl get secret grafana-kubegraf-secret -o jsonpath={.data.token} -n kubegraf | base64 -d
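- Go to /configuration-plugins in Grafana and click the plugin, then click "Enable".
- Go to the plugin and select "Create cluster".
- Enter the HTTP access settings for the Kubernetes API server:
- The Kubernetes master URL comes from the output of
kubectl cluster-info
- Enter either the certificate and key created above in the "TLS Client Auth" section, or the token obtained above in the "Bearer token access" section (step #4 of the official instructions).
- Open the "Additional datasources" drop-down list and select the Prometheus instance used in this cluster.
Switching the interface to Simplified Chinese:
Configuring Prometheus in Grafana
1. Add a data source
View the added data source:
Import an official dashboard template: Dashboards (Grafana Labs)
Importing someone else's dashboard:
Export the dashboard from the other Grafana instance
Import the dashboard into your own Grafana instance
Error after importing someone else's dashboard:
Failed to upgrade legacy queries Datasource D3UDCoh4z was not found
Fix: edit the dashboard JSON as follows (adapted from a reference):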
{
  ...........................
  # before
  "datasource": {
    "type": "prometheus",
    "uid": "xxxx-uid"
  },
  # change every occurrence to this (the uid must match a data source that exists in your Grafana)
  "datasource": {
    "type": "prometheus",
    "uid": "prometheus"
  },
  ...........................
}
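Editing every datasource block by hand is tedious in a large dashboard, so the replacement can be sketched with awk. The helper name fix_datasource_uid is hypothetical. The sketch assumes a pretty-printed export in which "type" appears before "uid" inside each "datasource" object, and it leaves non-Prometheus data sources untouched.

```shell
# Rewrite the uid of every Prometheus datasource block in a dashboard JSON file.
# Usage: fix_datasource_uid FILE NEW_UID   (result is printed to stdout)
fix_datasource_uid() {
  awk -v uid="$2" '
    # Entering a "datasource": { ... } object.
    /"datasource"[ \t]*:[ \t]*[{]/ { in_ds = 1; is_prom = 0 }
    # Remember whether this datasource is of type prometheus.
    in_ds && /"type"[ \t]*:[ \t]*"prometheus"/ { is_prom = 1 }
    # Replace the uid only inside Prometheus datasource objects.
    in_ds && is_prom && /"uid"[ \t]*:/ {
      sub(/"uid"[ \t]*:[ \t]*"[^"]*"/, "\"uid\": \"" uid "\"")
    }
    # Leaving the object.
    in_ds && /[}]/ { in_ds = 0 }
    { print }
  ' "$1"
}

# Typical use (file names are placeholders):
#   fix_datasource_uid exported.json prometheus > fixed.json
```

The value "prometheus" is the uid from the example above; use the uid of the data source that exists in your own Grafana instance.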