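2018.11.24
I recently finished building a Nagios data-visualization stack at work using the chain Nagios XI → Nagflux → InfluxDB → Grafana, so I wrote down the whole process on CentOS 7 in detail for future reference.

Virtual machine network configuration

Company network policy blocks Internet access for most IPs, and apart from my own workstation I could not find an address with outside access at short notice. I therefore switched the VM to NAT mode in VMware's Virtual Network Editor. This way the VM can also be managed conveniently from an SSH client such as Xshell.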
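1. cd /etc/sysconfig/network-scripts   // go to the directory holding the interface configs
2. Edit the file ifcfg-ens33 (it can be renamed to eth0, not covered here) and set: ONBOOT=yes, BOOTPROTO=dhcp
3. service network restart
4. Run ip addr to check the assigned IP
5. Connect over SSH with Xshell

Other preparation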
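1. yum update   // update the kernel and installed packages
2. sudo yum install lrzsz   (downloads inside the VM often stalled, so I preferred to download packages on the local Windows machine and upload them to CentOS; to upload a local file through Xshell, run rz -be)
3. If a package download fails, it usually only works after force-deleting the partial file with rm -f and downloading it again.

Installing Nagios XI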
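1. cd /opt   (the directory where I keep downloaded packages)
2. Upload the package from the local machine or download it directly; always use the official site (download: wget https://assets.nagios.com/downloads/nagiosxi/xi-latest.tar.gz)
3. tar xzf xi-latest.tar.gz   // unpack
4. cd nagiosxi
5. ./fullinstall
6. Note: the installation is only complete once "Nagios XI Installation Complete!" appears. If the network becomes unreachable midway, retrying a few times should in theory get through. Check the service with systemctl status nagios.service; "Active: active (running)" means it is fine.

Installing Grafana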
The Grafana website gives the installation method:
Redhat & Centos (64 Bit) SHA256: 375e85339782cee09066267e3a6cd279d5ff71ce6c90a4ebcb9bd1c91de1d5c0
wget https://dl.grafana.com/oss/release/grafana-5.3.4-1.x86_64.rpm
sudo yum localinstall grafana-5.3.4-1.x86_64.rpm
Read the Centos / Redhat installation guide for more info. We also provide a YUM package repository.
Just follow these steps.

Installing InfluxDB
Again, just follow the instructions on the Influx website:
wget https://dl.influxdata.com/influxdb/releases/influxdb-1.7.1.x86_64.rpm
sudo yum localinstall influxdb-1.7.1.x86_64.rpm

Installing Nagflux
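1. Note: Nagflux has no rpm package, so the binary is placed directly in a chosen directory
2. mkdir /usr/local/nagflux   // create the directory
3. cd /usr/local/nagflux/
4. wget https://github.com/Griesbacher/nagflux/releases/download/v0.4.1/nagflux
5. chmod +x nagflux   // make it executable

Configuring and starting Nagios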
Here is the configuration session, pasted for reference:
[root@localhost nagflux]# pwd
/usr/local/nagflux
[root@localhost nagflux]# mkdir nagfluxperfdata   // directory that will hold the Nagios performance data
[root@localhost nagflux]# chown nagios:nagios nagfluxperfdata   // give the nagios user access
[root@localhost nagflux]# vim /usr/local/nagios/libexec/process-host-perfdata-file-bulk
// Create a script that moves the Nagios host performance data. Using a script keeps this easy to extend later:
// if pnp4nagios is also needed, the same script can make a second copy of the data.
// Contents:
#!/bin/bash
/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagflux/nagfluxperfdata/${1}.perfdata.host
[root@localhost nagflux]# vim /usr/local/nagios/libexec/process-service-perfdata-file-bulk
// Create a second script that moves the service performance data.
// Contents:
#!/bin/bash
/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagflux/nagfluxperfdata/${1}.perfdata.service
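The extension point mentioned for pnp4nagios could look roughly like the sketch below: one helper that optionally leaves a copy of the perfdata file in a second directory before moving it into the Nagflux spool. The function name move_perfdata, the PNP_SPOOL variable and the name given to the copy are assumptions made for illustration; they are not part of the scripts used in this setup.

```bash
#!/bin/bash
# Sketch: move a Nagios perfdata file into the Nagflux spool directory,
# optionally leaving a copy for a second consumer such as pnp4nagios.
# move_perfdata and PNP_SPOOL are hypothetical names.

move_perfdata() {
    local src="$1"       # e.g. /usr/local/nagios/var/host-perfdata
    local dest_dir="$2"  # e.g. /usr/local/nagflux/nagfluxperfdata
    local stamp="$3"     # the $TIMET$ value passed in by Nagios
    local kind="$4"      # "host" or "service"

    # Nothing to do if Nagios has not written the file yet.
    [ -f "$src" ] || return 0

    # Optional second copy, made only when PNP_SPOOL names an existing directory.
    if [ -n "${PNP_SPOOL:-}" ] && [ -d "$PNP_SPOOL" ]; then
        cp "$src" "$PNP_SPOOL/${kind}-perfdata.${stamp}"
    fi

    mv "$src" "$dest_dir/${stamp}.perfdata.${kind}"
}
```

Inside the host script this would be called as move_perfdata /usr/local/nagios/var/host-perfdata /usr/local/nagflux/nagfluxperfdata "$1" host. With PNP_SPOOL unset it behaves like the plain mv above.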
// Set ownership and permissions on the scripts:
[root@localhost nagflux]# chown nagios:nagios /usr/local/nagios/libexec/process-host-perfdata-file-bulk
[root@localhost nagflux]# chmod +x /usr/local/nagios/libexec/process-host-perfdata-file-bulk
[root@localhost nagflux]# chown nagios:nagios /usr/local/nagios/libexec/process-service-perfdata-file-bulk
[root@localhost nagflux]# chmod +x /usr/local/nagios/libexec/process-service-perfdata-file-bulk
// Edit the Nagios configuration file:
[root@localhost nagflux]# vim /usr/local/nagios/etc/nagios.cfg   // note: the file is long; Xshell's find function helps
# Enable performance data (default, no change needed)
process_performance_data=1
# Where the performance data is written (default, no change needed)
host_perfdata_file=/usr/local/nagios/var/host-perfdata
service_perfdata_file=/usr/local/nagios/var/service-perfdata
# Format of the data lines (this is NOT the default, take care here)
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
# File write mode, append here (default, no change needed)
host_perfdata_file_mode=a
service_perfdata_file_mode=a
# How often the files are processed, in seconds; lengthen it if the server is under load (default, no change needed)
host_perfdata_file_processing_interval=15
service_perfdata_file_processing_interval=15
# File-processing commands. Note that these are not the names of the scripts created above, even though they look identical; they are command names that must be defined in the Nagios commands file
host_perfdata_file_processing_command=process-host-perfdata-file-bulk
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
// After these changes, define the two commands named in the last two settings:
[root@localhost etc]# vim /usr/local/nagios/etc/commands.cfg   // (the directory differs between Nagios versions; whichever version you have, locate commands.cfg)
# 'process-host-perfdata-file-bulk' command definition
define command{
    command_name process-host-perfdata-file-bulk
    command_line $USER1$/process-host-perfdata-file-bulk $TIMET$
}
# 'process-service-perfdata-file-bulk' command definition
define command{
    command_name process-service-perfdata-file-bulk
    command_line $USER1$/process-service-perfdata-file-bulk $TIMET$
}
(Note: the default configuration file already defines these two commands, at around lines 601 and 615, as follows)
define command {
    command_name process-host-perfdata-file-bulk
    command_line /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
}
define command {
    command_name process-service-perfdata-file-bulk
    command_line /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
}
(These original definitions must be deleted; otherwise the duplicate command names cause an error.)
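Since a leftover duplicate definition is easy to miss in a long file, a small check like the following can list any command_name that appears more than once. This is a sketch only; find_duplicate_commands is a hypothetical helper name, and it simply looks at lines whose first word is command_name.

```bash
#!/bin/bash
# Sketch: list command_name values that are defined more than once in a
# Nagios object configuration file. find_duplicate_commands is a hypothetical helper.

find_duplicate_commands() {
    local cfg="$1"
    # Print the name from every "command_name <name>" line,
    # then keep only the names that occur more than once.
    awk '$1 == "command_name" { print $2 }' "$cfg" | sort | uniq -d
}
```

Running find_duplicate_commands /usr/local/nagios/etc/commands.cfg should print nothing once the original definitions are removed. Nagios can also verify the whole configuration itself with its -v option, for example /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (the binary path is assumed from the install prefix used here).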
Then restart the nagios process:
[root@localhost ~]# systemctl restart nagios
Wait about a minute and check whether data files appear in our directory. If they do, carry on with the next step; if not, go back and find which step went wrong. The empty files in my listing are there because I had not added any monitoring yet, only the default checks. Empty files do no harm; how often the files are processed can be adjusted if needed.
For example:
[root@localhost ~]# ll /usr/local/nagflux/nagfluxperfdata/
total 12
-rw-r--r--. 1 nagios nagios 228 Nov 24 15:38 1543045099.perfdata.host
-rw-r--r--. 1 nagios nagios 362 Nov 24 14:55 1543045099.perfdata.service
-rw-r--r-- 1 nagios nagios 327 Nov 24 15:38 1543045113.perfdata.service
-rw-r--r-- 1 nagios nagios 0 Nov 24 15:38 1543045114.perfdata.host
-rw-r--r-- 1 nagios nagios 0 Nov 24 15:38 1543045129.perfdata.host
-rw-r--r-- 1 nagios nagios 0 Nov 24 15:38 1543045129.perfdata.service
-rw-r--r-- 1 nagios nagios 0 Nov 24 15:38 1543045144.perfdata.host

InfluxDB configuration and startup
By default the configuration needs no changes.
Directory:
[root@localhost ~]# cd /etc/influxdb
[root@localhost influxdb]# systemctl start influxd
[root@localhost influxdb]# influx   // open the InfluxDB shell
> create database Nagios_perfdata
> show databases
name: databases
name
----
_internal
Nagios_perfdata
> CREATE USER "admin" WITH PASSWORD '123456' WITH ALL PRIVILEGES
> quit
[root@localhost influxdb]#
Additionally, login authentication can be enabled afterwards by editing the configuration file:
vim /etc/influxdb/influxdb.conf
[http]
enabled = true
bind-address = ":8086"
auth-enabled = true
Then restart InfluxDB and log in with:
influx -username 'admin' -password '123456'

Nagflux configuration and running
[root@localhost influxdb]# cd /usr/local/nagflux   // go to our install directory
[root@localhost nagflux]# mkdir Spool   // directory for Nagflux's temporary cache files
[root@localhost nagflux]# mkdir log   // directory for the logs
[root@localhost nagflux]# vim config.gcfg   // create the config file; with my install method it does not exist by default
# Main file-processing settings
[main]
# Location of the Nagios performance data; this is the directory our scripts move the files into
NagiosSpoolfileFolder = "/usr/local/nagflux/nagfluxperfdata"
NagiosSpoolfileWorker = 1
InfluxWorker = 2
MaxInfluxWorker = 5
# Any file location will do
DumpFile = "/usr/local/nagflux/nagflux.dump"
# Nagflux's own temporary spool directory
NagfluxSpoolfileFolder = "/usr/local/nagflux/Spool"
FieldSeparator = "&"
BufferSize = 1000
FileBufferSize = 65536
# We store the data in InfluxDB
DefaultTarget = "Influxdb"
# Log settings
[Log]
LogFile = "/usr/local/nagflux/log/nagflux.log"
MinSeverity = "INFO"
# Global InfluxDB settings, mainly to allow creating the database
[InfluxDBGlobal]
CreateDatabaseIfNotExists = true
NastyString = ""
NastyStringToReplace = ""
HostcheckAlias = "hostcheck"
# Target database; NagiosPerfdata here is the database name, and it is created automatically
[InfluxDB "NagiosPerfdata"]
Enabled = true
Version = 1.4.2
# InfluxDB API address; the default port is 8086
Address = "http://127.0.0.1:8086"
# User name and password; the user must have admin privileges
Arguments = "precision=ms&u=admin&p=**********&db=NagiosPerfdata"
StopPullingDataIfDown = true
# This section is only here to silence an error that otherwise keeps appearing in the log; without livestatus it has no effect
[Livestatus]
Type = "tcp"
Version = "Icinga2"
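Before the first start it can help to confirm that the directories named in the config actually exist. The sketch below does that for every key whose name ends in Folder; check_nagflux_dirs is a hypothetical helper, not something Nagflux ships, and it assumes the values are written in plain double quotes as above.

```bash
#!/bin/bash
# Sketch: verify that every "...Folder" path named in a Nagflux config exists.
# check_nagflux_dirs is a hypothetical helper, not part of Nagflux.

check_nagflux_dirs() {
    local cfg="$1"
    # Extract the double-quoted value of each key ending in "Folder",
    # then test each path. The exit status of the braces group (0 only when
    # nothing is missing) becomes the function's return value.
    sed -n 's/^[[:space:]]*[A-Za-z]*Folder[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$cfg" | {
        missing=0
        while IFS= read -r dir; do
            if [ ! -d "$dir" ]; then
                echo "missing directory: $dir"
                missing=1
            fi
        done
        [ "$missing" -eq 0 ]
    }
}
```

For example, check_nagflux_dirs /usr/local/nagflux/config.gcfg && ./nagflux would only start Nagflux when both spool directories are present.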
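[root@localhost nagflux]# ./nagflux
// After editing the config, start Nagflux by hand as a test. It stays in the foreground, so open another Xshell window to log in to the database and query the data.

Querying data and time synchronization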
Open another Xshell window (because the nagflux program must not be interrupted):
[root@localhost ~]# influx
Connected to http://localhost:8086 version 1.7.1
InfluxDB shell version: 1.7.1
Enter an InfluxQL query
> show databases
name: databases
name
----
_internal
Nagios_perfdata
NagiosPerfdata   // the newly added Nagios database
> use NagiosPerfdata   // switch to this database
Using database NagiosPerfdata
> show measurements   // list the measurements (tables) in this database
name: measurements
name
----
metrics
> select * from /.*/ limit 5   // check for data; use limit when there is a lot of it
name: metrics
time command crit crit-fill host max min performanceLabel service unit value warn warn-fill
----
1543042556000000000 check_ping 60 none localhost 0 pl PING % 0 20 none
1543042556000000000 check_ping 500 none localhost 0 rta PING ms 0.111 100 none
1543045084000000000 check-host-alive 5000 none localhost 0 rta hostcheck ms 0.074 3000 none
1543045084000000000 check-host-alive localhost rtmin hostcheck ms 0.051
1543045084000000000 check-host-alive 100 none localhost pl hostcheck % 0 80 none
Notes:


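1. InfluxDB's default 19-digit timestamp (nanoseconds since the Unix epoch) is hard to read. Start the shell with influx -precision rfc3339 to get date-and-time output, or run precision rfc3339 inside the influx shell.
2. select * from /.*/ returns rows in ascending time order, so limit N only yields the oldest rows, and the newest ones appear only at the very end of the full output. To filter by time I have so far just added a WHERE clause, for example:
select * from /.*/ where time >= '2018-09-25T03:09:54Z' and time < '2018-09-25T03:11:11Z'
3. Use select * from metrics order by time desc limit 5 tz('Asia/Shanghai') to see the newest rows first (this also takes care of the time zone offset; look up UTC for details).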
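Outside the influx shell, the 19-digit timestamps can also be converted by hand. The sketch below divides the nanosecond value down to seconds and formats it as RFC 3339 in UTC; influx_ns_to_rfc3339 is a hypothetical helper name, and the -d "@seconds" form assumes GNU date, as shipped with CentOS.

```bash
#!/bin/bash
# Sketch: convert an InfluxDB nanosecond timestamp to RFC 3339 (UTC).
# influx_ns_to_rfc3339 is a hypothetical helper; it relies on GNU date.

influx_ns_to_rfc3339() {
    local ns="$1"
    # Integer division drops the sub-second part.
    local secs=$(( ns / 1000000000 ))
    date -u -d "@${secs}" '+%Y-%m-%dT%H:%M:%SZ'
}
```

For the first row shown earlier, influx_ns_to_rfc3339 1543042556000000000 prints 2018-11-24T06:55:56Z, which is 14:55:56 in Asia/Shanghai and matches the file times in the directory listing.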

Running Nagflux as a system service

[root@localhost ~]# cd /usr/lib/systemd/system/
[root@localhost system]# vim nagflux.service
[Unit]
Description=Nagios per data insert Influx db
Documentation=https://github.com/Griesbacher/nagflux
After=network-online.target
[Service]
User=root
Group=root
ExecStart=/usr/local/nagflux/nagflux -configPath /usr/local/nagflux/config.gcfg
Restart=on-failure
[Install]
WantedBy=multi-user.target
Alias=nagflux.service
// Note: the Alias line must be written exactly right (this comment is not part of the file); otherwise systemctl reports "Failed to execute operation: Bad message"
[root@localhost system]# vim nagflux.service
[root@localhost system]# systemctl enable nagflux.service
Created symlink from /etc/systemd/system/nagflux.service to /usr/lib/systemd/system/nagflux.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/nagflux.service to /usr/lib/systemd/system/nagflux.service.
[root@localhost system]# systemctl start nagflux.service
[root@localhost system]# systemctl status nagflux.service
● nagflux.service - Nagios per data insert Influx db
Loaded: loaded (/usr/lib/systemd/system/nagflux.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2018-11-24 17:10:21 CST; 7s ago
Docs: https://github.com/Griesbacher/nagflux
Main PID: 42590 (nagflux)
Tasks: 6
CGroup: /system.slice/nagflux.service
└─42590 /usr/local/nagflux/nagflux -configPath /usr/local/nagflux/config.gcfg
Nov 24 17:10:21 localhost.localdomain systemd[1]: Started Nagios per data insert Influx db.
Nov 24 17:10:21 localhost.localdomain systemd[1]: Starting Nagios per data insert Influx db...
// This output means the service is running.
Starting Grafana and configuring the firewall
The configuration file is /etc/grafana/grafana.ini; by default it needs no changes.
Start the service:
[root@localhost system]# systemctl start grafana-server.service
[root@localhost system]# systemctl status grafana-server.service
● grafana-server.service - Grafana instance
Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2018-11-24 17:16:40 CST; 21s ago
Docs: http://docs.grafana.org
Main PID: 44591 (grafana-server)
Tasks: 8
CGroup: /system.slice/grafana-server.service
└─44591 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid cfg:default.paths.logs=/var/log/gra…
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing InternalMetricsService" logger=server
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing AlertingService" logger=server
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing CleanUpService" logger=server
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing NotificationService" logger=server
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing ProvisioningService" logger=server
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing RenderingService" logger=server
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing TracingService" logger=server
Nov 24 17:16:40 localhost.localdomain systemd[1]: Started Grafana instance.
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="Initializing Stream Manager"
Nov 24 17:16:40 localhost.localdomain grafana-server[44591]: t=2018-11-24T17:16:40+0800 lvl=info msg="HTTP Server Listen" logger=http.server address=0…socket=
Hint: Some lines were ellipsized, use -l to show in full.
Open the web console: the default port is 3000, and the default username and password are both admin.
Note: netstat -tupln shows the listener as tcp6, which covers both IPv4 and IPv6; this can be ignored:
tcp6 0 0 :::3000 :::* LISTEN 2171/grafana-server
[root@localhost ~]# systemctl disable firewalld
[root@localhost ~]# systemctl stop firewalld
// Turning the firewall off shows whether it was blocking the service or port. This was confirmed: with the firewall off, the web console was reachable.
// (firewalld has to be running again before the firewall-cmd commands below will work)
[root@localhost ~]# firewall-cmd --zone=public --list-ports   // list the open ports; 3000 is not among them
80/tcp 443/tcp 22/tcp 7878/tcp 162/udp
[root@localhost ~]# firewall-cmd --zone=public --add-port=3000/tcp --permanent   // add port 3000
success
[root@localhost ~]# firewall-cmd --reload   // reload the firewall rules
success
[root@localhost ~]# firewall-cmd --zone=public --list-ports   // list the open ports again; 3000 has been added
80/tcp 443/tcp 22/tcp 7878/tcp 162/udp 3000/tcp
[root@localhost ~]# systemctl enable firewalld   // start the firewall at boot again
Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Grafana is now reachable from the browser again.
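The "is 3000 already open?" check from the transcript can be scripted so that the port is only added when it is missing. The sketch below works on the text printed by firewall-cmd --list-ports; port_is_open is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: test whether a port/protocol pair appears in the space-separated
# output of "firewall-cmd --list-ports". port_is_open is a hypothetical helper.

port_is_open() {
    local wanted="$1"   # e.g. 3000/tcp
    local ports="$2"    # e.g. "80/tcp 443/tcp 22/tcp"
    local p
    # Word splitting on $ports is intentional: one entry per word.
    for p in $ports; do
        if [ "$p" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Possible use on the server (needs root and a running firewalld):
#   if ! port_is_open 3000/tcp "$(firewall-cmd --zone=public --list-ports)"; then
#       firewall-cmd --zone=public --add-port=3000/tcp --permanent
#       firewall-cmd --reload
#   fi
```

Comparing whole entries rather than substrings avoids treating 3000/tcp as present when only something like 30000/tcp is open.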

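Grafana data source settings

Name: Nagios (any name will do)
Type: InfluxDB
HTTP
URL: http://localhost:8086
Access: choose the proxy option if you use a proxy server; otherwise keep the default server option
InfluxDB Details
Database: NagiosPerfdata
Adjust the settings until Save & Test passes.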
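If Save & Test keeps failing, it can be useful to query the same URL and database from the shell first, using InfluxDB's HTTP /query endpoint. The following is a minimal sketch; influx_query is a hypothetical wrapper name, and it assumes curl is installed on the machine.

```bash
#!/bin/bash
# Sketch: send one InfluxQL statement to InfluxDB's HTTP /query endpoint.
# influx_query is a hypothetical wrapper around curl.

influx_query() {
    local base_url="$1"   # e.g. http://localhost:8086
    local db="$2"         # e.g. NagiosPerfdata
    local query="$3"      # e.g. SHOW MEASUREMENTS
    # -G turns the --data-urlencode pairs into a URL query string (HTTP GET).
    curl -sS -G "${base_url}/query" \
        --data-urlencode "db=${db}" \
        --data-urlencode "q=${query}"
}
```

For example, influx_query http://localhost:8086 NagiosPerfdata 'SHOW MEASUREMENTS' should return a JSON document that mentions metrics. If auth-enabled = true has been set in influxdb.conf, credentials have to be supplied as well, for instance with curl's -u user:password option, and the same user and password then belong in the Grafana data source form.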

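Using Grafana

After creating a new dashboard, click add panel. Taking a line chart as the example, choose graph and the panel is created automatically.
The drop-down menu at the top of the new panel has an edit entry, which opens the following tabs:
general, metrics, axes, legend, display, alert, time range
The display-related tabs are left for later; the focus here is the metrics settings.

Opening the metrics tab shows Grafana's query builder by default;
the drop-down menu on the right switches the editor to text mode.
The basic statement has the form FROM ... WHERE ..., SELECT ..., GROUP BY ...
GROUP BY has to be combined with an aggregate function.
For details, see the Grafana basic usage guide.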
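As an illustration of the FROM / WHERE / SELECT / GROUP BY pattern, the sketch below prints a query of the kind that can be pasted into the text-mode editor. It assumes that host, service and performanceLabel are tags and value is a field, as the column listing from the influx shell suggests; $timeFilter and $__interval are Grafana's own placeholders for the dashboard time range and the automatic grouping interval. build_metrics_query is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: print an InfluxQL query for a Grafana graph panel in text mode.
# build_metrics_query is a hypothetical helper. $timeFilter and $__interval
# are left literal on purpose; Grafana substitutes them at query time.

build_metrics_query() {
    local host="$1" service="$2" label="$3"
    # Single quotes keep the Grafana placeholders from being expanded by the shell.
    local fmt='SELECT mean("value") FROM "metrics" WHERE "host" = %s AND "service" = %s AND "performanceLabel" = %s AND $timeFilter GROUP BY time($__interval) fill(null)\n'
    # InfluxQL string literals use single quotes, identifiers use double quotes.
    printf "$fmt" "'$host'" "'$service'" "'$label'"
}
```

Calling build_metrics_query localhost PING rta yields a query for the mean ping round-trip time of localhost. mean() is the aggregate that GROUP BY time(...) requires; max() or last() would be used the same way.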