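Monitoring process crashes with Zabbix
The business requirement: when a backend process crashes it is restarted within a short time, so the impact on the service is small. However, the developers need to inspect the core dump, so any change in the process PID must be detected. Under the existing architecture, Zabbix can do both the monitoring and the alerting.
How to configure alerting in Zabbix itself is not covered step by step here.
On every server, register the script paths in /etc/zabbix/zabbix_agentd.conf. Only piddiff.sh is needed for this example: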
UserParameter=checkpid,sh /usr/local/script/piddiff.sh
UserParameter=test,sh /usr/local/script/test.sh
UserParameter=discovery.process,/usr/local/script/disprocess.sh
UserParameter=process.check[*],/usr/local/script/proc_check.sh $1 $2 $3
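Before restarting the agent it can help to confirm that the key is actually defined in the config file. The following is a minimal sketch, not part of the original setup: has_userparameter is a hypothetical helper name, and it only checks for a line starting with UserParameter=<key> followed by a comma or an opening bracket.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): succeed if <key> is
# defined as a UserParameter in the given agent config file.
# Note: the key is used as a regular expression, so a dot in it matches
# any character; that is good enough for a quick sanity check.
has_userparameter() {
    conf=$1
    key=$2
    # Match "UserParameter=<key>," or "UserParameter=<key>[" at line start
    grep -Eq "^UserParameter=${key}(,|\[)" "$conf"
}

# Example call (commented out so the block has no side effects):
# has_userparameter /etc/zabbix/zabbix_agentd.conf checkpid && echo "checkpid is defined"
```

Once the agent has been restarted, running zabbix_agentd -t checkpid on the server should print the value the agent would return for the key.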
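The scripts are stored under /usr/local/script.
vim piddiff.sh
aapid is the identifier of the monitored business process (the pattern passed to grep); choose its value according to your business needs.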
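#!/bin/sh
# Values reported to Zabbix: 1 = PID list unchanged, 3 = PID list changed
onl_ok=1
onl_cored=3
dir=/usr/local/script
# First run: record the current "PID COMMAND" list as the baseline
if [ ! -f "$dir/old.txt" ]; then
    ps aux | grep aapid | grep -v grep | grep -v /bin/bash | awk '{print $2,$11}' > "$dir/old.txt"
else
    sleep 1
fi
# Take a fresh snapshot and compare it with the baseline
ps aux | grep aapid | grep -v grep | grep -v /bin/bash | awk '{print $2,$11}' > "$dir/now.txt"
if ! diff -q "$dir/old.txt" "$dir/now.txt" > /dev/null; then
    echo $onl_cored
    # Keep a timestamped context diff for later inspection, then update the baseline
    diff -c "$dir/old.txt" "$dir/now.txt" > "$dir/$(date "+%Y%m%d%H%M")_diff.txt"
    cat "$dir/now.txt" > "$dir/old.txt"
else
    echo $onl_ok
fi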
This is just a simple check script.
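To see what the ps pipeline in the script keeps, the filtering step can be tried on canned input. This is a sketch under assumptions: filter_pids is a hypothetical name, and it reads ps aux style lines from stdin instead of calling ps, so it can be run on any machine.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read `ps aux` style
# lines on stdin and print "PID COMMAND" for lines matching the pattern,
# using the same exclusions as piddiff.sh.
filter_pids() {
    pattern=$1
    grep -- "$pattern" \
        | grep -v grep \
        | grep -v /bin/bash \
        | awk '{print $2, $11}'   # column 2 = PID, column 11 = command
}

# Example call (commented out so the block has no side effects):
# ps aux | filter_pids aapid
```

The grep process itself and any /bin/bash wrapper that mentions the pattern are dropped, so only the real process lines end up in the snapshot.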
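Zabbix polls the item every 30 seconds. The script returns 1 when nothing has changed and 3 when the PID list has changed, so a collected value of 3 means a PID has changed and an alert is raised.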
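The 1/3 behaviour described above can be reproduced in isolation with two files. The sketch below is only an illustration: compare_snapshot is a hypothetical name, and it takes the baseline and snapshot paths as arguments rather than using the fixed /usr/local/script paths.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): compare a fresh
# snapshot with the baseline and print 1 (unchanged) or 3 (changed).
compare_snapshot() {
    old=$1
    now=$2
    # First call: no baseline yet, so adopt the snapshot as the baseline
    if [ ! -f "$old" ]; then
        cp "$now" "$old"
    fi
    if diff -q "$old" "$now" > /dev/null; then
        echo 1
    else
        # Update the baseline so the next poll reports 1 again
        cp "$now" "$old"
        echo 3
    fi
}
```

Because the baseline is updated as soon as a change is seen, the value 3 is reported for a single poll only; the next poll returns 1 again unless the PID list changes once more.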
Zabbix configuration:
Add an item with the key checkpid to the template.
Trigger: {Template OS Linux:checkpid.last()}=3