1 Background

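Nightingale's built-in alert template is cluttered, and the important information is hard to spot at a glance, which makes troubleshooting harder.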

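The goal is a simpler alert template suited to our company. Above all, it should show the fault details at a glance so that problems can be diagnosed quickly.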

2 Steps

The simplest approach is to send alert notifications through a custom notification script.
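A custom notification script receives the alert event as JSON on its standard input. Both scripts in this article read it with json.load(sys.stdin). The sketch below shows only that contract. summarize_event and main are hypothetical names, and it reads only fields that this article's scripts also use (target_ident, is_recovered, notify_channels).

```python
import json
import sys


def summarize_event(payload):
    """Build a one-line summary from fields the scripts in this article read."""
    event = payload.get("event") or {}
    state = "recovered" if event.get("is_recovered") else "firing"
    channels = ",".join(event.get("notify_channels") or [])
    return "{0} [{1}] channels={2}".format(event.get("target_ident", ""), state, channels)


def main(stream=sys.stdin):
    # Nightingale pipes the event JSON into the script; parse it and report what arrived
    payload = json.load(stream)
    print(summarize_event(payload))
```

In a real script you would call main() under if __name__ == '__main__':. You can then try it offline with python skeleton.py < /tmp/payload.json, where skeleton.py is a hypothetical file name.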

The complete Python script is as follows:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
"""
    Purpose: custom Feishu alert notification template for Nightingale
"""
# On Python 2, make string literals unicode so comparisons with the decoded JSON payload
# work without the reload(sys)/setdefaultencoding hack (which also fails on Python 3)
from __future__ import unicode_literals

import json
import sys

import requests


class Sender(object):
    def __init__(self, payload):
        self.headers = {
            "Content-Type": "application/json"
        }
        event = payload.get('event') or {}
        self.users = event.get('notify_users_obj') or []  # all users who receive the notification
        self.is_recovered = event.get('is_recovered')  # True if this is a recovery event
        self.content = (payload.get('tpls') or {}).get("feishu", "feishu not found")  # rendered "feishu" template
        self.hostname = event.get('target_ident') or ""
        self.color = "red"
        self.alert_headers = "Nightingale alert: anomaly detected"
        self.status_text = "Value at trigger time: "
        # Link for the "View details" button; only hosts starting with "tps" have a dashboard.
        # Initialised here so that other hosts do not raise AttributeError later.
        self.monitor_url = ""
        if self.hostname.startswith("tps"):
            # domain redacted by the author
            self.monitor_url = "http://domain/dashboards/12?datasource=1&ident={0}".format(self.hostname)

    def send_ifeishu(self, payload):
        # Collect unique Feishu robot webhook URLs (dict keys act as a set; the value 1 is irrelevant)
        tokens = {}
        for u in self.users:
            contacts = u.get('contacts') or {}  # contact methods configured for this user
            token = contacts.get("ifeishu_rebot_token", "")
            if token:
                tokens[token] = 1
        if not tokens:
            return  # no Feishu robot among the recipients

        # Default body for any alert type, so the card is never empty
        alert_content = "Alert target: " + self.hostname + "\n" + self.content
        if self.is_recovered:
            self.alert_headers = "Nightingale alert: recovered"
            self.color = "green"
        elif "带宽" in self.content:  # "带宽" = bandwidth; matched against the Chinese rule name
            bandwidth = (payload.get('event') or {}).get('trigger_value')
            bandwidth_mb = "%.1f" % (float(bandwidth) / 1024 / 1024)
            alert_content += "\n" + self.status_text + bandwidth_mb + "MB" + "\n" + "Owners: <at id=all></at>"
        # Other monitored items (redacted by the author) would get their own branches here

        message_body = {
            "msg_type": "interactive",
            "card": {
                "config": {
                    "wide_screen_mode": True
                },
                "elements": [
                    {
                        "tag": "div",
                        "text": {
                            "content": alert_content,
                            "tag": "lark_md"
                        }
                    },
                ],
                "header": {
                    "template": self.color,  # card theme colour: red, orange, yellow, green, cyan, blue, purple, pink
                    "title": {
                        "content": self.alert_headers,
                        "tag": "plain_text"
                    }
                }
            }
        }
        # Add a "View details" button to firing, non-process alerts that have a dashboard link
        if not self.is_recovered and "进程" not in self.content and self.monitor_url:  # "进程" = process
            button_element = {
                "tag": "action",
                "actions": [
                    {
                        "tag": "button",
                        "text": {
                            "tag": "plain_text",
                            "content": "🔎 View details"
                        },
                        "type": "primary",
                        "multi_url": {
                            "url": self.monitor_url,
                            "pc_url": "",  # desktop URL
                            "android_url": "",  # Android URL
                            "ios_url": ""  # iOS URL
                        }
                    }
                ]
            }
            message_body["card"]["elements"].append(button_element)

        # Post the card to every collected webhook
        for url in tokens:
            requests.post(url, headers=self.headers, data=json.dumps(message_body), timeout=10)


if __name__ == '__main__':
    payload = json.load(sys.stdin)
    sender = Sender(payload)
    sender.send_ifeishu(payload)
    # Keep a copy of the last payload for debugging; indent=4 indents each nesting level by 4 spaces
    with open('payload.json', 'w') as f:
        f.write(json.dumps(payload, indent=4))
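You can check the card JSON without posting to a webhook by pulling the message-building part of the script above into a pure function. This is a sketch. build_card and format_mb are hypothetical helpers that are not in the original script. format_mb assumes trigger_value is in bytes, because the script divides it by 1024 twice and labels the result MB.

```python
import json


def build_card(title, color, content, button_url=""):
    """Build a Feishu interactive-card body shaped like the one in the script above."""
    body = {
        "msg_type": "interactive",
        "card": {
            "config": {"wide_screen_mode": True},
            "elements": [
                {"tag": "div", "text": {"tag": "lark_md", "content": content}},
            ],
            "header": {
                "template": color,  # card theme colour, e.g. "red" or "green"
                "title": {"tag": "plain_text", "content": title},
            },
        },
    }
    if button_url:
        # Same button structure as the script: one primary button opening button_url
        body["card"]["elements"].append({
            "tag": "action",
            "actions": [{
                "tag": "button",
                "text": {"tag": "plain_text", "content": "View details"},
                "type": "primary",
                "multi_url": {"url": button_url, "pc_url": "", "android_url": "", "ios_url": ""},
            }],
        })
    return body


def format_mb(raw_value):
    """Convert a trigger value to MB with one decimal place (assumes the raw value is bytes)."""
    return "%.1f" % (float(raw_value) / 1024 / 1024) + "MB"


if __name__ == "__main__":
    card = build_card("Nightingale alert", "red", "Alert target: tps-01\n" + format_mb("10485760"))
    print(json.dumps(card, indent=4))
```

A pure function like this is easy to diff against what Feishu renders, and it keeps network calls out of the formatting logic.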

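Note that the method is named send_ifeishu. You therefore also need to add a matching notification channel (ifeishu) and a contact method (ifeishu_rebot_token) in Nightingale.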
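The channel name matters because of how the debugging script in section 3 finds its handler. That script appears to follow Nightingale's sample notify.py. For each entry in notify_channels, it looks up a method named send_<channel>, so the ifeishu channel reaches send_ifeishu. Here is a minimal sketch of that lookup. DemoSender is a hypothetical stand-in for the real Sender class.

```python
class DemoSender(object):
    """Hypothetical sender used only to illustrate the send_<channel> naming rule."""

    @classmethod
    def send_ifeishu(cls, payload):
        return "ifeishu handled"


def dispatch(payload, sender_cls=DemoSender):
    """Call sender_cls.send_<channel>(payload) for each channel listed in the event."""
    results = {}
    for ch in (payload.get("event") or {}).get("notify_channels") or []:
        name = "send_{}".format(ch.strip())
        func = getattr(sender_cls, name, None)
        if func is None:
            # A channel without a matching method is skipped, not an error
            print("function: {} not found".format(name))
            continue
        results[ch.strip()] = func(payload)
    return results
```

With notify_channels set to ["ifeishu", "sms"], only send_ifeishu runs, and the script reports that send_sms was not found.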

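Next, pick an alert rule and change its notification channel to ifeishu. Add a Feishu robot to the rule's receiving group, and set the robot's ifeishu_rebot_token contact to the Feishu robot's webhook URL. The script posts directly to this value, so it must be the full webhook URL, not just the token.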
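Internally, the custom script turns the receiving users into a set of webhook URLs. It reads notify_users_obj, takes each user's contacts, and keeps the non-empty ifeishu_rebot_token values without duplicates. Here is the same step as a standalone function. collect_webhooks is a hypothetical name. Unlike the dict in the script, it keeps the URLs in first-seen order.

```python
def collect_webhooks(users, key="ifeishu_rebot_token"):
    """Return the unique, non-empty webhook URLs stored under `key` in each user's contacts."""
    seen = []
    for user in users or []:
        # Users without contacts (missing, None or empty) are simply skipped
        url = (user.get("contacts") or {}).get(key, "")
        if url and url not in seen:
            seen.append(url)
    return seen
```

If two receivers share the same robot, it gets the message only once.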

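Then edit the feishu notification template, or add a new one. A new template also requires changing the Python code above, which reads tpls["feishu"]. Remove the parts you don't need, then save.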

# For example, I changed the feishu notification template to:
Status: {{if .IsRecovered}}Recovered{{else}}Alert fired{{end}}
Rule name: {{.RuleName}}
{{if .IsRecovered}}Recovery time: {{timeformat .LastEvalTime}}{{else}}Trigger time: {{timeformat .TriggerTime}}{{end}}
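The template's branching decides which timestamp appears: TriggerTime while the alert is firing, and LastEvalTime after recovery. As a rough preview, the logic can be mirrored in Python. This only imitates the template for reasoning about its output; it is not Nightingale's Go template engine. The sketch assumes the timestamps are Unix seconds and that timeformat renders them as YYYY-MM-DD HH:MM:SS (shown in UTC here). Check both assumptions against your Nightingale version. The labels follow the English template shown above.

```python
from datetime import datetime, timezone


def preview(is_recovered, rule_name, trigger_time, last_eval_time):
    """Python imitation of the feishu template; NOT Nightingale's template engine."""
    def fmt(ts):
        # Assumed stand-in for timeformat: Unix seconds -> "YYYY-MM-DD HH:MM:SS" (UTC)
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

    lines = ["Status: " + ("Recovered" if is_recovered else "Alert fired"),
             "Rule name: " + rule_name]
    if is_recovered:
        lines.append("Recovery time: " + fmt(last_eval_time))
    else:
        lines.append("Trigger time: " + fmt(trigger_time))
    return "\n".join(lines)
```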

3 Debugging the script

To debug, first have Nightingale generate a payload with the following script:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import sys
import json
import requests

class Sender(object):
    @classmethod
    def send_email(cls, payload):
        # already done in go code
        pass

    @classmethod
    def send_wecom(cls, payload):
        # already done in go code
        pass

    @classmethod
    def send_dingtalk(cls, payload):
        # already done in go code
        pass

    @classmethod
    def send_ifeishu(cls, payload):
        # Dump the payload Nightingale passes in, for offline debugging of notify_feishu.py
        with open('/tmp/payload.json', 'w') as f:
            f.write(json.dumps(payload, indent=4))

    @classmethod
    def send_mm(cls, payload):
        # already done in go code
        pass

    @classmethod
    def send_sms(cls, payload):
        pass

    @classmethod
    def send_voice(cls, payload):
        pass

def main():
    payload = json.load(sys.stdin)
    with open(".payload", 'w') as f:
        f.write(json.dumps(payload, indent=4))
    for ch in payload.get('event').get('notify_channels') or []:
        send_func_name = "send_{}".format(ch.strip())
        if not hasattr(Sender, send_func_name):
            print("function: {} not found".format(send_func_name))
            continue
        send_func = getattr(Sender, send_func_name)
        send_func(payload)

def hello():
    print("hello nightingale")

if __name__ == "__main__":
    if len(sys.argv) == 1:        
        main()
    elif sys.argv[1] == "hello":
        hello()
    else:
        print("I am confused")

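Then set this script as Nightingale's notification script. Temporarily change an alert rule whose notification channel is ifeishu to a condition that fires easily, for example mem_used_percent{ident=~'jde-server.*'} > 1 (memory usage above 1%). Once the alert fires, the script writes /tmp/payload.json.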

Once you have the payload, debug with python notify_feishu.py < /tmp/payload.json, where notify_feishu.py is the custom script written above.
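If you would rather not wait for a real alert, a hand-written payload can stand in for /tmp/payload.json. This is a sketch. make_sample_payload and the file name make_payload.py are hypothetical. The payload contains only the fields the custom script reads; a real Nightingale payload carries many more. The webhook URL is a placeholder, so replace REPLACE_ME with a test robot's token before the custom script can post anything useful.

```python
import json

# Placeholder webhook: replace REPLACE_ME with a test robot's token before posting
WEBHOOK = "https://open.feishu.cn/open-apis/bot/v2/hook/REPLACE_ME"


def make_sample_payload(recovered=False, value="10485760"):
    """Hand-written payload with only the fields notify_feishu.py reads."""
    return {
        "event": {
            "target_ident": "tps-test-01",  # "tps" prefix, so the script adds the dashboard button
            "is_recovered": recovered,
            "trigger_value": value,  # 10485760 bytes -> "10.0MB" in the bandwidth branch
            "notify_channels": ["ifeishu"],
            "notify_users_obj": [{"contacts": {"ifeishu_rebot_token": WEBHOOK}}],
        },
        # Rule name contains "带宽" (bandwidth), so the script's bandwidth branch runs
        "tpls": {"feishu": "Status: Alert fired\nRule name: 带宽 test"},
    }


if __name__ == "__main__":
    print(json.dumps(make_sample_payload(), indent=4))
```

Usage: python make_payload.py > /tmp/payload.json, then python notify_feishu.py < /tmp/payload.json. Pass recovered=True to preview the green recovery card.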

If you run into problems with custom alert templates, feel free to send me a private message for help.