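Contents

  • 1. Using grok's built-in patterns
  • 2. Using custom grok patterns
  • 3. Common filter options (adding/removing fields and tags)
  • 4. Using the date plugin to change the timestamp written to ES
  • 5. Using geoip to locate the source IP address
  • 6. Using useragent to identify the client device type
  • 7. Common mutate examples
  • 8. Multiple if branches in Logstash
  • Appendix: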



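1. Using grok's built-in patterns

The grok plugin:
 Grok is a good way to parse unstructured log data into something structured and queryable. Under the hood it applies regular expressions to arbitrary text.
 It is well suited to syslog logs, Apache and other web server logs, MySQL logs, and in general any log format written for humans rather than for machines.
Grok ships with about 120 built-in patterns, and you can also define your own:
https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

Example: automatically parse an nginx access log line and split it into fields

##Filebeat configuration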
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log

output.logstash:
  #IP and port that Logstash listens on
  hosts: ["10.8.0.6:5044"]

[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/09-stdin-grok-stout.conf << EOF
input {
  #Input type
  beats {
  #Local port to listen on
    port => 5044
  }
}

filter{
  grok{ 
   #match => { "message" => "%{COMBINEDAPACHELOG}" } 
   #The "COMBINEDAPACHELOG" pattern above is deprecated in the official GitHub repository; the pattern below is recommended instead
   #Reference: https://github.com/logstash-plugins/logstash-patterns-core/blob/main/patterns/legacy/httpd
   match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }

}

output {
  stdout {}

  elasticsearch {
    #Hosts of the ES cluster
    hosts => ["10.8.0.2:9200","10.8.0.6:9200","10.8.0.9:9200"]
    #Index name
    index => "hqt-application-pro-%{+YYYY.MM.dd}"
  }
}
EOF
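
To get a feel for what %{HTTPD_COMBINEDLOG} does, the following Python sketch applies a hand-written, simplified regular expression to one access log line. The regex, the sample line and the helper name parse_combined are assumptions made for illustration; the real grok pattern lives in logstash-patterns-core and is more permissive. The group names follow the legacy (non-ECS) field names such as clientip and timestamp.

```python
import re

# Simplified stand-in for the combined log format; NOT the actual grok pattern.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of named fields, or None when the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

# Hypothetical sample line in nginx's default "combined" format.
sample = ('114.251.122.178 - - [22/Dec/2022:10:14:39 +0800] '
          '"GET /index.html HTTP/1.1" 200 4833 "-" "curl/7.29.0"')
fields = parse_combined(sample)
print(fields["clientip"], fields["timestamp"], fields["response"])
```

A line that does not fit the format returns None here; grok's counterpart is to leave the event unparsed and tag it with _grokparsefailure.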



2. Using custom grok patterns

Official reference:
https://www.elastic.co/guide/en/logstash/7.17/plugins-filters-grok.html

Test data (an order log entry):

app_name:gotone-payment-api,client_ip:,context:,docker_name:,env:dev,exception:,extend1:,level:INFO,line:-1,log_message:com.gotone.paycenter.controller.task.PayCenterJobHandler.queryPayOrderTask-request:[\\],log_time:2022-11-23 00:00:00.045,log_type:applicationlog,log_version:1.0.0,本次成交的订单编号为:BEF25A72965,parent_span_id:,product_line:,server_ip:,server_name:gotone-payment-api-c86658cb7-tc8k5,snooper:,span:0,span_id:,stack_message:,threadId:104,trace_id:,user_log_type:

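The requirement is to pull the order number out of the log on its own: 本次成交的订单编号为:BEF25A72965

The custom grok pattern is:

ORDER_ID [\u4e00-\u9fa5]{10,11}:[0-9A-F]{10,11}

ORDER_ID is the name of the pattern; the grok match below refers to it.

[\u4e00-\u9fa5]{10,11} matches 10 to 11 Chinese characters, here "本次成交的订单编号为".
[0-9A-F]{10,11} matches 10 to 11 characters drawn from the digits and the uppercase letters A-F, here "BEF25A72965".

Do not leave out the ":" between the two parts; the pattern only matches the required text with it in place.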

[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/10-stdin-grok_custom_patterns-stdout.conf << EOF
input {
 stdin {}
}

filter {
  grok {
    #Directory that holds the pattern files; an absolute path also works
    #Create a file with any name under ./patterns and put the following pattern in it
    # ORDER_ID [\u4e00-\u9fa5]{10,11}:[0-9A-F]{10,11}
    patterns_dir => ["./patterns"]
    #Match expression
    #Test data: app_name:gotone-payment-api,client_ip:,context:,docker_name:,env:dev,exception:,extend1:,level:INFO,line:-1,log_message:com.gotone.paycenter.controller.task.PayCenterJobHandler.queryPayOrderTask-request:[\\],log_time:2022-11-23 00:00:00.045,log_type:applicationlog,log_version:1.0.0,本次成交的订单编号为:BEF25A72965,parent_span_id:,product_line:,server_ip:,server_name:gotone-payment-api-c86658cb7-tc8k5,snooper:,span:0,span_id:,stack_message:,threadId:104,trace_id:,user_log_type:
    match => { "message" => "%{ORDER_ID:test_order_id}" }
  }
}

output {
  stdout {}
}
EOF
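
Before wiring a custom pattern into Logstash it can help to try the expression on the test data. The sketch below does that with Python's re module, which accepts the same \uXXXX ranges for this expression; grok itself uses a different regex engine (Oniguruma), so treat this as a quick sanity check under that assumption. The line is a shortened copy of the test data.

```python
import re

# Same expression as the ORDER_ID entry in the patterns file.
ORDER_ID = re.compile(r'[\u4e00-\u9fa5]{10,11}:[0-9A-F]{10,11}')

# Shortened copy of the test line; only the neighbouring fields are kept.
line = "log_version:1.0.0,本次成交的订单编号为:BEF25A72965,parent_span_id:,product_line:"

m = ORDER_ID.search(line)
test_order_id = m.group(0) if m else None
print(test_order_id)
```

As with %{ORDER_ID:test_order_id}, the captured value is the whole match, keyword included.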


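Alternatively, capture the keyword "本次成交的订单编号为" and the order number "BEF25A72965" as two separate fields.

The custom grok patterns are:

SUCCESSFUL_ORDER [\u4e00-\u9fa5]{10,11}
ORDER_ID [0-9A-F]{10,11}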

input {
 stdin {}
}

filter {
  grok {
    #Directory that holds the pattern files; an absolute path also works
    #Create a file with any name under ./patterns and put the following patterns in it
    #SUCCESSFUL_ORDER [\u4e00-\u9fa5]{10,11}
    #ORDER_ID [0-9A-F]{10,11}
    patterns_dir => ["./patterns"]
    #Match expression
    #Test data: app_name:gotone-payment-api,client_ip:,context:,docker_name:,env:dev,exception:,extend1:,level:INFO,line:-1,log_message:com.gotone.paycenter.controller.task.PayCenterJobHandler.queryPayOrderTask-request:[\\],log_time:2022-11-23 00:00:00.045,log_type:applicationlog,log_version:1.0.0,本次成交的订单编号为:BEF25A72965,parent_span_id:,product_line:,server_ip:,server_name:gotone-payment-api-c86658cb7-tc8k5,snooper:,span:0,span_id:,stack_message:,threadId:104,trace_id:,user_log_type:
    match => { "message" => "%{SUCCESSFUL_ORDER:successful_order_name}:%{ORDER_ID:test_order_id}" }
  }
}

output {
  stdout {} 
}

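Do not leave out the ":" in the grok match string above. The literal text between the two patterns must match the log exactly; otherwise grok fails to match and tags the event with _grokparsefailure.
If the data to match is "本次成交的订单编号为 BEF25A72965",
change it to: match => { "message" => "%{SUCCESSFUL_ORDER:successful_order_name} %{ORDER_ID:test_order_id}" }

If the data to match is "本次成交的订单编号为--->BEF25A72965",

change it to: match => { "message" => "%{SUCCESSFUL_ORDER:successful_order_name}--->%{ORDER_ID:test_order_id}" }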
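
The effect of the literal separator can be reproduced outside Logstash. This Python sketch builds one regex per separator, with named groups standing in for the two grok captures; build_matcher and the samples dict are hypothetical helpers introduced only for this illustration.

```python
import re

def build_matcher(separator):
    """Compile keyword + literal separator + order number, mirroring the grok match string."""
    return re.compile(
        r'(?P<successful_order_name>[\u4e00-\u9fa5]{10,11})'
        + re.escape(separator)
        + r'(?P<test_order_id>[0-9A-F]{10,11})'
    )

samples = {
    ":": "本次成交的订单编号为:BEF25A72965",
    " ": "本次成交的订单编号为 BEF25A72965",
    "--->": "本次成交的订单编号为--->BEF25A72965",
}
for sep, text in samples.items():
    m = build_matcher(sep).search(text)
    print(repr(sep), m.groupdict())

# A separator that differs from the data yields no match at all.
print(build_matcher(":").search(samples[" "]))
```

The last call returns None, which corresponds to the grok failure described above.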



3. Common filter options (adding/removing fields and tags)

The input events are nginx access logs that have already been parsed from JSON into fields.

[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/11-stdin-remove_add_field-stout.conf << EOF
input {
  beats {
    port => 5044
  }
}

filter {
  mutate {

    #Remove the listed fields (comma-separated)
    remove_field => [ "tags","agent","input","log","ecs","version","@version","ident","referrer","auth" ]


    #Add the listed fields
    #"%{clientip}": the %{...} syntax inserts the value of an existing field
    add_field => {
     "app_name" => "nginx"
     "test_clientip" => "clientip---->%{clientip}"
    }


    #Add tags
    add_tag => [ "linux","web","nginx","test" ]


    #Remove tags
    remove_tag => [ "linux","test" ]

  }
}

output {
  stdout {}
}
EOF
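
The combined effect of the four options can be modelled on a plain dictionary. The sketch below is a toy model, not Logstash's implementation: it assumes the options are applied in the order add_field, remove_field, add_tag, remove_tag, and the sample event and the helper apply_common_options are made up for illustration.

```python
def apply_common_options(event, add_field, remove_field, add_tag, remove_tag):
    """Toy model of the four common options on a plain dict."""
    out = dict(event)
    # add_field: replace %{name} references with the values of existing fields
    for key, template in add_field.items():
        value = template
        for name, current in event.items():
            value = value.replace("%{" + name + "}", str(current))
        out[key] = value
    # remove_field: drop fields, ignoring ones that are absent
    for name in remove_field:
        out.pop(name, None)
    # add_tag, then remove_tag (assumed order)
    tags = list(out.get("tags", []))
    tags += [t for t in add_tag if t not in tags]
    out["tags"] = [t for t in tags if t not in remove_tag]
    return out

# Hypothetical event as it might arrive from Filebeat.
event = {"clientip": "114.251.122.178", "@version": "1",
         "tags": ["beats_input_codec_plain_applied"]}
result = apply_common_options(
    event,
    add_field={"app_name": "nginx", "test_clientip": "clientip---->%{clientip}"},
    remove_field=["tags", "@version"],
    add_tag=["linux", "web", "nginx", "test"],
    remove_tag=["linux", "test"],
)
print(result)
```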

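After running the example, the event no longer carries the removed fields, has the new app_name and test_clientip fields, and keeps only the web and nginx tags out of the four that were added.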


4. Using the date plugin to change the timestamp written to ES

Test log: the following is a JSON-format log entry that we want to collect

{"app_name":"gotone-payment-api","client_ip":"","context":"","docker_name":"","env":"dev","exception":"","extend1":"","level":"INFO","line":68,"log_message":"现代金控支付查询->调用入参[{}]","log_time":"2022-11-23 00:00:00.051","log_type":"applicationlog","log_version":"1.0.0","method_name":"com.gotone.paycenter.dao.third.impl.modernpay.ModernPayApiAbstract.getModernPayOrderInfo","parent_span_id":"","product_line":"","server_ip":"","server_name":"gotone-payment-api-c86658cb7-tc8k5","snooper":"","span":0,"span_id":"","stack_message":"","threadId":104,"trace_id":"gotone-payment-apib4a65777-ce6b-4bcc-8aef-71a7cfffaf2c","user_log_type":""}

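By default, @timestamp records when Logstash processed the event, so the time written to ES differs from the time the log was actually produced. During an incident this slows down troubleshooting, so the timestamp written to ES should be replaced with the log's own log_time.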

[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/12-stdin-date-es.conf << EOF
input {
  file {
    #Path of the file to collect
    path => "/tmp/test.log"
  }
}


filter {

  json {
  #JSON parser: converts JSON-formatted data into Logstash's own data structure (split into fields by key:value)
    source => "message"
  }


  date {
    #Field that holds the time, and the format used to parse it
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss.SSS" ]
    #Field in which the parsed time is stored; the default is "@timestamp"
    target => "@timestamp"
    timezone => "Asia/Shanghai"
  }

}

output {
  stdout {}

  elasticsearch {
    #Hosts of the ES cluster
    hosts => ["10.8.0.2:9200","10.8.0.6:9200","10.8.0.9:9200"]
    #Index name
    index => "hqt-application-pro-%{+YYYY.MM.dd}"
  }
}
EOF
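
What the date filter computes here can be checked by hand. The Python sketch below parses a log_time value as Shanghai local time and converts it to UTC, which is how @timestamp is stored. It assumes Asia/Shanghai can be modelled as a fixed UTC+8 offset; the function name to_utc_timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is modelled as a fixed UTC+8 offset.
SHANGHAI = timezone(timedelta(hours=8))

def to_utc_timestamp(log_time):
    """Parse 'yyyy-MM-dd HH:mm:ss.SSS' local time and return the UTC instant."""
    local = datetime.strptime(log_time, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=SHANGHAI)
    return local.astimezone(timezone.utc)

ts = to_utc_timestamp("2022-11-23 00:00:00.051")
print(ts.isoformat())           # 2022-11-22T16:00:00.051000+00:00
print(ts.strftime("%Y.%m.%d"))  # date part of the daily index name
```

Because the %{+YYYY.MM.dd} part of the index name is formatted from @timestamp in UTC, a log written shortly after midnight Shanghai time ends up in the previous day's index.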

After running the example, @timestamp in ES carries the time from log_time.


Note: the following two formats differ!

#First log format, for example:
"timestamp":"22/Dec/2022:10:14:39 +0800"

date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
    timezone => "Asia/Shanghai"
  }

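#Second log format, for example:
"AccessTime":"[22/Dec/2022:10:10:28 +0800]" 
This is really the same as the first format with square brackets added; extract the timestamp without the brackets using grok first.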

grok {
    match => [ "message","%{HTTPDATE:AccessTime}" ]
  }
date {
    match => [ "AccessTime", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
    timezone => "Asia/Shanghai"
}
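
The two formats can be compared with a few lines of Python. In this sketch the brackets are stripped before parsing, which is the role grok's %{HTTPDATE} plays above; parse_httpdate is a hypothetical helper, and %b assumes an English (C) locale for month names.

```python
from datetime import datetime, timezone

def parse_httpdate(raw):
    """Parse 'dd/MMM/yyyy:HH:mm:ss Z', tolerating surrounding square brackets."""
    return datetime.strptime(raw.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

first = parse_httpdate("22/Dec/2022:10:14:39 +0800")
second = parse_httpdate("[22/Dec/2022:10:10:28 +0800]")
print(first.astimezone(timezone.utc).isoformat())
print(second.astimezone(timezone.utc).isoformat())
```

Both strings carry their own +0800 offset, so the conversion to UTC does not depend on any separately configured time zone.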



5. Using geoip to locate the source IP address

Test data: an nginx log in JSON format

{"@timestamp":"2022-12-18T03:27:10+08:00","host":"10.0.24.2","clientip":"114.251.122.178","SendBytes":4833,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"43.143.242.47","uri":"/index.html","domain":"43.143.242.47","xff":"-","referer":"-","tcp_xff":"-","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36","status":"200"}
[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/13-beats-geoip-stdout.conf << EOF
input {
  file {
    #Path of the file to collect
    path => "/tmp/test.log"
  }
}

filter {

  json {
  #JSON parser: converts JSON-formatted data into Logstash's own data structure (split into fields by key:value)
    source => "message"
  }


  geoip {
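    #Field that holds the IP address to look up (the test data names it "clientip")
    source => "clientip"
    #Database used for the lookup; the default is GeoLite2-City.mmdb (it has to be set explicitly here, otherwise the city is not shown)
    database => "/hqtbj/hqtwww/logstash_workspace/data/plugins/filters/geoip/CC/GeoLite2-City.mmdb"
    #To keep only certain fields, list them here; if omitted, all lookup fields are included
    #fields => ["city_name","country_name","ip"]
    #Field that receives the geoip output; very useful when several IP addresses (e.g. source and destination) need to be looked up
    #target => "test-geoip-nginx"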
  }

}

output {
  stdout {}
}
EOF



6. Using useragent to identify the client device type

Test data: an nginx log in JSON format

[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/14-beats-useragent-stdout.conf << EOF
input {
  #Input type
  beats {
  #Local port to listen on
    port => 5044
  }
}


filter {

  #json {
  #JSON parser: converts JSON-formatted data into Logstash's own data structure (split into fields by key:value)
  #  source => "message"
  #}


  useragent {
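    #Field that holds the client's User-Agent string
    source => "http_user_agent"
    #Store the parsed data under the given field; if omitted (with ECS compatibility disabled), the parsed fields are written to the root of the event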
    target => "test-nginx-useragent"
  }

}

output {
  stdout {}
}
EOF
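
To illustrate the idea of source and target, this Python sketch takes the User-Agent string from the nginx test data and stores a rough guess under a target key. It is a deliberately tiny stand-in: the real filter relies on a large database of parsing rules, and the helper rough_useragent and the keys name, major and os are illustrative assumptions.

```python
import re

def rough_useragent(ua):
    """Very rough browser/OS guess; real parsers rely on a large rule database."""
    info = {}
    browser = re.search(r'(Chrome|Firefox)/(\d+)', ua)
    if browser:
        info["name"] = browser.group(1)
        info["major"] = browser.group(2)
    windows = re.search(r'Windows NT (\d+\.\d+)', ua)
    if windows:
        info["os"] = "Windows NT " + windows.group(1)
    return info

ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")
event = {"http_user_agent": ua}
# Store the result under a dedicated key, as target => "test-nginx-useragent" does.
event["test-nginx-useragent"] = rough_useragent(event["http_user_agent"])
print(event["test-nginx-useragent"])
```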



7. Common mutate examples

Python script that generates test data for mutate:

[root@localhost ~]# cat >> generate_log.py << EOF
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# @author : oldboyedu-linux80
import datetime
import random
import logging
import time
import sys
LOG_FORMAT = "%(levelname)s %(asctime)s [com.oldboyedu.%(module)s] - %(message)s "
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
# Basic configuration for the root logging.Logger instance
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT, datefmt=DATE_FORMAT, filename=sys.argv[1], filemode='a',)
actions = ["浏览页面", "评论商品", "加入收藏", "加入购物车", "提交订单", "使用优惠券", "领取优惠券", "搜索", "查看订单", "付款", "清空购物车"]
while True: 
    time.sleep(random.randint(1, 5))
    user_id = random.randint(1, 10000)
    # Round the generated float to 2 decimal places.
    price = round(random.uniform(15000, 30000),2)
    action = random.choice(actions)
    svip = random.choice([0,1])
    logging.info("DAU|{0}|{1}|{2}|{3}".format(user_id, action,svip,price))
EOF
[root@elk02 ~]# python generate_log.py  /tmp/app.log

Each line that reaches Logstash has the following form:

INFO <time> [com.oldboyedu.generate_log] - DAU|<user_id>|<action>|<svip>|<price>

mutate example:

[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/15-beats-mutate-stdout.conf << EOF
input {
  #Input type
  beats {
  #Local port to listen on
    port => 5044
  }
}


filter {

  #json {
  #JSON parser: converts JSON-formatted data into Logstash's own data structure (split into fields by key:value)
  #  source => "message"
  #}

   mutate {
     #Split the "message" field on "|"
     split => { "message" => "|" }
   }

   mutate {
     #Add fields whose values reference elements of the split message
     add_field => {
       "user_id" => "%{[message][1]}"
       "action" => "%{[message][2]}"
       "svip" => "%{[message][3]}"
       "price" => "%{[message][4]}"
     }
   }

   mutate {
     #Convert the listed fields to the given data types
     #integer parses a string such as "1000" into 1000
     #boolean turns "0" into false and "1" into true
     #float keeps the fractional part, e.g. "1000.15" becomes 1000.15
     convert => {
     "user_id" => "integer"
     "svip" => "boolean"
     "price" => "float"
     }
   }

   mutate {
     #Copy the content of "price" into "test-mutate-price"
     copy => { "price" => "test-mutate-price" }
   }

   mutate {
     #Rename a field
     #Renames the 'svip' field to 'test-mutate-ssvip'
     rename => { "svip" => "test-mutate-ssvip" }
   }

   mutate {
    #Replace the content of a field
    replace => { "message" => "%{message}: My new message" }
    #replace => { "message" => "My new message" }
   }

   mutate {
     #Convert the content of the listed fields to upper case
     uppercase => [ "message" ]
   }

}

output {
  stdout {}
}
EOF
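
The split, add_field, convert, copy and rename steps can be traced on one sample line. The Python sketch below mirrors them on a hypothetical log line in the format produced by generate_log.py; the boolean conversion is simplified to a comparison with "1", and the replace and uppercase steps are left out.

```python
def mutate(message):
    """Mirror split / add_field / convert / copy / rename on one log line."""
    parts = message.split("|")                      # split => { "message" => "|" }
    event = {
        "user_id": int(parts[1]),                   # "%{[message][1]}" converted to integer
        "action": parts[2],                         # "%{[message][2]}"
        "svip": parts[3].strip() == "1",            # simplified boolean conversion
        "price": float(parts[4]),                   # "%{[message][4]}" converted to float
    }
    event["test-mutate-price"] = event["price"]     # copy
    event["test-mutate-ssvip"] = event.pop("svip")  # rename
    return event

# Hypothetical line; note the trailing space that LOG_FORMAT appends.
sample = "INFO 2022-12-26 10:00:00 [com.oldboyedu.generate_log] - DAU|8888|付款|1|21345.67 "
event = mutate(sample)
print(event)
```

Element 0 of the split array holds the log prefix up to "DAU", which is why the field references start at index 1.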

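After running the example, each event carries user_id as an integer, price and test-mutate-price as floats, test-mutate-ssvip as a boolean (renamed from svip), and an upper-cased message.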


8. Multiple if branches in Logstash

[root@localhost ~]# cat >> /hqtbj/hqtwww/logstash_workspace/conf-logstash/16-homework-to-es.conf << EOF
input {
  beats {
    type => "test-nginx-applogs"
    port => 5044
  } 
  file {
    type => "test-product-applogs"
    path => "/tmp/app.log"
  }
  beats {
    type => "test-dw-applogs"
    port => 8888
  }
  file { 
    type => "test-payment-applogs"
    path => "/tmp/payment.log"
  } 
}


filter {
  if [type] == "test-nginx-applogs"{
    mutate {
      remove_field => [ "tags","agent","input","log","ecs","version","@version","ident","referrer","auth","xff","referer","upstreamtime","upstreamhost","tcp_xff"]
    }
    geoip {
     source => "clientip"
     database => "/hqtbj/hqtwww/logstash_workspace/data/plugins/filters/geoip/CC/GeoLite2-City.mmdb"
    }
    useragent {
     source => "http_user_agent"
    }
  } 

  if [type] == "test-product-applogs" {
    mutate {
     split => { "message" => "|" }
    }
    mutate {
      add_field => {
        "user_id" => "%{[message][1]}"
        "action" => "%{[message][2]}"
        "svip" => "%{[message][3]}"
        "price" => "%{[message][4]}"
      }
    }
    mutate {
      convert => {
      "user_id" => "integer"
      "svip" => "boolean"
      "price" => "float"
      }
    }
  } 

  if [type] in [ "test-dw-applogs","test-payment-applogs" ] {
    json {
      source => "message"
    }
    date {
      match => [ "log_time", "yyyy-MM-dd HH:mm:ss.SSS" ]
      target => "@timestamp"
    }
  }
}


output {
  stdout {}
  if [type] == "test-nginx-applogs" { 
    elasticsearch {
      hosts => ["10.8.0.2:9200","10.8.0.6:9200","10.8.0.9:9200"]
      index => "test-nginx-logs-%{+YYYY.MM.dd}" 
    }
  }

  if [type] == "test-product-applogs" {
    elasticsearch {
      hosts => ["10.8.0.2:9200","10.8.0.6:9200","10.8.0.9:9200"]
      index => "test-product-applogs-%{+YYYY.MM.dd}"    
    }
  }

  if [type] in [ "test-dw-applogs","test-payment-applogs" ] {
    elasticsearch {
      hosts => ["10.8.0.2:9200","10.8.0.6:9200","10.8.0.9:9200"]
      index => "test-center-applogs-%{+YYYY.MM.dd}"
    }
  }
}
EOF
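
The routing in the output section boils down to a lookup from type to index prefix plus a date suffix. The Python sketch below expresses that mapping; index_for and INDEX_PREFIX are hypothetical names, and the date suffix is taken from @timestamp in UTC.

```python
from datetime import datetime, timezone

# type value -> index prefix, as in the three output branches
INDEX_PREFIX = {
    "test-nginx-applogs": "test-nginx-logs",
    "test-product-applogs": "test-product-applogs",
    "test-dw-applogs": "test-center-applogs",
    "test-payment-applogs": "test-center-applogs",
}

def index_for(event):
    """Return the target index name, or None when no branch matches."""
    prefix = INDEX_PREFIX.get(event.get("type"))
    if prefix is None:
        return None
    day = event["@timestamp"].astimezone(timezone.utc).strftime("%Y.%m.%d")
    return prefix + "-" + day

stamp = datetime(2022, 12, 26, 1, 0, tzinfo=timezone.utc)
print(index_for({"type": "test-dw-applogs", "@timestamp": stamp}))
print(index_for({"type": "test-payment-applogs", "@timestamp": stamp}))
print(index_for({"type": "something-else", "@timestamp": stamp}))
```

An event whose type matches none of the branches is written only to stdout, which the None result stands for here.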





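Appendix:

nginx example: using the date plugin to change the timestamp written to ES
Prerequisites:
(1) In this example nginx is not configured for JSON logs; the fields are parsed with grok's built-in patterns.
(2) No JSON parsing is needed in Filebeat, and no json filter is needed in Logstash.

#Filebeat configuration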
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log

output.logstash:
  #IP and port that Logstash listens on
  hosts: ["10.8.0.6:5044"]


#Logstash configuration
input {
  #Input type
  beats {
  #Local port to listen on
    port => 5044
  }
}

filter{
  grok{
   #match => { "message" => "%{COMBINEDAPACHELOG}" }
   #The "COMBINEDAPACHELOG" pattern above is deprecated in the official GitHub repository; the pattern below is recommended instead
   #Reference: https://github.com/logstash-plugins/logstash-patterns-core/blob/main/patterns/legacy/httpd
   match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }

  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
    timezone => "Asia/Shanghai"
  }

}

output {
  stdout {}

  elasticsearch {
    #Hosts of the ES cluster
    hosts => ["10.8.0.2:9200","10.8.0.6:9200","10.8.0.9:9200"]
    #Index name
    index => "hqt-application-pro-%{+YYYY.MM.dd}"
  }
}

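tomcat example: using the date plugin to change the timestamp written to ES
Prerequisites:
(1) tomcat must be configured to write JSON-format logs.
(2) No JSON parsing is needed in Filebeat, and no json filter is needed in Logstash.

#Filebeat configuration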
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /hqtbj/hqtwww/tomat_workspace/tomcat/logs/localhost_access_log.2022-12-26.txt

output.logstash:
  #IP and port that Logstash listens on
  hosts: ["10.8.0.6:8888"]


#Logstash configuration
input {
  beats{
    port => 8888
  }
}

filter {
  grok {
    match => [ "message","%{HTTPDATE:AccessTime}" ]
  }

  date {
    match => [ "AccessTime", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
    timezone => "Asia/Shanghai"
  }
}

output {
  stdout {}
  elasticsearch {
    hosts => ["10.8.0.2:9200","10.8.0.6:9200","10.8.0.9:9200"]
    index => "test-tomcat-logs-%{+YYYY.MM.dd}"
  }
}