• Collects, parses, and ships data in a lightweight way.
  • The Beats platform brings together a family of single-purpose data shippers.
  • They send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch.

1. Installation

tar zxvf filebeat-7.8.0-linux-x86_64.tar.gz
ln -s filebeat-7.8.0-linux-x86_64 filebeat
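
A quick sanity check that the unpacked binary runs:

cd filebeat
./filebeat version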

2. Configuration

Documentation: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html

cp filebeat.yml filebeat-backup.yml

vim filebeat.yml

filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access.log

#Output to console
output.console:
  enabled: true
  • backoff: how long Filebeat waits before checking a file again for new content.
  • tail_files: if set to true, Filebeat reads new files from the end instead of the beginning. When this option is combined with log rotation, the first log entries in a new file may be skipped.

#filebeat.yml, full example with output to Elasticsearch
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access.log
#Output to Elasticsearch
output.elasticsearch:
  hosts: ["localhost:9200"]

3. Starting Filebeat

#Show the startup options
./filebeat --help
#Remove the registry left by the previous run (resets the file read offsets)
cd data
rm -rf *
#Count the lines in the log to be collected
cd /usr/local/nginx/logs
cat access.log | wc -l
#Run in the foreground
./filebeat -e -c filebeat.yml
#Run in the background
vi startup.sh
#! /bin/bash
nohup /usr/local/filebeat/filebeat -e -c filebeat.yml >> /usr/local/filebeat/output.log 2>&1 &
chmod a+x startup.sh
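
With the script in place, start Filebeat in the background and follow its output (paths as configured in the script above):

./startup.sh
tail -f /usr/local/filebeat/output.log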

4. Collecting Logs with Filebeat + Logstash

  • Both Logstash and Filebeat can collect logs. Filebeat is lighter and uses fewer resources, but Logstash has a filter stage that can parse and analyze the logs. The usual architecture is therefore: Filebeat collects the logs and sends them to a message queue such as Redis or Kafka; Logstash then fetches from the queue, parses the data with its filters, and stores it in Elasticsearch.
  • Architecture diagram

    This architecture removes the problem of Logstash consuming a lot of system resources on every server node. Compared with Logstash, the CPU and memory footprint of Beats is almost negligible.
  • Configuration files
#Filebeat configuration, filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access.log
#Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]

#Logstash configuration, logstash.conf
#Uses the logstash-input-beats plugin
#Listens on port 5044
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}

filter {
    grok {  
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}
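
Assuming Logstash is unpacked under its usual install directory, the pipeline file can be syntax-checked before it is started; both flags below are standard Logstash options:

#Validate the pipeline, then run it
bin/logstash -f logstash.conf --config.test_and_exit
bin/logstash -f logstash.conf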

#Logstash: removing unneeded fields
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}

filter {
    grok {  
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
    mutate{
       remove_field => ["agent"]
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}
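
To confirm that indexed documents no longer carry the removed agent field, query the index directly (assuming Elasticsearch on localhost:9200):

curl 'localhost:9200/nginx-*/_search?size=1&pretty'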

5. Collecting JSON-Formatted Logs with Filebeat

  • Change the nginx log format to JSON
#A sample nginx access log entry in the default format
192.168.230.110 - - [29/Aug/2020:12:50:21 +0800] "GET /abc/abc2.txt HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

#Edit nginx.conf
log_format log_json '{"remote_addr":"$remote_addr", '
                    '"ident": "-", '
                    '"user": "$remote_user", '
                    '"timestamp": "$time_local",'
                    '"request": "$request", '
                    '"status": $status, '
                    '"bytes": $body_bytes_sent, '
                    '"referer": "$http_referer",'
                    '"agent": "$http_user_agent",'
                    '"x_forwarded":"$http_x_forwarded_for"'
                    ' }';
access_log logs/access-json.log log_json;

#Check that the configuration syntax is correct
sbin/nginx -t -c conf/nginx.conf

Output:
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful

#Use multiple nginx worker processes
worker_processes  4;

#Reload the configuration
sbin/nginx -s reload
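
One way to verify the new format is to issue a request and look at the last line of the JSON log (assuming nginx listens on the default port 80; the URL is just the sample path used earlier):

curl http://localhost/abc/abc2.txt
tail -n 1 /usr/local/nginx/logs/access-json.log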
  • Filebeat configuration
#Filebeat configuration, filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
#Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]
  • Logstash configuration
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}

filter {
    json {
       source => "message"
       remove_field => ["agent"]
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}

6. Collecting Multiple Logs with One Filebeat Instance

  • Filebeat configuration
#filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
    fields:
      filetype: logjson
    fields_under_root: true
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /var/log/messages
    fields:
      filetype: logsystem
    fields_under_root: true
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]

fields: custom fields added to every event.

fields_under_root: if set to true, the custom fields become top-level fields in the output document instead of being nested under a fields key.
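
For illustration, here is roughly how the custom field lands in the event in each case (abridged, hypothetical events):

#fields_under_root: true  — filetype is a top-level field
{"@timestamp": "...", "filetype": "logjson", "message": "..."}
#fields_under_root: false — filetype is nested under fields
{"@timestamp": "...", "fields": {"filetype": "logjson"}, "message": "..."}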

  • Logstash configuration
#logstash.conf
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}
filter {
    if [filetype] == "logjson" {
        json {
          source => "message"
          remove_field => ["agent","beat","offset","tags","prospector"]
        }
        date {
          match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
          target => "@timestamp"
        }
    }
}
output {
    if [filetype] == "logjson" {
        elasticsearch {
            hosts => ["127.0.0.1:9200"]
            index => "nginx-%{+YYYY.MM.dd}"
        }
    } else if [filetype] == "logsystem" {
        elasticsearch {
          hosts => ["127.0.0.1:9200"]
          index => "msg-%{+YYYY.MM.dd}"
        }
    }
}

Note: the set of fields produced by the filter can only ever shrink; it must not grow back in a conflicting shape. Once Elasticsearch has mapped a field one way, re-sending it with a different structure fails with an [ElasticSearch MapperParsingException object mapping](https://stackoverflow.com/questions/23605942/elasticsearch-mapperparsingexception-object-mapping) error.
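
A quick way to confirm that both pipelines deliver data is to list the resulting indices (assuming Elasticsearch on localhost:9200):

curl 'localhost:9200/_cat/indices/nginx-*,msg-*?v'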

7. Collecting Logs with Filebeat + Redis + Logstash

(Architecture diagram: Filebeat → Redis → Logstash → Elasticsearch)

  • If Logstash goes down, Filebeat can no longer write to it, and the logs produced during the outage may never be collected. For this reason a message buffer such as Redis or Kafka is usually placed in between; Logstash consumes from the buffer and writes to Elasticsearch.
  • Installing Redis
tar zxvf redis-5.0.11.tar.gz
cd redis-5.0.11
make
make install

#Initialize redis:
./utils/install_server.sh

Please select the redis port for this instance: [6379] 
Selecting default: 6379
Please select the redis config file name [/etc/redis/6379.conf] 
Selected default - /etc/redis/6379.conf
Please select the redis log file name [/var/log/redis_6379.log] 
Selected default - /var/log/redis_6379.log
Please select the data directory for this instance [/var/lib/redis/6379] 
Selected default - /var/lib/redis/6379
Please select the redis executable path [/usr/local/bin/redis-server] 
Selected config:
Port           : 6379
Config file    : /etc/redis/6379.conf
Log file       : /var/log/redis_6379.log
Data dir       : /var/lib/redis/6379
Executable     : /usr/local/bin/redis-server
Cli Executable : /usr/local/bin/redis-cli
Is this ok? Then press ENTER to go on or Ctrl-C to abort.
Copied /tmp/6379.conf => /etc/init.d/redis_6379
Installing service...
Successfully added to chkconfig!
Successfully added to runlevels 345!
Starting Redis server...
Installation successful!

#Verify the service registration
chkconfig --list

#Locate redis-cli
[root@elk utils]# which redis-cli 
/usr/local/bin/redis-cli
	
#Edit the configuration:
vi /etc/redis/6379.conf 

bind 0.0.0.0
port 6379
daemonize yes
logfile /var/log/redis_6379.log
dir /var/lib/redis/6379

#Restart redis so the edited configuration takes effect (16101 is the PID of the redis-server started by the installer)
kill -9 16101
rm /var/run/redis_6379.pid
service redis_6379 start
systemctl enable redis_6379    #or equivalently: chkconfig redis_6379 on

#Open a redis shell
redis-cli
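
To confirm the server is reachable:

redis-cli ping
#expected reply: PONG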
  • Filebeat configuration
#filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
    fields:
      filetype: nginxjson
    fields_under_root: true
#Output to Redis
output.redis:
  enabled: true
  hosts: ["127.0.0.1:6379"]
  key: nginx
  db: 0
  datatype: list
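
Once Filebeat is running, the buffered events can be inspected in Redis before Logstash drains them (key name taken from the config above). Note that Logstash's redis input pops entries off the list, so the length falls back toward zero while Logstash runs:

redis-cli llen nginx
redis-cli lrange nginx 0 0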
  • Logstash configuration
input {
    redis {
      host => "127.0.0.1"
      port => 6379
      key => "nginx"
      data_type => "list"
      db => 0
    }
}

filter {
    json {
       source => "message"
       remove_field => ["agent","beat","offset","tags","prospector"]
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}

8. Collecting Logs with Filebeat + Kafka + Logstash

(Architecture diagram: Filebeat → Kafka → Logstash → Elasticsearch)

  • Installing Kafka
tar zxvf kafka_2.13-2.7.0.tgz
mv kafka_2.13-2.7.0 /usr/local
cd /usr/local
ln -s kafka_2.13-2.7.0 kafka
  • Starting ZooKeeper
    Kafka uses ZooKeeper, so a ZooKeeper server must be started first.
bin/zookeeper-server-start.sh config/zookeeper.properties

To start it in the background:

vi start-zk.sh
#! /bin/bash
nohup /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties >> /usr/local/kafka/zk-output.log 2>&1 &
chmod a+x start-zk.sh
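
To verify ZooKeeper is up, the shell bundled with Kafka can list the root znode:

bin/zookeeper-shell.sh localhost:2181 ls /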
  • Basic Kafka configuration
vim config/server.properties
listeners=PLAINTEXT://:9092
#advertised.listeners must be an address that clients can reach from other hosts
advertised.listeners=PLAINTEXT://192.168.122.150:9092
  • Starting Kafka
bin/kafka-server-start.sh config/server.properties

To start it in the background:

bin/kafka-server-start.sh -daemon config/server.properties
  • Creating a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic fx-topic
  • Listing topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
  • Starting a console producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic fx-topic
  • Starting a console consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic fx-topic --from-beginning
  • Filebeat configuration
    https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html
#filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
    fields:
      filetype: nginxjson
    fields_under_root: true
#Output to Kafka
output.kafka:
  hosts: ["localhost:9092"]
  topic: fx-topic
  required_acks: 1
  • Logstash configuration
input {
    kafka {
      bootstrap_servers => "127.0.0.1:9092"
      topics => ["fx-topic"]
      group_id => "logstash"
    }
}

filter {
    json {
       source => "message"
       remove_field => ["agent","beat","offset","tags","prospector"]
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}

group_id: the consumer group this input belongs to. Different groups consume the topic independently of one another, fully isolated; consumers within the same group share the topic's partitions.
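
While Logstash is consuming, the group's offsets and lag can be inspected with the tool shipped with Kafka (group name taken from the config above):

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group logstash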