- Collects, parses, and ships data in a lightweight way.
- The Beats platform brings together several single-purpose data shippers.
- They send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch.
1. Installation and Deployment
tar zxvf filebeat-7.8.0-linux-x86_64.tar.gz
ln -s filebeat-7.8.0-linux-x86_64 filebeat
2. Configuration
Documentation: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
cp filebeat.yml filebeat-backup.yml
vim filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access.log
# Output to the console
output.console:
  enabled: true
- backoff: how long to wait before checking a file for updates again.
- tail_files: if true, Filebeat reads new files from the end rather than the beginning. When this option is combined with log rotation, the first entries of a new file may be skipped.
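With output.console, each event is printed as one JSON document per line, with the raw log line carried in the message field. A rough sketch of the envelope (field names follow Filebeat 7.x conventions, but all values here are made up for illustration, not captured output):

```python
import json

# Hypothetical Filebeat 7.x console event (values are examples, not
# captured output); the original nginx log line travels in "message".
event_line = json.dumps({
    "@timestamp": "2020-08-29T04:50:21.000Z",
    "message": '192.168.230.110 - - [29/Aug/2020:12:50:21 +0800] '
               '"GET / HTTP/1.1" 200 612 "-" "curl/7.61.1"',
    "log": {"file": {"path": "/usr/local/nginx/logs/access.log"}},
})

event = json.loads(event_line)
print(event["log"]["file"]["path"])   # → /usr/local/nginx/logs/access.log
```

Downstream consumers (Logstash, Elasticsearch) receive this same one-document-per-event shape.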
- Output to Elasticsearch
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access.log
# Output to Elasticsearch
output.elasticsearch:
  hosts: ["localhost:9200"]
3. Starting Filebeat
# Show the startup options
./filebeat --help
# Delete the registry left by the previous run
cd data
rm -rf *
# Count the lines in the log
cd /usr/local/nginx/logs
cat access.log | wc -l
# Start in the foreground
./filebeat -e -c filebeat.yml
# Start in the background
vi startup.sh
#!/bin/bash
nohup /usr/local/filebeat/filebeat -e -c filebeat.yml >> /usr/local/filebeat/output.log 2>&1 &
chmod a+x startup.sh
4. Collecting logs with Filebeat + Logstash
- Logstash and Filebeat can both collect logs. Filebeat is lighter and uses fewer resources, but Logstash has a filter stage that can parse and analyze logs. A common architecture is therefore: Filebeat collects the logs and sends them to a message queue such as Redis or Kafka; Logstash fetches them from the queue, parses them with its filters, and stores the result in Elasticsearch.
- Architecture
This architecture avoids the high resource usage of running Logstash on every server node: compared with Logstash, the CPU and memory footprint of Beats is almost negligible.
- Configuration files
# Filebeat configuration, filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access.log
# Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]
# Logstash configuration, logstash.conf
# Uses the logstash-input-beats plugin,
# which listens on port 5044
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
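What the grok and date filters above do can be mirrored in plain Python: the %{HTTPD_COMBINEDLOG} pattern splits an Apache/nginx combined-format line into named fields, and the date pattern "dd/MMM/yyyy:HH:mm:ss Z" corresponds to strptime's "%d/%b/%Y:%H:%M:%S %z". The regex below is a simplified stand-in for the real grok pattern, not a literal translation:

```python
import re
from datetime import datetime

# Rough equivalent of %{HTTPD_COMBINEDLOG} (simplified; the real grok
# pattern is more permissive).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('192.168.230.110 - - [29/Aug/2020:12:50:21 +0800] '
        '"GET /abc/abc2.txt HTTP/1.1" 404 555 "-" "Mozilla/5.0"')

m = COMBINED.match(line)
fields = m.groupdict()

# The date filter's "dd/MMM/yyyy:HH:mm:ss Z" maps to this strptime format:
ts = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
print(fields["response"], ts.isoformat())   # → 404 2020-08-29T12:50:21+08:00
```

In Logstash, the parsed timestamp replaces @timestamp via target, so Elasticsearch indexes the event at the time the request happened, not the time it was ingested.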
# Logstash configuration that removes unnecessary fields
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["agent"]
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
5. Collecting JSON-formatted log data with Filebeat
- Change the nginx log to JSON format
# nginx access log (default combined format)
192.168.230.110 - - [29/Aug/2020:12:50:21 +0800]
"GET /abc/abc2.txt HTTP/1.1" 404 555 "-"
"Mozilla/5.0 (Windows NT 6.1; WOW64)
AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/73.0.3683.86 Safari/537.36"
# Edit nginx.conf
log_format log_json '{"remote_addr":"$remote_addr", '
                    '"ident": "-", '
                    '"user": "$remote_user", '
                    '"timestamp": "$time_local",'
                    '"request": "$request", '
                    '"status": $status, '
                    '"bytes": $body_bytes_sent, '
                    '"referer": "$http_referer",'
                    '"agent": "$http_user_agent",'
                    '"x_forwarded":"$http_x_forwarded_for"'
                    ' }';
access_log logs/access-json.log log_json;
# Check that the configuration syntax is valid
sbin/nginx -t -c conf/nginx.conf
Output:
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful
# Run multiple nginx worker processes
worker_processes 4;
# Reload the configuration
sbin/nginx -s reload
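Each line written by the log_json format above is a self-contained JSON document, so downstream consumers can parse it without grok. A quick sanity check (the values are hypothetical, mirroring the nginx variables in the format):

```python
import json

# A line as log_json would render it (values are made up for illustration).
line = ('{"remote_addr":"192.168.230.110", "ident": "-", "user": "-", '
        '"timestamp": "29/Aug/2020:12:50:21 +0800",'
        '"request": "GET /abc/abc2.txt HTTP/1.1", '
        '"status": 404, "bytes": 555, "referer": "-",'
        '"agent": "Mozilla/5.0","x_forwarded":"-" }')

entry = json.loads(line)
print(entry["status"], entry["request"])   # → 404 GET /abc/abc2.txt HTTP/1.1
```

One caveat: nginx does not JSON-escape variable values by default, so a user agent containing a double quote can produce an invalid line; nginx 1.11.8+ supports `log_format log_json escape=json '...'` to avoid this.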
- Filebeat configuration file
# Filebeat configuration, filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
# Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]
- Logstash configuration
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  json {
    source => "message"
    remove_field => ["agent"]
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
6. Collecting multiple logs with a single Filebeat
- Filebeat configuration file
# filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
  fields:
    filetype: logjson
  fields_under_root: true
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /var/log/messages
  fields:
    filetype: logsystem
  fields_under_root: true
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]
fields: custom fields to attach to each event.
fields_under_root: if true, the custom fields become top-level fields of the document instead of being nested under fields.
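The effect of fields_under_root can be sketched in plain Python (illustrative only, not Filebeat code): without it the custom field is nested under fields, with it the field is merged into the top level, which is what lets the Logstash conditional below test [filetype] directly.

```python
def apply_custom_fields(event, custom, under_root):
    """Mimic how Filebeat attaches `fields` to an event (illustration only)."""
    event = dict(event)
    if under_root:
        event.update(custom)            # top level: event["filetype"]
    else:
        event["fields"] = dict(custom)  # nested: event["fields"]["filetype"]
    return event

base = {"message": '{"status": 404}'}
custom = {"filetype": "logjson"}

nested = apply_custom_fields(base, custom, under_root=False)
top = apply_custom_fields(base, custom, under_root=True)

print(nested["fields"]["filetype"])   # → logjson
print(top["filetype"])                # → logjson
```

With fields_under_root: false, a Logstash conditional would have to test [fields][filetype] instead.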
- Logstash configuration
#logstash.conf
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  if [filetype] == "logjson" {
    json {
      source => "message"
      remove_field => ["agent","beat","offset","tags","prospector"]
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
  }
}
output {
  if [filetype] == "logjson" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
  } else if [filetype] == "logsystem" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "msg-%{+YYYY.MM.dd}"
    }
  }
}
Note: for a given index, the set of fields emerging from the filter may only shrink over time, never grow with conflicting shapes; otherwise Elasticsearch reports an
[ElasticSearch MapperParsingException object mapping](https://stackoverflow.com/questions/23605942/elasticsearch-mapperparsingexception-object-mapping)
error.
7. Collecting log data with Filebeat + Redis + Logstash
- If Logstash goes down, Filebeat can no longer write to it, and log entries produced during the outage may never be collected. A message buffer, usually Redis or Kafka, is therefore placed in between; Logstash consumes the buffered data and writes it to Elasticsearch.
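The role of the buffer can be sketched with an in-memory queue standing in for the Redis list (an analogy only, not Filebeat/Logstash code): the producer side keeps appending while the consumer is down, and the backlog is drained in order once the consumer resumes, so nothing is lost.

```python
from collections import deque

buffer = deque()   # stands in for the Redis list (or a Kafka topic)

# Filebeat side: keeps pushing even while Logstash is down.
for i in range(5):
    buffer.append(f"log line {i}")

# Logstash comes back up and drains the backlog in order.
consumed = []
while buffer:
    consumed.append(buffer.popleft())

print(len(consumed), consumed[0])   # → 5 log line 0
```

The real setup below uses datatype: list on the Filebeat side (RPUSH onto a Redis list) and data_type => "list" on the Logstash side (blocking pop from the same list).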
- Install Redis
tar zxvf redis-5.0.11.tar.gz
cd redis-5.0.11
make
make install
# Initialize redis:
./utils/install_server.sh
Please select the redis port for this instance: [6379]
Selecting default: 6379
Please select the redis config file name [/etc/redis/6379.conf]
Selected default - /etc/redis/6379.conf
Please select the redis log file name [/var/log/redis_6379.log]
Selected default - /var/log/redis_6379.log
Please select the data directory for this instance [/var/lib/redis/6379]
Selected default - /var/lib/redis/6379
Please select the redis executable path [/usr/local/bin/redis-server]
Selected config:
Port : 6379
Config file : /etc/redis/6379.conf
Log file : /var/log/redis_6379.log
Data dir : /var/lib/redis/6379
Executable : /usr/local/bin/redis-server
Cli Executable : /usr/local/bin/redis-cli
Is this ok? Then press ENTER to go on or Ctrl-C to abort.
Copied /tmp/6379.conf => /etc/init.d/redis_6379
Installing service...
Successfully added to chkconfig!
Successfully added to runlevels 345!
Starting Redis server...
Installation successful!
# Check the service registration
chkconfig --list
# Check that redis-cli is on the PATH
[root@elk utils]# which redis-cli
/usr/local/bin/redis-cli
# Edit the configuration:
vi /etc/redis/6379.conf
bind 0.0.0.0
port 6379
daemonize yes
logfile /var/log/redis_6379.log
dir /var/lib/redis/6379
# Restart redis (16101 is the PID of the running redis-server)
kill -9 16101
rm /var/run/redis_6379.pid
service redis_6379 start
# Enable at boot
systemctl enable redis_6379   # or: chkconfig redis_6379 on
# Open the redis CLI
redis-cli
- Filebeat configuration file
# filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
  fields:
    filetype: nginxjson
  fields_under_root: true
# Output to Redis
output.redis:
  enabled: true
  hosts: ["127.0.0.1:6379"]
  key: nginx
  db: 0
  datatype: list
- Logstash configuration
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    key => "nginx"
    data_type => "list"
    db => 0
  }
}
filter {
  json {
    source => "message"
    remove_field => ["agent","beat","offset","tags","prospector"]
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
8. Collecting log data with Filebeat + Kafka + Logstash
- Install Kafka
tar zxvf kafka_2.13-2.7.0.tgz
mv kafka_2.13-2.7.0 /usr/local
cd /usr/local
ln -s kafka_2.13-2.7.0 kafka
- Start ZooKeeper
Kafka uses ZooKeeper, so a ZooKeeper server must be started first.
bin/zookeeper-server-start.sh config/zookeeper.properties
Background startup:
vi start-zk.sh
#!/bin/bash
nohup /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties >> /usr/local/kafka/zk-output.log 2>&1 &
chmod a+x start-zk.sh
- Basic Kafka configuration
vim config/server.properties
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://192.168.122.150:9092
- Start Kafka
bin/kafka-server-start.sh config/server.properties
Background startup:
bin/kafka-server-start.sh -daemon config/server.properties
- Create a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic fx-topic
- List topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
- Start a console producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic fx-topic
- Start a console consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic fx-topic --from-beginning
- Filebeat configuration
https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html
# filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
  fields:
    filetype: nginxjson
  fields_under_root: true
# Output to Kafka
output.kafka:
  hosts: ["localhost:9092"]
  topic: fx-topic
  required_acks: 1
- Logstash configuration
input {
  kafka {
    bootstrap_servers => "127.0.0.1:9092"
    topics => ["fx-topic"]
    group_id => "logstash"
  }
}
filter {
  json {
    source => "message"
    remove_field => ["agent","beat","offset","tags","prospector"]
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
group_id
The consumer group. Consumers are assigned to a group via this ID; different groups consume the same topic independently and are isolated from one another.
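The isolation works because each group tracks its own offsets, so two groups each see every message, while consumers within one group split the topic's partitions between them. A simplified round-robin assignment illustrates the within-group split (this is an illustration, not Kafka's actual partition assignor):

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions across the consumers of one group (simplified)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# One topic with 4 partitions, two Logstash instances in group "logstash":
print(assign_partitions([0, 1, 2, 3], ["logstash-1", "logstash-2"]))
# → {'logstash-1': [0, 2], 'logstash-2': [1, 3]}
```

This is why running several Logstash instances with the same group_id scales consumption, while giving them different group_ids would make each instance receive every event.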