• Collects, parses, and ships data in a lightweight way.
  • The Beats platform brings together a family of single-purpose data shippers.
  • They send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch.

1. Installation

tar zxvf filebeat-7.8.0-linux-x86_64.tar.gz
ln -s filebeat-7.8.0-linux-x86_64 filebeat
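
A quick sanity check that the unpacked binary runs:

cd filebeat
./filebeat version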

2. Configuration

Documentation: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html

cp filebeat.yml filebeat-backup.yml

vim filebeat.yml

filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access.log

#Output to console
output.console:
  enabled: true
  • backoff: how long Filebeat waits before checking a file again for new content.
  • tail_files: if set to true, Filebeat reads new files from the end instead of the beginning. When this option is combined with log rotation, the first log entries in a new file may be skipped.

#filebeat.yml, full example with output to Elasticsearch
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access.log
#Output to Elasticsearch
output.elasticsearch:
  hosts: ["localhost:9200"]

3. Starting Filebeat

#Show the startup options
./filebeat --help
#Remove the registry left by the previous run (resets the file read offsets)
cd data
rm -rf *
#Count the lines in the log to be collected
cd /usr/local/nginx/logs
cat access.log | wc -l
#Run in the foreground
./filebeat -e -c filebeat.yml
#Run in the background
vi startup.sh
#! /bin/bash
nohup /usr/local/filebeat/filebeat -e -c filebeat.yml >> /usr/local/filebeat/output.log 2>&1 &
chmod a+x startup.sh
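
With the script in place, start Filebeat in the background and follow its output (paths as configured in the script above):

./startup.sh
tail -f /usr/local/filebeat/output.log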

4. Collecting Logs with Filebeat + Logstash

  • Both Logstash and Filebeat can collect logs. Filebeat is lighter and uses fewer resources, but Logstash has a filter stage that can parse and analyze the logs. The usual architecture is therefore: Filebeat collects the logs and sends them to a message queue such as Redis or Kafka; Logstash then fetches from the queue, parses the data with its filters, and stores it in Elasticsearch.
  • Architecture diagram

    This architecture removes the problem of Logstash consuming a lot of system resources on every server node. Compared with Logstash, the CPU and memory footprint of Beats is almost negligible.
  • Configuration files
#Filebeat configuration, filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access.log
#Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]

#Logstash configuration, logstash.conf
#Uses the logstash-input-beats plugin
#Listens on port 5044
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}

filter {
    grok {  
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}
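
Assuming Logstash is unpacked under its usual install directory, the pipeline file can be syntax-checked before it is started; both flags below are standard Logstash options:

#Validate the pipeline, then run it
bin/logstash -f logstash.conf --config.test_and_exit
bin/logstash -f logstash.conf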

#Logstash: removing unneeded fields
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}

filter {
    grok {  
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
    mutate{
       remove_field => ["agent"]
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}
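
To confirm that indexed documents no longer carry the removed agent field, query the index directly (assuming Elasticsearch on localhost:9200):

curl 'localhost:9200/nginx-*/_search?size=1&pretty'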

5. Collecting JSON-Formatted Logs with Filebeat

  • Change the nginx log format to JSON
#A sample nginx access log entry in the default format
192.168.230.110 - - [29/Aug/2020:12:50:21 +0800] "GET /abc/abc2.txt HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

#Edit nginx.conf
log_format log_json '{"remote_addr":"$remote_addr", '
                    '"ident": "-", '
                    '"user": "$remote_user", '
                    '"timestamp": "$time_local",'
                    '"request": "$request", '
                    '"status": $status, '
                    '"bytes": $body_bytes_sent, '
                    '"referer": "$http_referer",'
                    '"agent": "$http_user_agent",'
                    '"x_forwarded":"$http_x_forwarded_for"'
                    ' }';
access_log logs/access-json.log log_json;

#Check that the configuration syntax is correct
sbin/nginx -t -c conf/nginx.conf

Output:
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful

#Use multiple nginx worker processes
worker_processes  4;

#Reload the configuration
sbin/nginx -s reload
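
One way to verify the new format is to issue a request and look at the last line of the JSON log (assuming nginx listens on the default port 80; the URL is just the sample path used earlier):

curl http://localhost/abc/abc2.txt
tail -n 1 /usr/local/nginx/logs/access-json.log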
  • Filebeat configuration
#Filebeat configuration, filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
#Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]
  • Logstash configuration
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}

filter {
    json {
       source => "message"
       remove_field => ["agent"]
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}

6. Collecting Multiple Logs with One Filebeat Instance

  • Filebeat configuration
#filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
    fields:
      filetype: logjson
    fields_under_root: true
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /var/log/messages
    fields:
      filetype: logsystem
    fields_under_root: true
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]

fields: custom fields added to every event.

fields_under_root: if set to true, the custom fields become top-level fields in the output document instead of being nested under a fields key.
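
For illustration, here is roughly how the custom field lands in the event in each case (abridged, hypothetical events):

#fields_under_root: true  — filetype is a top-level field
{"@timestamp": "...", "filetype": "logjson", "message": "..."}
#fields_under_root: false — filetype is nested under fields
{"@timestamp": "...", "fields": {"filetype": "logjson"}, "message": "..."}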

  • Logstash configuration
#logstash.conf
input {
    beats {
      host => "0.0.0.0"
      port => 5044
    }
}
filter {
    if [filetype] == "logjson" {
        json {
          source => "message"
          remove_field => ["agent","beat","offset","tags","prospector"]
        }
        date {
          match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
          target => "@timestamp"
        }
    }
}
output {
    if [filetype] == "logjson" {
        elasticsearch {
            hosts => ["127.0.0.1:9200"]
            index => "nginx-%{+YYYY.MM.dd}"
        }
    } else if [filetype] == "logsystem" {
        elasticsearch {
          hosts => ["127.0.0.1:9200"]
          index => "msg-%{+YYYY.MM.dd}"
        }
    }
}

Note: the set of fields produced by the filter can only ever shrink; it must not grow back in a conflicting shape. Once Elasticsearch has mapped a field one way, re-sending it with a different structure fails with an [ElasticSearch MapperParsingException object mapping](https://stackoverflow.com/questions/23605942/elasticsearch-mapperparsingexception-object-mapping) error.
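
A quick way to confirm that both pipelines deliver data is to list the resulting indices (assuming Elasticsearch on localhost:9200):

curl 'localhost:9200/_cat/indices/nginx-*,msg-*?v'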

7. Collecting Logs with Filebeat + Redis + Logstash

(Architecture diagram: Filebeat → Redis → Logstash → Elasticsearch)

  • If Logstash goes down, Filebeat can no longer write to it, and the logs produced during the outage may never be collected. For this reason a message buffer such as Redis or Kafka is usually placed in between; Logstash consumes from the buffer and writes to Elasticsearch.
  • Installing Redis
tar zxvf redis-5.0.11.tar.gz
cd redis-5.0.11
make
make install

#Initialize redis:
./utils/install_server.sh

Please select the redis port for this instance: [6379] 
Selecting default: 6379
Please select the redis config file name [/etc/redis/6379.conf] 
Selected default - /etc/redis/6379.conf
Please select the redis log file name [/var/log/redis_6379.log] 
Selected default - /var/log/redis_6379.log
Please select the data directory for this instance [/var/lib/redis/6379] 
Selected default - /var/lib/redis/6379
Please select the redis executable path [/usr/local/bin/redis-server] 
Selected config:
Port           : 6379
Config file    : /etc/redis/6379.conf
Log file       : /var/log/redis_6379.log
Data dir       : /var/lib/redis/6379
Executable     : /usr/local/bin/redis-server
Cli Executable : /usr/local/bin/redis-cli
Is this ok? Then press ENTER to go on or Ctrl-C to abort.
Copied /tmp/6379.conf => /etc/init.d/redis_6379
Installing service...
Successfully added to chkconfig!
Successfully added to runlevels 345!
Starting Redis server...
Installation successful!

#Verify the service registration
chkconfig --list

#Locate redis-cli
[root@elk utils]# which redis-cli 
/usr/local/bin/redis-cli
	
#Edit the configuration:
vi /etc/redis/6379.conf 

bind 0.0.0.0
port 6379
daemonize yes
logfile /var/log/redis_6379.log
dir /var/lib/redis/6379

#Restart redis so the edited configuration takes effect (16101 is the PID of the redis-server started by the installer)
kill -9 16101
rm /var/run/redis_6379.pid
service redis_6379 start
systemctl enable redis_6379    #or equivalently: chkconfig redis_6379 on

#Open a redis shell
redis-cli
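
To confirm the server is reachable:

redis-cli ping
#expected reply: PONG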
  • Filebeat configuration
#filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
    fields:
      filetype: nginxjson
    fields_under_root: true
#Output to Redis
output.redis:
  enabled: true
  hosts: ["127.0.0.1:6379"]
  key: nginx
  db: 0
  datatype: list
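
Once Filebeat is running, the buffered events can be inspected in Redis before Logstash drains them (key name taken from the config above). Note that Logstash's redis input pops entries off the list, so the length falls back toward zero while Logstash runs:

redis-cli llen nginx
redis-cli lrange nginx 0 0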
  • Logstash configuration
input {
    redis {
      host => "127.0.0.1"
      port => 6379
      key => "nginx"
      data_type => "list"
      db => 0
    }
}

filter {
    json {
       source => "message"
       remove_field => ["agent","beat","offset","tags","prospector"]
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}

8. Collecting Logs with Filebeat + Kafka + Logstash

(Architecture diagram: Filebeat → Kafka → Logstash → Elasticsearch)

  • Installing Kafka
tar zxvf kafka_2.13-2.7.0.tgz
mv kafka_2.13-2.7.0 /usr/local
cd /usr/local
ln -s kafka_2.13-2.7.0 kafka
  • Starting ZooKeeper
    Kafka uses ZooKeeper, so a ZooKeeper server must be started first.
bin/zookeeper-server-start.sh config/zookeeper.properties

To start it in the background:

vi start-zk.sh
#! /bin/bash
nohup /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties >> /usr/local/kafka/zk-output.log 2>&1 &
chmod a+x start-zk.sh
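
To verify ZooKeeper is up, the shell bundled with Kafka can list the root znode:

bin/zookeeper-shell.sh localhost:2181 ls /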
  • Basic Kafka configuration
vim config/server.properties
listeners=PLAINTEXT://:9092
#advertised.listeners must be an address that clients can reach from other hosts
advertised.listeners=PLAINTEXT://192.168.122.150:9092
  • Starting Kafka
bin/kafka-server-start.sh config/server.properties

To start it in the background:

bin/kafka-server-start.sh -daemon config/server.properties
  • Creating a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic fx-topic
  • Listing topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
  • Starting a console producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic fx-topic
  • Starting a console consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic fx-topic --from-beginning
  • Filebeat configuration
    https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html
#filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    backoff: "1s"
    tail_files: false
    paths:
      - /usr/local/nginx/logs/access-json.log
    fields:
      filetype: nginxjson
    fields_under_root: true
#Output to Kafka
output.kafka:
  hosts: ["localhost:9092"]
  topic: fx-topic
  required_acks: 1
  • Logstash configuration
input {
    kafka {
      bootstrap_servers => "127.0.0.1:9092"
      topics => ["fx-topic"]
      group_id => "logstash"
    }
}

filter {
    json {
       source => "message"
       remove_field => ["agent","beat","offset","tags","prospector"]
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
}

output {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
}

group_id: the consumer group this input belongs to. Different groups consume the topic independently of one another, fully isolated; consumers within the same group share the topic's partitions.
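
While Logstash is consuming, the group's offsets and lag can be inspected with the tool shipped with Kafka (group name taken from the config above):

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group logstash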