Software versions

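Because the component versions constrain one another, I downloaded the latest releases of the four Elastic Stack components from the official site. The downloads are extremely slow, though; if you cannot get them, send me a private message and I can share a Baidu Cloud link.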

Software        Version   Download
filebeat        7.6.1     filebeat-7.6.1-linux-x86_64.tar.gz
elasticsearch   7.6.1     elasticsearch-7.6.1-linux-x86_64.tar.gz
logstash        7.6.1     logstash-7.6.1.tar.gz
kibana          7.6.1     kibana-7.6.1-linux-x86_64.tar.gz
zookeeper       3.4.14    zookeeper download page
kafka           2.4.0     kafka download page

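Architecture overview

Because Logstash consumes a lot of system resources, we use Filebeat as the log collector. Kafka serves as the transport layer between the collector and Logstash so that traffic bursts neither block the pipeline nor lose log events, which keeps the logs updating in near real time.

A production environment can roughly follow this architecture diagram; everything in this article is deployed on a single machine.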

[Figure: architecture diagram]
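The buffering role that Kafka plays here can be illustrated with a toy model. The sketch below is only an illustration under simple assumptions (discrete ticks, a fixed consumer rate, a hypothetical `simulate` function); it says nothing about how Kafka works internally, only why a queue between a bursty producer and a slower consumer prevents loss.

```python
def simulate(bursts, drain_per_tick, buffer_size):
    """Toy model of a producer and consumer connected by a bounded buffer.

    bursts: number of log events produced at each tick.
    drain_per_tick: how many events the consumer can handle per tick.
    buffer_size: events that may wait between ticks (0 means no buffer).
    Returns (processed, dropped).
    """
    backlog = 0
    processed = 0
    dropped = 0
    for produced in bursts:
        backlog += produced
        taken = min(backlog, drain_per_tick)
        processed += taken
        backlog -= taken
        # Whatever does not fit in the buffer after this tick is lost.
        if backlog > buffer_size:
            dropped += backlog - buffer_size
            backlog = buffer_size
    # After the last burst the consumer drains what is still buffered.
    processed += backlog
    return processed, dropped


traffic = [5, 0, 0, 5, 0, 0]          # two bursts of five events
print(simulate(traffic, 2, 0))         # no buffer: part of each burst is lost
print(simulate(traffic, 2, 10))        # with a buffer: the consumer catches up
```

With the buffer in place the slow consumer simply catches up during the quiet ticks, which is the behaviour the architecture relies on.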

Changing the nginx log format

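  1. Nginx supports custom log output formats. Before defining one, it helps to be familiar with nginx's built-in log variables. The format below writes each request as a single JSON object.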
log_format  main escape=json '{"accessip_list":"$proxy_add_x_forwarded_for",'
                                  '"remote_addr":"$remote_addr",'
                                  '"http_host":"$http_host",'
                                  '"@timestamp":"$time_iso8601",'
                                  '"referer":"$http_referer",'
                                  '"scheme":"$scheme",'
                                  '"request":"$request",'
                                  '"request_header":"$request_header",'
                                  '"request_method":"$request_method",'
                                  '"request_time":"$request_time",'
                                  '"request_body":"$request_body",'
                                  '"server_protocol":"$server_protocol",'
                                  '"uri":"$uri",'
                                  '"server_host":"$host",'
                                  '"domain":"$server_name",'
                                  '"hostname":"$hostname",'
                                  '"status":$status,'
                                  '"bytes":$body_bytes_sent,'
                                  '"agent":"$http_user_agent",'
                                  '"x_forwarded":"$http_x_forwarded_for",'
                                  '"response_body":"$response_body",'
                                  '"upstr_addr":"$upstream_addr",'
                                  '"upstr_host":"$upstream_http_host",'
                                  '"ups_resp_time":"$upstream_response_time"}';
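  2. Because I later need the request headers, request body, and response body for business analysis, I also use Lua scripts to capture the request headers and the response body. If you do not need them, remove response_body, request_header, and request_body from the format above.
  3. If you keep response_body and request_header, you also need to add the Lua scripts, because $request_header and $response_body are not built-in nginx variables (the Lua directives require OpenResty or the lua-nginx-module). A complete example follows: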
user  root;
worker_processes  4;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;

worker_rlimit_nofile 2048;

events {
    use epoll;
    worker_connections  2048;
}

include /opt/verynginx/verynginx/nginx_conf/in_external.conf;

http {
    include       mime.types;

    default_type  application/json; 
    client_max_body_size   200m;
    client_header_buffer_size 50m;

    large_client_header_buffers 4 50m;

    log_format  main escape=json '{"accessip_list":"$proxy_add_x_forwarded_for",'
                                  '"remote_addr":"$remote_addr",'
                                  '"http_host":"$http_host",'
                                  '"@timestamp":"$time_iso8601",'
                                  '"referer":"$http_referer",'
                                  '"scheme":"$scheme",'
                                  '"request":"$request",'
                                  '"request_header":"$request_header",'
                                  '"request_method":"$request_method",'
                                  '"request_time":"$request_time",'
                                  '"request_body":"$request_body",'
                                  '"server_protocol":"$server_protocol",'
                                  '"uri":"$uri",'
                                  '"server_host":"$host",'
                                  '"domain":"$server_name",'
                                  '"hostname":"$hostname",'
                                  '"status":$status,'
                                  '"bytes":$body_bytes_sent,'
                                  '"agent":"$http_user_agent",'
                                  '"x_forwarded":"$http_x_forwarded_for",'
                                  '"response_body":"$response_body",'
                                  '"upstr_addr":"$upstream_addr",'
                                  '"upstr_host":"$upstream_http_host",'
                                  '"ups_resp_time":"$upstream_response_time"}';

    access_log  logs/access.log  main;
    sendfile        on;
    tcp_nopush     on;
    tcp_nodelay on;

    lua_shared_dict shared_data 5m;  
    lua_shared_dict shared_server 5m;  
    lua_shared_dict locks 30m;  
    lua_shared_dict limit_counter 30m;  

    server {
        listen       8999;
        server_name  10.99.6.3;

        # Record the request headers
        set $request_header "";
        header_filter_by_lua '
            local request_header_tab = ngx.req.get_headers()
            local json = require "cjson"
            ngx.var.request_header = json.encode(request_header_tab)
        ';

        # Record the response body, capped at its first 1000 bytes.
        # lua_need_request_body makes nginx read the request body so that $request_body is populated.
        lua_need_request_body on;
        set $response_body "";
        body_filter_by_lua '
            -- ngx.arg[1] is the current chunk; ngx.arg[2] is true on the last chunk
            ngx.ctx.buffered = string.sub((ngx.ctx.buffered or "") .. ngx.arg[1], 1, 1000)
            if ngx.arg[2] then
                ngx.var.response_body = ngx.ctx.buffered
            end
        ';

        location / {
            proxy_pass http://localhost:1111;
        }
    }

}
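Before shipping the log anywhere, it can be worth checking that a line written with this format really is valid JSON, that numeric fields such as status are unquoted, and that no key appears twice (most parsers silently keep only the last value). The following is a minimal sketch; the function name `parse_log_line` and the sample line are made up for illustration and are not part of the original setup.

```python
import json


def parse_log_line(line):
    """Parse one JSON access-log line and report keys that occur more than once."""
    duplicates = []

    def collect(pairs):
        seen = {}
        for key, value in pairs:
            if key in seen:
                duplicates.append(key)
            seen[key] = value  # like most parsers, the last value wins
        return seen

    record = json.loads(line, object_pairs_hook=collect)
    return record, duplicates


# Hypothetical line with a repeated key, for demonstration only.
sample = '{"http_host":"example.com:8999","status":200,"bytes":512,"http_host":"example.com"}'
record, duplicates = parse_log_line(sample)
print(duplicates)                 # keys that were written more than once
print(type(record["status"]))     # numeric because the format leaves $status unquoted
```

Running it against a few lines of logs/access.log after reloading nginx gives a quick signal that the format is usable downstream.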

Filebeat installation and deployment

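# Download
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.6.1-linux-x86_64.tar.gz

# Extract into the target directory
tar -zxvf filebeat-7.6.1-linux-x86_64.tar.gz -C /usr/local/

# Rename the directory
cd /usr/local/
mv filebeat-7.6.1-linux-x86_64 filebeat-7.6.1
cd /usr/local/filebeat-7.6.1

# Edit the configuration file (contents listed below)
vi filebeat.yml

# Start in the foreground first to check that the configuration is valid
./filebeat -e -c filebeat.yml
# Once it works, start it in the background. Discarding the log output is recommended,
# because the file grows very large under heavy traffic
nohup ./filebeat -e -c filebeat.yml > /dev/null 2>&1 &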

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/local/openresty-1.15/nginx/logs/access.log # path of the log file to collect
  fields:
    log_topic: jszt-nginxaccess-log # Kafka topic the events are written to
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
setup.kibana:
output.kafka:
  enabled: true
  hosts: ["10.99.6.4:9092"] # Kafka broker address; separate multiple brokers with commas
  topic: '%{[fields.log_topic]}'
  partition.round_robin:
    reachable_only: true
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000
logging.level: debug
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
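With this configuration Filebeat does not parse the nginx line itself. It wraps the raw line as a string in the `message` field of its own JSON event and adds the custom `fields.log_topic` value, which the Kafka output uses to pick the topic. Anything reading from Kafka therefore has to decode JSON twice. The sketch below illustrates that idea; `decode_kafka_record` is a hypothetical helper, and the sample envelope contains only the two fields relevant here, whereas a real Filebeat event carries more metadata.

```python
import json


def decode_kafka_record(raw):
    """Decode a Filebeat event as it would arrive from Kafka.

    Stage 1 decodes Filebeat's envelope; stage 2 decodes the nginx
    access-log line that Filebeat stored as a string in `message`.
    """
    envelope = json.loads(raw)
    topic = envelope["fields"]["log_topic"]
    access = json.loads(envelope["message"])
    return topic, access


# Build a simplified, made-up envelope for demonstration.
nginx_line = json.dumps({"status": 200, "uri": "/ping"})
raw_record = json.dumps({
    "message": nginx_line,
    "fields": {"log_topic": "jszt-nginxaccess-log"},
})

topic, access = decode_kafka_record(raw_record)
print(topic)
print(access["uri"], access["status"])
```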

Elasticsearch installation

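# Download
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-linux-x86_64.tar.gz

# Extract into the target directory
tar -zxvf elasticsearch-7.6.1-linux-x86_64.tar.gz -C /usr/local/

# Enter the config directory
cd /usr/local/elasticsearch-7.6.1/config

# Edit the configuration file (contents listed below)
vi elasticsearch.yml

# In jvm.options, set the JVM heap to about half of the machine's physical memory
vi jvm.options
-Xms512m
-Xmx512m

# Create a user named es to run Elasticsearch (it refuses to start as root)
useradd es

# Set the password for es; you will be prompted to enter it twice
passwd es

# Give the es user ownership of the installation directory
chown -R es:es /usr/local/elasticsearch-7.6.1

# Switch to the es user and start
su - es
cd /usr/local/elasticsearch-7.6.1
./bin/elasticsearch -d

# Common Elasticsearch startup errors and fixes
1. max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
The per-process limit on open files is too low; edit /etc/security/limits.conf
vi /etc/security/limits.conf  # add the following lines; they apply only to the es user
es         soft    nproc        65536
es         hard    nproc        65536
es         soft    nofile       65536
es         hard    nofile       65536
Save the file and log in again for the change to take effect.

2. max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
The kernel limit on the number of memory-mapped areas per process is too low; edit /etc/sysctl.conf
vi /etc/sysctl.conf  # append the following line at the end of the file
vm.max_map_count = 262144
Apply the change with:
sysctl -p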
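Both limits can be checked before starting Elasticsearch instead of waiting for the bootstrap errors. The sketch below assumes a Linux host and is meant to be run as the es user; the function names are made up for illustration, and the thresholds are the ones quoted in the two error messages above.

```python
import resource

REQUIRED_NOFILE = 65536
REQUIRED_MAX_MAP_COUNT = 262144


def find_problems(nofile_soft, max_map_count):
    """Return a list of human-readable problems for the given limit values."""
    problems = []
    unlimited = nofile_soft == resource.RLIM_INFINITY
    if not unlimited and nofile_soft < REQUIRED_NOFILE:
        problems.append(
            "nofile is {}, need at least {} (see /etc/security/limits.conf)".format(
                nofile_soft, REQUIRED_NOFILE))
    if max_map_count < REQUIRED_MAX_MAP_COUNT:
        problems.append(
            "vm.max_map_count is {}, need at least {} (see /etc/sysctl.conf)".format(
                max_map_count, REQUIRED_MAX_MAP_COUNT))
    return problems


def read_current_limits():
    """Read the soft open-file limit of this process and the kernel's vm.max_map_count."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/proc/sys/vm/max_map_count") as handle:
        return soft, int(handle.read().strip())


# Typical use on the target machine:
#   for problem in find_problems(*read_current_limits()):
#       print(problem)
```

An empty result means neither of the two errors above should appear at startup.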

elasticsearch.yml

cluster.name: elasticsearch # cluster name
node.name: node-1 # node name
node.max_local_storage_nodes: 2
network.host: 0.0.0.0
http.port: 9200  # HTTP port
http.cors.enabled: true
http.cors.allow-origin: "*"  # allow cross-origin requests from any origin
cluster.initial_master_nodes: ["node-1"]
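Once the node is running, a request to the root endpoint on the configured HTTP port shows whether it is reachable and which cluster name, node name, and version it reports. The following is a small sketch using only the standard library; the helper names are hypothetical, and the default URL assumes the single-machine setup and port 9200 used in this article.

```python
import json
import urllib.request


def fetch_node_info(base_url="http://localhost:9200", timeout=5):
    """Fetch the JSON document Elasticsearch serves at its root endpoint."""
    with urllib.request.urlopen(base_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(info):
    """Summarize cluster name, node name, and version from the root document."""
    return "{} / {} / {}".format(
        info["cluster_name"], info["name"], info["version"]["number"])


# Typical use on the machine running Elasticsearch:
#   print(describe(fetch_node_info()))
```

If the values match cluster.name and node.name from elasticsearch.yml, the configuration was picked up.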

Logstash installation

# Download
wget https://artifacts.elastic.co/downloads/logstash/logstash-7.6.1.tar.gz

# Extract into the target directory
tar -zxvf logstash-7.6.1.tar.gz -C /usr/local/

# Enter the config directory
cd /usr/local/logstash-7.6.1/config

# Create the configuration file logstash.conf (contents listed below)

# Start
cd /usr/local/logstash-7.6.1/
nohup ./bin/logstash -f config/logstash.conf > mylogstash.log 2>&1 &

logstash.conf

# Kafka -> Logstash -> Elasticsearch pipeline.

input {
  kafka {
    bootstrap_servers => "10.99.6.4:9092" # Kafka address
    topics => ["jszt-nginxaccess-log"]    # the topic Filebeat writes to
    codec => json {
      charset => "UTF-8"
    }
    add_field => { "[@metadata][myid]" => "jsztnginxaccess-log" }
  }
}

filter {
  if [@metadata][myid] == "jsztnginxaccess-log" {
    # Adjust as needed; here some URL-encoded characters are decoded
    mutate {
      gsub => [
        "message", "\\x", "\\\x",
        "message", "%7B", "{",
        "message", "%7D", "}",
        "message", "%22", '"',
        "message", "%3A", ":",
        "message", "%2C", ",",
        "message", '\n', '',
        "message", '\r', ''
      ]
    }
    # Parse the nginx JSON line and drop the Filebeat metadata
    json {
      source => "message"
      remove_field => ["prospector", "beat", "source", "input", "offset", "fields", "host", "@version", "message"]
    }
    # Parse the request headers and drop the ones I do not need
    json {
      source => "request_header"
      remove_field => ["host", "content-type", "connection", "accept", "content-length", "accept-encoding", "user-agent", "accept-language", "referer", "x-requested-with", "origin", "cookie"]
    }
    # Parse the request body differently depending on the URL
    if [uri] == "/app-web/personal/public/gjjdkjbxxcx.service" {
      json {
        source => "request_body"
        remove_field => ["appid", "sign", "bizNo"]
      }
    }

    mutate {
      add_field => { "nginx_ip" => "%{remote_addr}" }
      convert => ["ups_resp_time", "float"]
      remove_field => ["host"]
    }
    # Parse the user's device from the User-Agent
    if [agent] != "-" and [agent] != '' {
      useragent {
        source => "agent"
        target => "user_agent"
      }
    }
    # Look up coordinates, city, and other details from the client IP
    geoip {
      default_database_type => "City"
      source => "remote_addr"
      target => "geoip"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
}

output {
  if [@metadata][myid] == "jsztnginxaccess-log" {
    # Send to Elasticsearch
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "logstash-jszt-nginxlogs-%{+YYYY.MM.dd}"
    }
  }
}
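The `%{+YYYY.MM.dd}` part of the index name is filled in from the event's @timestamp, which Logstash formats in UTC, so one index is created per UTC day. The sketch below mimics that naming to show which index a given request lands in. It assumes the event's @timestamp carries nginx's $time_iso8601 value, and `daily_index` is a hypothetical helper, not part of Logstash.

```python
from datetime import datetime, timezone


def daily_index(iso_timestamp, prefix="logstash-jszt-nginxlogs-"):
    """Return the daily index name for an ISO 8601 timestamp with a UTC offset."""
    local_time = datetime.fromisoformat(iso_timestamp)
    # The date part is taken after converting to UTC, not from the local date.
    return prefix + local_time.astimezone(timezone.utc).strftime("%Y.%m.%d")


# A request logged at 07:30 on 16 March in UTC+8 is still 15 March in UTC.
print(daily_index("2020-03-16T07:30:00+08:00"))
print(daily_index("2020-03-16T08:30:00+08:00"))
```

This is worth keeping in mind when a dashboard query by local date seems to miss the first hours of the day.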

Kibana installation

# Download
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.6.1-linux-x86_64.tar.gz

# Extract
tar -zxvf kibana-7.6.1-linux-x86_64.tar.gz -C /usr/local/

# Rename the directory and edit the configuration
mv /usr/local/kibana-7.6.1-linux-x86_64/ /usr/local/kibana-7.6.1
cd /usr/local/kibana-7.6.1
vim config/kibana.yml  # contents listed below

# Start
nohup ./bin/kibana --allow-root > kibana.log 2>&1 &

kibana.yml

server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
i18n.locale: "zh-CN"

Afterword

In the end I used Grafana to display the data.

[Figure: Grafana dashboard screenshots]


Since we will analyze business data later, we can retain and analyze the data ourselves; for example, Logstash can output directly to Hadoop, and then…