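ELK Cluster Setup

1. What is ELK?

ELK is an acronym for three open-source projects: Elasticsearch, Logstash, and Kibana. The name is also used for the wider ELK stack, which includes a number of additional components.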

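Elasticsearch is a distributed, highly scalable, near-real-time search and analytics engine. It makes large volumes of data easy to search, analyze, and explore, and its horizontal scalability makes that data more valuable in production. Roughly, it works as follows: a user submits documents to Elasticsearch; an analyzer splits the text into terms; the terms are stored in the index together with their weights; when a user searches, matching documents are scored and ranked by those weights, and the ranked results are returned. Its features include distributed operation, near-zero configuration, automatic node discovery, automatic index sharding, index replicas, a RESTful API, multiple data sources, and automatic search load balancing.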
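The indexing and ranking steps described above can be illustrated with a toy inverted index. This is only a sketch: `TinyIndex`, `tokenize`, and the plain term-frequency score are invented for illustration, whereas Elasticsearch relies on Lucene analyzers and, by default, BM25 scoring.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(Counter)

    def add(self, doc_id, text):
        # Store every term of the document together with its weight (frequency).
        for term in tokenize(text):
            self.postings[term][doc_id] += 1

    def search(self, query):
        # Sum the weights of all query terms per document, best score first.
        scores = Counter()
        for term in tokenize(query):
            for doc_id, weight in self.postings.get(term, {}).items():
                scores[doc_id] += weight
        return [doc_id for doc_id, _ in scores.most_common()]


index = TinyIndex()
index.add(1, "nginx access log")
index.add(2, "nginx error log error")
print(index.search("nginx error"))  # [2, 1]
```

Document 2 ranks first because it matches both query terms and contains "error" twice.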

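Logstash is an open-source, server-side data processing pipeline that ingests data from many sources at once, transforms it, and sends it to the "stash" of your choice. It is typically used to collect, parse, and filter logs, and it supports a large number of input methods.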

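Kibana visualizes the data in Elasticsearch and lets you navigate the Elastic Stack, from tracking query load to understanding how requests flow through your application. Access control was originally part of the paid X-Pack license; since 6.8 and 7.1 the core security features (authentication, role-based access control, TLS) are included in the free Basic license but have to be enabled explicitly. Without access control, everything in Elasticsearch is visible to and modifiable by every user, which is a security risk. If you do not want to buy a license for the remaining paid features, the capable Grafana is an alternative front end.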

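Filebeat is part of Beats. The Beats family includes the following tools:

  1. Packetbeat (collects network traffic data)
  2. Topbeat (collects CPU, memory, and other usage data at the system, process, and file-system level; superseded by Metricbeat since Beats 5.0)
  3. Filebeat (collects file data)
  4. Winlogbeat (collects Windows event log data)
  5. Heartbeat (monitors the availability of systems and applications)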

Official documentation: https://www.elastic.co/guide/index.html
Official downloads: https://www.elastic.co/cn/downloads/


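2. Cluster Design

The cluster in this article is built on the Elasticsearch 7.2.0 release family and is one part of a system the author designed at work. It consists of Elasticsearch, Logstash, Kibana, Filebeat, the elasticsearch-head plugin, the IK Chinese analysis plugin, and Kafka. Several key settings changed in ELK 7 compared with earlier releases, so keep all component versions consistent to avoid version-mismatch pitfalls and unnecessary learning cost. Once you are familiar with the stack, consult the official documentation and move to the latest release. This document covers cluster installation and configuration only. Further articles on component features and configuration may follow when time allows; in the meantime, refer to the official documentation.

2.1 Overall Architecture

The overall data flow is shown in the figure below. Filebeat acts as the agent and collects the logs of the nginx reverse proxy as well as the web application logs. Everything collected is sent to the Kafka cluster. Other components can consume the raw data from there, or the data can flow through Logstash -> Elasticsearch for simple log aggregation and statistical analysis.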

[Figure: ELK cluster architecture and data flow]

3. Nginx

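3.1 Formatting the nginx access log

To simplify downstream processing, the nginx access log is written as JSON, which avoids conversion overhead later. The author uses Taobao's Tengine build of nginx, so some of the fields below may not exist in your build. Note that if the value of a missing field is not wrapped in quotes, the Logstash json filter fails on that line. The log variables and format syntax are described in the official documentation: http://nginx.org/en/docs/http/ngx_http_log_module.html. Filebeat also ships official modules for nginx and many other components, which offer a quick way to set up nginx log processing. This document does not use them and configures the processing by hand. (nginx.conf)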

log_format main  '{"bytes_sent":$bytes_sent,'
#      '"content_length": $content_length,'
#      '"content_type": "$content_type",'
      '"http_x_forwarded_for": "$http_x_forwarded_for",'
      '"http_referer": "$http_referer",'
      '"http_user_agent": "$http_user_agent",'
#      '"document_root": "$document_root",'
      '"document_uri": "$document_uri",'
      '"host": "$host",'
#      '"hostname": "$hostname",'
      '"pid": $pid,'
      '"proxy_protocol_addr": "$proxy_protocol_addr",'
#      '"proxy_protocol_port": $proxy_protocol_port,'
#      '"realpath_root": "$realpath_root",'
      '"remote_addr": "$remote_addr",'
      '"remote_port": "$remote_port",'
      '"remote_user": "$remote_user",'
      '"request": "$request",'
      '"request_filename": "$request_filename",'
#      '"request_id": "$request_id",'
      '"request_length": $request_length,'
      '"request_method": "$request_method",'
      '"request_time": "$request_time",'
      '"request_uri": "$request_uri",'
      '"scheme": "$scheme",'
      '"sent_http_name": "$sent_http_name",'
      '"server_addr": "$server_addr",'
      '"server_name": "$server_name",'
      '"server_port": $server_port,'
      '"server_protocol": "$server_protocol",'
      '"status": "$status",'
      '"time_iso8601": "$time_iso8601",'
      '"time_local": "$time_local",'
      '"upstream_addr": "$upstream_addr",'
#      '"upstream_bytes_received": $upstream_bytes_received,'
      '"upstream_cache_status": "$upstream_cache_status",'
      '"upstream_http_name": "$upstream_http_name",'
      '"upstream_response_length": "$upstream_response_length",'
      '"upstream_response_time": "$upstream_response_time",'
      '"upstream_status": "$upstream_status"}';
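Why the quoting matters can be shown with a small check. The sketch below assumes that nginx writes `-` for a variable that has no value; `render` and `is_valid_json` are hypothetical helpers that only imitate the substitution, they are not part of nginx or Logstash.

```python
import json


def render(template, variables):
    """Imitate nginx substitution: a variable without a value is logged as '-'."""
    line = template
    # Replace longer names first so that a short name never clobbers a longer one.
    for name in sorted(variables, key=len, reverse=True):
        value = variables[name]
        line = line.replace("$" + name, "-" if value is None else str(value))
    return line


def is_valid_json(line):
    """True if a JSON parser (such as the Logstash json filter) could read the line."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


unquoted = '{"status": "$status", "content_length": $content_length}'
quoted = '{"status": "$status", "content_length": "$content_length"}'
request = {"status": 200, "content_length": None}  # request without a body

print(is_valid_json(render(unquoted, request)))  # False
print(is_valid_json(render(quoted, request)))    # True
```

Quoting every variable that can be empty keeps each log line parseable; numeric conversion can then be done later, for example with a mutate filter.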

4. filebeat

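4.1 Installation and deployment

  1. Download and extract to the target directory
  2. Create the data directory
  3. Edit the configuration file
  4. Start Filebeat
  5. Stop Filebeat
# Download and extract to the target directory
tar -zxvf filebeat-7.2.0-linux-x86_64.tar.gz -C /usr/local/elk/
# Create the data directory
cd /usr/local/elk/filebeat-7.2.0-linux-x86_64
mkdir data
# Edit the configuration file
vim filebeat.yml
# Start Filebeat (run from the installation directory)
nohup ./filebeat -e -c filebeat.yml >>/dev/null 2>&1 &
# Stop Filebeat: find the process ID, then send SIGTERM so that it can shut down cleanly
ps -ef | grep "filebeat"
kill <pid>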

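4.2 Configuration (filebeat.yml)

Only the settings added to the default file are shown below. The output used here is Kafka, so comment out the outputs to other components.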

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # Multiple paths can be listed
  paths:
    - /home/elk/logs/nginx/access*.log
  
  # Custom fields added by Filebeat, used downstream to tell the data sources apart
  fields:
    ServerIp: 10.11.48.160
    ApplicationId: elk-global-nginx
    ApplicationDescribe: elk-global-nginx
    LogType: "access"
    LogLabel: "access"
    
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/elk/logs/nginx/error*.log
    
  fields:
    ServerIp: 10.11.48.160
    ApplicationId: elk-global-nginx
    ApplicationDescribe: elk-global-nginx
    LogType: "error"
    LogLabel: "error"


- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/elk/logs/ConsoleGlobal.*
    
  fields:
    ServerIp: 10.11.48.160
    ApplicationId: elk-global-console
    ApplicationDescribe: elk-global-console
    LogType: "server"
    LogLabel: "server"

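  # Filebeat reads logs line by line. Most log entries fit on one line, but multi-line
  # entries such as Java stack traces need the multiline settings below.
  # The [[:space:]]+ after the group lets "... 23 more" lines match as well.
  multiline:
      pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'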
      negate: false
      match: after

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/elk/logs/ELKLog.*
    
  fields:
    ServerIp: 10.11.48.160
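    # Lower case, so that the Kafka topic matches the one Logstash consumes (topic names are case-sensitive)
    ApplicationId: elk-global-main
    ApplicationDescribe: elk-global-main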
    LogType: "server"
    LogLabel: "server"
    
  multiline:
      # The [[:space:]]+ after the group lets "... 23 more" lines match as well
      pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
      negate: false
      match: after

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/ELK/logs/test/*.log
    
  fields:
    ServerIp: 10.11.48.160
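    # Lower case, so that the Kafka topic matches the one Logstash consumes (topic names are case-sensitive)
    ApplicationId: elk-test
    ApplicationDescribe: elk-test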
    LogType: "server"
    LogLabel: "server"


#----------------------------- kafka output -----------------------------------
output.kafka:
  enabled: true
  # initial brokers for reading cluster metadata
  hosts: ["10.11.48.160:9092", "10.11.48.161:9092", "10.11.48.165:9092"]

  # message topic selection + partitioning
  topic: '%{[fields][ApplicationId]}-%{[fields][LogType]}'
  partition.round_robin:
    reachable_only: false

  compression: gzip
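Two behaviours of this configuration are easy to get wrong: how the multiline settings group lines, and which Kafka topic an event ends up in. The following sketch imitates both. It is not Filebeat code; `merge_multiline` and `kafka_topic` are hypothetical helpers, the sample log lines are invented, and the regular expression is a simplified stand-in that covers only `at ...` and `Caused by:` lines.

```python
import re

# Simplified stand-in for the multiline pattern (assumption: POSIX [[:space:]]
# is written as \s and only "at ..." / "Caused by:" continuation lines are covered).
CONTINUATION = re.compile(r"^\s+at\b|^Caused by:")


def merge_multiline(lines):
    """negate: false + match: after -> matching lines are appended to the previous event."""
    events = []
    for line in lines:
        if events and CONTINUATION.search(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


def kafka_topic(fields):
    """Expand the topic format '%{[fields][ApplicationId]}-%{[fields][LogType]}'."""
    return f"{fields['ApplicationId']}-{fields['LogType']}"


log = [
    "2019-08-01 10:00:00 ERROR java.lang.IllegalStateException: boom",
    "    at org.example.Service.run(Service.java:42)",
    "Caused by: java.io.IOException: disk full",
    "    at org.example.Disk.write(Disk.java:7)",
    "2019-08-01 10:00:01 INFO recovered",
]
events = merge_multiline(log)
print(len(events))  # 2
print(kafka_topic({"ApplicationId": "elk-global-nginx", "LogType": "access"}))
# elk-global-nginx-access
```

The topic name produced here has to match, character for character, a topic that a downstream consumer subscribes to.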

5. zookeeper

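Kafka depends on ZooKeeper, so install and configure the ZooKeeper cluster before installing Kafka.

  1. Download ZooKeeper: https://zookeeper.apache.org/releases.html
  2. Configure ZooKeeper (zoo.cfg)
# Directory where ZooKeeper stores its data. A file named myid must also be placed in this directory. Its number must be unique within the cluster and must match the server.N entry in zoo.cfg. Numbering starts at 1.
dataDir=/usr/local/elk/apache-zookeeper-3.5.5/data

dataLogDir=/usr/local/elk/apache-zookeeper-3.5.5/logs

clientPort=2181

# Maximum number of concurrent connections from a single client IP
maxClientCnxns=20

# Basic time unit in milliseconds, used as the heartbeat interval between ZooKeeper servers and between clients and servers
tickTime=2000

# Time allowed for a follower (a "client" from the leader's point of view) to connect to the leader and complete its initial sync, in units of tickTime. If the initial connection takes longer, it is considered failed.
initLimit=10

# Maximum time between a request and its reply when the leader and a follower exchange messages, in units of tickTime. A follower that cannot reach the leader within this time is dropped.
syncLimit=5

# server.<myid>=<ip>:<follower-to-leader port>:<leader-election port>
# "server" is fixed and myid is assigned by hand. Followers use the first port to connect to the leader; the second port is used for leader election.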

server.1=10.11.48.160:2888:3888

server.2=10.11.48.161:2888:3888

server.3=10.11.48.165:2888:3888
  3. Configure environment variables (~/.bash_profile)
ZOOKEEPER_HOME=/usr/local/elk/apache-zookeeper-3.5.5
PATH=$HOME/.local/bin:$HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
export  ZOOKEEPER_HOME PATH

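Steps

# Note: from 3.5.5 on, the runnable distribution is the "-bin" tarball; adjust the file and directory names if you use it
tar -zxvf apache-zookeeper-3.5.5.tar.gz -C /usr/local/elk/
cd /usr/local/elk/apache-zookeeper-3.5.5
mkdir data
mkdir logs
# Create the myid file and write this node's number into it. The number must be unique
# in the cluster and the file must not be empty, otherwise ZooKeeper fails to start.
# Use 1 on 10.11.48.160, 2 on 10.11.48.161, 3 on 10.11.48.165.
echo 1 > data/myid
vim conf/zoo.cfg
cd bin
./zkServer.sh start
./zkServer.sh status
./zkServer.sh stop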
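Because a wrong or missing myid is a common cause of startup failures, the number for each host can be derived from the `server.N` lines instead of being typed by hand. A minimal sketch, assuming the three `server.N` entries shown in zoo.cfg; `myid_for` is a hypothetical helper:

```python
import re

ZOO_CFG = """\
server.1=10.11.48.160:2888:3888
server.2=10.11.48.161:2888:3888
server.3=10.11.48.165:2888:3888
"""

SERVER_LINE = re.compile(r"server\.(\d+)=([^:]+):(\d+):(\d+)$")


def myid_for(host, cfg_text):
    """Return the number that belongs in <dataDir>/myid on the given host."""
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match and match.group(2) == host:
            return int(match.group(1))
    raise KeyError(f"{host} is not listed in zoo.cfg")


print(myid_for("10.11.48.161", ZOO_CFG))  # 2
```

The returned value is what the myid file on that host has to contain.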

6. kafka

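  1. Download Kafka: https://kafka.apache.org/downloads
  2. Configure Kafka (server.properties)
    broker.id must be unique within the cluster.
# Each server needs its own broker id. If it is not set, one is generated automatically.
## Change this on the other servers in the cluster
broker.id=0

# Address and port the broker listens on. Clients such as Filebeat (producer) and Logstash (consumer) connect here.
## Change this on the other servers in the cluster
listeners=PLAINTEXT://10.11.48.160:9092

# Number of threads that receive and send network messages
num.network.threads=3

# Number of threads the server uses to process requests, which may include disk I/O
num.io.threads=8

# Send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# Receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# Maximum size of a request the socket server will accept (protects against OOM)
socket.request.max.bytes=104857600

# Comma-separated list of directories in which the log (message data) files are stored
# Create this directory beforehand so that startup does not depend on Kafka being able to create it (if it causes problems, comment this option out)
log.dirs=/usr/local/elk/kafka_2.12-2.3.0/logs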

num.partitions=1

num.recovery.threads.per.data.dir=1

offsets.topic.replication.factor=1

transaction.state.log.replication.factor=1

transaction.state.log.min.isr=1

# ZooKeeper connection string: a comma-separated list of host:port pairs of the ZooKeeper ensemble
zookeeper.connect=10.11.48.160:2181,10.11.48.161:2181,10.11.48.165:2181

zookeeper.connection.timeout.ms=6000

group.initial.rebalance.delay.ms=0
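Only `broker.id` and `listeners` differ between the brokers, so the per-host values can be generated from the host list. The sketch below assumes that ids are handed out as 0, 1, 2 in host order (the article only shows `broker.id=0` for 10.11.48.160); `broker_overrides` is a hypothetical helper.

```python
HOSTS = ["10.11.48.160", "10.11.48.161", "10.11.48.165"]


def broker_overrides(hosts, port=9092):
    """Map each host to the server.properties values that must differ per broker."""
    return {
        host: {"broker.id": index, "listeners": f"PLAINTEXT://{host}:{port}"}
        for index, host in enumerate(hosts)
    }


for host, settings in broker_overrides(HOSTS).items():
    print(host, settings)
```

All other settings in server.properties can stay identical on every broker.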

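Start, stop, and common commands

# Start (run from the Kafka bin directory)
kafka-server-start.sh -daemon ../config/server.properties
#nohup /usr/local/elk/kafka/bin/kafka-server-start.sh /usr/local/elk/kafka/config/server.properties >>/dev/null 2>&1 &

# Stop: find the Kafka process ID with jps, then send SIGTERM for a controlled shutdown
jps
kill <pid>

# Describe a specific topic
kafka-topics.sh --zookeeper 127.0.0.1:2181 --describe --topic elk-main-server

# List all topics
kafka-topics.sh --zookeeper 127.0.0.1:2181 --list

# List consumer groups
kafka-consumer-groups.sh --bootstrap-server 10.11.48.160:9092 --list

# Show the consumption progress of a group (offsets and lag)
kafka-consumer-groups.sh --bootstrap-server 10.11.48.160:9092 --describe --group elk-global-main-server

# Create a topic (Kafka 2.3 requires --partitions and --replication-factor)
kafka-topics.sh --create --topic elk-global-main-server --zookeeper 10.11.48.160:2181 --partitions 1 --replication-factor 1

# Delete a topic. Use with care. If delete.topic.enable is false on the brokers, this only
# marks the topic for deletion in ZooKeeper and the message files must be removed by hand:
#   - delete the elk-global-main-server directories under the log.dirs path set in server.properties
#   - delete the topic's nodes in ZooKeeper
kafka-topics.sh --delete --zookeeper 127.0.0.1:2181 --topic elk-global-main-server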

7. logstash

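  1. Download and extract to the target directory
  2. Configure Logstash (logstash.conf)
    Running at least two Logstash instances is recommended, each pointing at a different Kafka broker address. For a trial setup you can leave out the filters and the per-type matching; see the simple configuration examples in the official documentation.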
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
    # User system console logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-console-server"]
        group_id => "elk-console-server"
        type => "elk-console-server"
    }
    # User system (international edition) console logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-global-console-server"]
        group_id => "elk-global-console-server"
        type => "elk-global-console-server"
    }
    # User system application server logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-main-server"]
        group_id => "elk-main-server"
        type => "elk-main-server"
    }
    # User system (international edition) application server logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-global-main-server"]
        group_id => "elk-global-main-server"
        type => "elk-global-main-server"
    }
    # International edition nginx access logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-global-nginx-access"]
        group_id => "elk-global-nginx-access"
        type => "elk-global-nginx-access"
    }
    # User system nginx access logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-nginx-access"]
        group_id => "elk-nginx-access"
        type => "elk-nginx-access"
    }
    # User system nginx error logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-nginx-error"]
        group_id => "elk-nginx-error"
        type => "elk-nginx-error"
    }
    # International edition nginx error logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-global-nginx-error"]
        group_id => "elk-global-nginx-error"
        type => "elk-global-nginx-error"
    }
    # ELK test user system logs
    kafka {
        bootstrap_servers => "10.11.48.160:9092"
        topics => ["elk-test-server"]
        group_id => "elk-test-server"
        type => "elk-test-server"
    }
}

filter{	
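    # Process nginx access logs
    if [type] in ["elk-global-nginx-access", "elk-nginx-access"]{
        # Events from Filebeat arrive wrapped in a JSON envelope. Parse it first so that
        # "message" holds the actual log line.
        # Test data sent straight to Kafka is parsed twice, which does not affect the result.
        json{
            source => "message"
            #target => "jsoncontent"
        }
        # This second pass parses the log line itself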
        json{
        	source => "message"
        	#target => "jsoncontent"
           
        }
        
        ruby {
              code => "event.set('index_day', event.get('@timestamp').time.localtime + 8*60*60)"
        }
        
        mutate{
        	#remove_field => ["_@timestamp"]
        	convert => ["index_day", "string"]
                    gsub => ["index_day", "T([\S\s]*?)Z", ""]
                    gsub => ["index_day", "-", "."]
        }
        
        #if ["time_iso8601"]
        #date {
        #	match => ["time_iso8601", "yyyy-MM-dd'T'HH:mm:ss Z"]
        #	target => "@timestamp"
        #}
        
        
        useragent{
        	source => "http_user_agent"
        	target => "useragent"
        }
        
        geoip{				
        	source => "remote_addr"
		target => "geoip"
		database => "/usr/local/elk/logstash-7.2.0/config/GeoLite2-City_20190730/GeoLite2-City.mmdb"
		add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
		add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
	}

        
        mutate{
		convert => [ "[geoip][coordinates]", "float"]
                #convert => ["bytes_sent","integer"]
        }
        mutate {
		 convert => [ "status","integer" ]
		 #convert => [ "body_bytes_sent","integer" ]
		 #convert => [ "upstream_response_time","float" ]
		 #remove_field => "message"
		}
    }
	

    if [type] in ["elk-global-nginx-error", "elk-nginx-error"]{
	grok {
		patterns_dir => ["/usr/local/elk/logstash-7.2.0/config/patterns"]
		match => { 
                       #"message" => "%{SERV_TIME_OUT:serv_timeout} %{ERROR_HOSTPORT:error_hostport} %{CHECK_HTTP_ERROR:check_http_error}" 
						"message" => "%{HOSTPORT:error_hostport}" 
               }		 
        }
	json {
		source => "message"
	}
    }
    
   if [type] in ["elk-main-server", "elk-test-server"]{
        if ([message] !~ "org.test.elk.services.servlet.SendServiceServlet") {
             drop {}
          }
       grok {
        match => [
            "message", "%{TIMESTAMP_ISO8601:reqTime} %{WORD:logLevel} .*\{\[(?<appSysId>\w+)\|(?<serial>\w+).*serviceType=(?<serviceType>\d+).*(usrsysid=(?<usrSysId>\w+)|usrSysId=(?<usrSysId>\w+))",
            "message", "%{TIMESTAMP_ISO8601:reqTime} %{WORD:logLevel} .*\{\[(?<appSysId>\w+)\|(?<serial>\w+).*serviceType=(?<serviceType>\d+).*",
            "message", "%{TIMESTAMP_ISO8601:reqTime} %{WORD:logLevel} .*\{\[(?<appSysId>\d+)\|(?<serviceType>\d+)\|(?<serial>\w+).*",
            "message", "%{TIMESTAMP_ISO8601:reqTime} %{WORD:logLevel} .*\{\[(?<appSysId>\w+)\|(?<serial>\w+).*",
            "message", "%{TIMESTAMP_ISO8601:reqTime} %{WORD:logLevel} .*"
        ]
    }
	json {
		source => "message"
	}
   }

        
}

output {
    # Every type is written to its own daily index, so a single conditional covers all of them
    if [type] in ["elk-console-server", "elk-global-console-server", "elk-main-server", "elk-global-main-server", "elk-nginx-access", "elk-global-nginx-access", "elk-nginx-error", "elk-global-nginx-error", "elk-test-server"] {
        elasticsearch {
            # List every node of the cluster, not the same node three times
            hosts => ["10.11.48.160:9200", "10.11.48.161:9200", "10.11.48.165:9200"]
            index => "%{type}-%{+YYYY.MM.dd}"
            #document_type => "%{type}"
        }
    }
}
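One consequence of the index pattern `%{type}-%{+YYYY.MM.dd}` is worth spelling out: Logstash formats the date from `@timestamp`, which is kept in UTC, so events logged in UTC+8 shortly after local midnight land in the previous day's index. The sketch below reproduces that naming rule in Python; `index_name` is a hypothetical helper, not a Logstash API.

```python
from datetime import datetime, timedelta, timezone


def index_name(event_type, timestamp):
    """Build '<type>-YYYY.MM.dd' from the UTC date of the event timestamp."""
    utc = timestamp.astimezone(timezone.utc)
    return f"{event_type}-{utc:%Y.%m.%d}"


china_time = timezone(timedelta(hours=8))
early_morning = datetime(2019, 8, 1, 7, 30, tzinfo=china_time)
afternoon = datetime(2019, 8, 1, 15, 0, tzinfo=china_time)

print(index_name("elk-nginx-access", early_morning))  # elk-nginx-access-2019.07.31
print(index_name("elk-nginx-access", afternoon))      # elk-nginx-access-2019.08.01
```

This is presumably why the filter section computes a separate `index_day` field shifted by eight hours.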

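Notes

  1. Geo-coordinates and location display with Chinese support
    The GeoIP database used by the geoip filter can be downloaded in its free edition from the MaxMind website.
  2. Chinese map tiles in Kibana
    Add the following setting to Kibana. Some other visualization tools can also use this tile service: http://webrd02.is.autonavi.com/appmaptile
map.tilemap.url: "http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=7&x={x}&y={y}&z={z}"
  3. Logstash ships with commonly used grok patterns, see https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns. For grok syntax and custom patterns, see the official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html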
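When grok is given several patterns for one field, it tries them in order and stops at the first one that matches, which is why the configuration lists the most specific pattern first and a catch-all last. The sketch below imitates this with Python named groups. The log lines are invented, `TS` is a rough stand-in for `TIMESTAMP_ISO8601`, and only two of the patterns are reproduced.

```python
import re

# Rough stand-in for %{TIMESTAMP_ISO8601} (assumption: date, time, optional fraction).
TS = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?"

PATTERNS = [
    # Most specific first: application id, serial number and service type.
    re.compile("(?P<reqTime>" + TS + r") (?P<logLevel>\w+) .*\{\[(?P<appSysId>\w+)"
               r"\|(?P<serial>\w+).*serviceType=(?P<serviceType>\d+)"),
    # Catch-all: timestamp and log level only.
    re.compile("(?P<reqTime>" + TS + r") (?P<logLevel>\w+) .*"),
]


def grok(message):
    """Return the named captures of the first matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.search(message)
        if match:
            return match.groupdict()
    return None


print(grok("2019-08-01 10:00:00,123 INFO handler {[app01|abc123] serviceType=7}"))
print(grok("2019-08-01 10:00:01 WARN plain text"))
```

If the catch-all were listed first, it would match every line and the more detailed fields would never be extracted.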

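Start, stop, and setup steps

# Extract the package to the target directory
tar -zxvf logstash-7.2.0.tar.gz -C /usr/local/elk/

# Add a directory for custom grok patterns (it must match patterns_dir in logstash.conf)
cd /usr/local/elk/logstash-7.2.0/config
mkdir patterns

# Add the GeoIP database (at the path given in the geoip filter's database setting)

# Edit the configuration file
vim logstash.conf

# Start
cd /usr/local/elk/logstash-7.2.0/bin
nohup ./logstash -f /usr/local/elk/logstash-7.2.0/config/logstash.conf &

# Stop: find the Logstash process ID with jps, then send SIGTERM
# (SIGQUIT only makes the JVM print a thread dump, it does not stop the process)
jps
kill -TERM <pid>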

8. elasticsearch

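  1. Download and extract to the target directory
  2. Configure
    To make deployment and replacement easier, it is best to define host names for the ZooKeeper, Kafka, and Elasticsearch machines and to use those host names instead of IP addresses in the configuration.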
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
# Cluster name (default: elasticsearch). A node only joins a cluster whose cluster.name matches its own, so several clusters can share one network segment and are told apart by this name.
cluster.name: elk-cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
# Name of the node this configuration belongs to
node.name: elk-node-160
#
# Add custom attributes to the node:
#
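# Whether this node is eligible to be elected master (eligible only; it does not mean this node will be the master). Defaults to true. If the current master fails, a new one is elected.
#node.attr.rack: r1
node.master: true
# Whether this node stores index data. Defaults to true.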
node.data: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
# Path where index data is stored. Defaults to the data directory under the Elasticsearch home. Several paths can be given, separated by commas.
path.data: /usr/local/elk/data
#
# Path to log files:
#
path.logs: /usr/local/elk/elasticsearch-7.2.0/logs
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 10.11.48.160
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["10.11.48.160:9300", "10.11.48.161:9300", "10.11.48.165:9300"]
discovery.seed_hosts:
   - 10.11.48.160
   - 10.11.48.161
   - 10.11.48.165
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["10.11.48.160", "10.11.48.161", "10.11.48.165"]
cluster.initial_master_nodes: 
   - elk-node-160
   - elk-node-161
   - elk-node-165
#
# For more information, consult the discovery and cluster formation module documentation.

Installing the Chinese analysis plugin (IK)

cd /usr/local/elk/elasticsearch-7.2.0/plugins/ && mkdir ik
unzip elasticsearch-analysis-ik-7.2.0.zip -d /usr/local/elk/elasticsearch-7.2.0/plugins/ik

Start, stop, and setup steps

# Check the hard limit on open file handles
ulimit -Hn

# Raise the system limits
vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096


vim /etc/sysctl.conf
vm.max_map_count=655360

sysctl -p


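# Start as a daemon (run from the bin directory as a non-root user)
./elasticsearch -d

# Stop: find the Elasticsearch process ID with jps, then send SIGTERM
jps
kill <pid>

# Check that the node is up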
jps
curl -u elastic:changeme http://10.11.48.160:9200


# Show shard allocation
curl -XGET http://10.11.48.160:9200/_cat/shards
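The limits configured above can be checked before Elasticsearch is started, which is quicker than waiting for a failed bootstrap check. The sketch below is Linux-only; the minimum values are the ones named in the error messages quoted in this article and may differ for other versions, and `find_problems` and `read_current_limits` are hypothetical helpers.

```python
import resource

# Minimums taken from the bootstrap-check messages quoted in this article.
REQUIRED = {"nofile": 65536, "nproc": 2048, "max_map_count": 262144}


def find_problems(current, required=REQUIRED):
    """Return the names of all limits that are below the required minimum."""
    return sorted(name for name, minimum in required.items()
                  if current.get(name, 0) < minimum)


def read_current_limits():
    """Read the soft limits of the current user (run this as the Elasticsearch user)."""
    def soft(limit):
        value = resource.getrlimit(limit)[0]
        return float("inf") if value == resource.RLIM_INFINITY else value

    with open("/proc/sys/vm/max_map_count") as handle:
        max_map_count = int(handle.read())
    return {"nofile": soft(resource.RLIMIT_NOFILE),
            "nproc": soft(resource.RLIMIT_NPROC),
            "max_map_count": max_map_count}


# Typical defaults of a fresh system, as reported in the error messages:
print(find_problems({"nofile": 4096, "nproc": 1024, "max_map_count": 65530}))
# ['max_map_count', 'nofile', 'nproc']
```

On a target machine, `find_problems(read_current_limits())` returns an empty list once all three limits are high enough.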

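Notes and solutions to common problems

  1. All nodes of a cluster must use the same cluster.name and different node.name values.
  2. Automatic master election needs at least three master-eligible nodes, so set node.master: true on at least three nodes. In clusters with many machines and large data volumes, master nodes should preferably not double as data nodes. The author's resources are limited, so here every master node is also a data node.
  3. From Elasticsearch 7 on, discovery is configured with discovery.seed_hosts, which differs from earlier versions. Using the old setting shown below may prevent nodes from finding each other, so prefer the settings suggested by the sample configuration of the version you run.
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
  4. Elasticsearch cannot be run as root. Doing so fails with: Exception in thread "main" java.lang.RuntimeException: don't run elasticsearch as root.
  5. Message: Max number of threads for elasticsearch too low
Edit /etc/security/limits.conf:
vim /etc/security/limits.conf
Add the line: xxx - nproc 2048
where "xxx" is the user that starts Elasticsearch.
  6. Message: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
Edit /etc/security/limits.conf:
vim /etc/security/limits.conf
Add the lines:
* soft nofile 65536
* hard nofile 131072
Then log in again so that the new limits take effect.
  7. Message: max number of threads [1024] for user [lish] likely too low, increase to at least [2048]
Edit the configuration file in the /etc/security/limits.d/ directory and set:
* soft nproc 2048
  8. Message: max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]
Edit /etc/sysctl.conf:
vim /etc/sysctl.conf
Add the line:
vm.max_map_count=655360
Apply it with sysctl -p and check that the reported value is the one you set.
Then restart Elasticsearch.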
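The requirement of at least three master-eligible nodes follows from majority voting: a master can only be elected while more than half of the master-eligible nodes are reachable. A short sketch of the arithmetic, with hypothetical helper names:

```python
def quorum(master_eligible_nodes):
    """Smallest number of nodes that forms a majority."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes may fail while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for nodes in (1, 2, 3, 5):
    print(nodes, quorum(nodes), tolerated_failures(nodes))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

With two nodes the majority is still two, so losing either one blocks elections; three is the smallest cluster that survives the loss of one node.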

9. kibana

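  1. Download and extract to the target directory
  2. Configure Kibana
    X-Pack settings can be added as needed; only the settings that differ from the defaults are shown here.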
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
#server.host: "localhost"
server.host: "10.11.48.160"

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"
server.name: "kibana-160"

# The URLs of the Elasticsearch instances to use for all your queries.
#elasticsearch.hosts: ["http://localhost:9200"]
#elasticsearch.url: "http://10.11.48.160:9200"
elasticsearch.hosts: ["http://10.11.48.160:9200","http://10.11.48.161:9200", "http://10.11.48.165:9200"]

#elasticsearch.hosts: ["http://10.11.48.160:9200"]


# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
elasticsearch.username: "elastic"
elasticsearch.password: "changeme"


# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false

# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid
pid.file: /usr/local/elk/kibana-7.2.0-linux-x86_64/logs/kibana.pid

# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout
logging.dest: /usr/local/elk/kibana-7.2.0-linux-x86_64/logs/kibana.log


# Chinese map tiles for geographic data
map.tilemap.url: "http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=7&x={x}&y={y}&z={z}"

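Start, stop, and related commands

# Start
nohup ./kibana  >/dev/null 2>&1 &

# Stop: find the process ID with any of the following commands, then kill it
ps -ef  | grep kibana
ps -ef  | grep 5601
cat logs/kibana.pid
fuser -n tcp 5601
netstat -anltp|grep 5601
kill -9 <pid>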

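10. ELK start/stop scripts

The sample scripts below are thin wrappers around each component that make routine operations easier. Adapt and combine them to fit your operational needs, and hook them into a CI tool for automated deployment.

zookeeper

Start ZooKeeper (startZookeeper.sh)

#!/bin/bash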

echo "##################################################"
echo "##################################################"

echo "Zookeeper starting ..."

PIDS=$(ps ax | grep java | grep -i QuorumPeerMain | grep -v grep | awk '{print $1}')
# Check whether ZooKeeper is already running
if [ ! -z "$PIDS" ]; then
     echo "Zookeeper has already started !"
     echo "pid: "$PIDS
else
     # Start
     cd /usr/local/elk/apache-zookeeper-3.5.5/bin/
     ./zkServer.sh start
     sleep 3
     PIDS=$(ps ax | grep java | grep -i QuorumPeerMain | grep -v grep | awk '{print $1}')

     # Check whether the start succeeded
     if [ ! -z "$PIDS" ]; then
          echo "Zookeeper has started successfully!"
          echo "pid: "$PIDS
     else
          echo "Zookeeper failed to start!"
          exit 1
     fi
fi

echo "##################################################"

Stop ZooKeeper (stopZookeeper.sh)

#!/bin/sh

echo "#######################################################################"
echo "#######################################################################"

echo "zookeeper stopping ..."

SIGNAL=${SIGNAL:-TERM}
PIDS=$(ps ax | grep java | grep -i QuorumPeerMain | grep -v grep | awk '{print $1}')

if [ -z "$PIDS" ]; then
  echo "No zookeeper server to stop"
  exit 1
else
  kill -s $SIGNAL $PIDS
  echo "Zookeeper has stopped!"
fi

echo "#######################################################################"

kafka

Start Kafka (startKafka.sh)

#!/bin/bash

echo "##################################################"
echo "##################################################"

echo "Kafka  starting ..."

pid=$(jps | grep "Kafka" | awk '{print $1}')
# Check whether Kafka is already running
if [ ! -z "$pid" ]; then
     echo "Kafka has already started !"
     echo "pid: "$pid
else
     # Start
     cd /usr/local/elk/kafka_2.12-2.3.0/bin
     nohup ./kafka-server-start.sh ../config/server.properties >/dev/null 2>&1 &
     sleep 3

     pid=$(jps | grep "Kafka" | awk '{print $1}')
     # Check whether the start succeeded
     if [ ! -z "$pid" ]; then
          echo "pid: "$pid
          echo "Kafka has started successfully!"
     else
          echo "Kafka failed to start!"
          exit 1
     fi
fi

echo "##################################################"

Stop Kafka (stopKafka.sh)

#!/bin/bash

echo "##################################################"
echo "##################################################"

echo "Kafka stopping ..."

cd /usr/local/elk/kafka_2.12-2.3.0/bin

./kafka-server-stop.sh

#jps | grep "Kafka" | awk '{print $1}' | xargs kill -9
echo "Kafka has stopped!"

echo "##################################################"

elasticsearch

Start Elasticsearch (startElasticSearch.sh)

#!/bin/bash

echo "##################################################"
echo "##################################################"

echo "Elasticsearch starting ..."

pid=$(jps | grep "Elasticsearch" | awk '{print $1}')
# Check whether Elasticsearch is already running
if [ ! -z "$pid" ]; then
     echo "ElasticSearch has already started !"
     echo "pid: "$pid
else
     # Start
     cd /usr/local/elk/elasticsearch-7.2.0/bin
     nohup ./elasticsearch >/dev/null 2>&1 &

     sleep 3
     pid=$(jps | grep "Elasticsearch" | awk '{print $1}')
     # Check whether the start succeeded
     if [ ! -z "$pid" ]; then
          echo "pid: "$pid
          echo "ElasticSearch has started successfully!"
     else
          echo "ElasticSearch failed to start!"
          exit 1
     fi
fi

echo "##################################################"

Stop Elasticsearch (stopElasticSearch.sh)

#!/bin/bash

echo "##################################################"
echo "##################################################"

echo "Elasticsearch stopping ..."



jps | grep "Elasticsearch" | awk '{print $1}' | xargs kill -9
echo "ElasticSearch has stopped!"

echo "##################################################"

kibana

Start Kibana (startKibana.sh)

#!/bin/bash

echo "##################################################"
echo "##################################################"

echo "Kibana starting ..."


pid=$(lsof -i:5601 | awk 'NR==2 {print $2}')
# Check whether Kibana is already running
if [ ! -z "$pid" ]; then
     echo "kibana has already started!"
     exit 0
else
     # Start Kibana
     cd /usr/local/elk/kibana-7.2.0-linux-x86_64/bin
     nohup ./kibana  >/dev/null 2>&1 &

     # Give Kibana time to bind its port before checking (adjust the delay to your hardware)
     sleep 30
     pid=$(lsof -i:5601 | awk 'NR==2 {print $2}')
     # Check whether the start succeeded
     if [ ! -z "$pid" ]; then
          echo "kibana has started successfully!"
     else
          echo "Kibana failed to start!"
          exit 1
     fi
fi

echo "##################################################"

Stop Kibana (stopKibana.sh)

#!/bin/bash

echo "##################################################"
echo "##################################################"

echo "Kibana stopping ..."

pid=$(lsof -i:5601 | awk 'NR==2 {print $2}')

if [ -z "$pid" ]; then
     echo "kibana has not started!"
     exit 0
fi


lsof -i:5601 | awk 'NR==2 {print $2}' | xargs kill -9
echo "Kibana has stopped!"

echo "##################################################"

filebeat

Start (startFilebeat.sh)

#!/bin/sh

echo "######################################################"
echo "######################################################"

echo "filebeat starting ..."

pid=$(ps -ef | grep "filebeat" | grep -v grep | awk '{print $2}')
if [ ! -z "$pid" ]; then
     echo "Filebeat has already started . Please stop it first!"
     echo "pid: "$pid
     exit 0
fi

cd /usr/local/elk/filebeat-7.2.0-linux-x86_64

nohup ./filebeat >/dev/null 2>&1 &

sleep 3

pid=$(ps -ef | grep "filebeat" | grep -v grep | awk '{print $2}')

echo "pid: "$pid

if [ ! -z "$pid" ]; then
     echo "Filebeat has started successfully!"
else
     echo "Filebeat failed to start!"
fi


echo "######################################################"

Stop (stopFilebeat.sh)

#!/bin/sh

echo "######################################################"
echo "######################################################"

echo "filebeat stopping ..."

pid=$(ps -ef | grep "filebeat" | grep -v grep | awk '{print $2}')
if [ -z "$pid" ]; then
     echo "Filebeat has not started ."
else
     ps -ef | grep "filebeat" | grep -v grep | awk '{print $2}' | xargs kill -9
     echo "Filebeat stopped successfully!"
     
fi

echo "######################################################"

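One-command start/stop of several services on a single server

Some services depend on others, so the scripts pause between steps to make sure each dependency is available. Readers who are interested can write cluster-wide start/stop scripts themselves; that is not covered here.
Start (ElkServiceStart.sh)

#!/bin/bash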

echo "--------------------------------------------------"
echo "##################################################"
echo "###################UMP ELK service start##########"

./startZookeeper.sh
sleep 5
./startKafka.sh
./startLogstash.sh
./startElasticSearch.sh
#./startKibana.sh

echo "##################################################"
echo "--------------------------------------------------"
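A fixed `sleep` between dependent services either waits too long or not long enough. One alternative is to poll the port of the dependency until it accepts connections. The sketch below is a generic helper, not part of any ELK tool; `wait_for_port` and its defaults are assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True as soon as host:port accepts a TCP connection, False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, a wrapper could call `wait_for_port("10.11.48.160", 2181)` after starting ZooKeeper and launch Kafka only when it returns True.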

Stop (ElkServiceStop.sh)

#!/bin/bash

echo "##################################################"
echo "##################################################"

echo "Kibana stopping ..."

pid=$(lsof -i:5601 | awk 'NR==2 {print $2}')

if [ -z "$pid" ]; then
     echo "kibana has not started!"
else
     # Only call kill when a process was found
     kill -9 $pid
     echo "Kibana has stopped!"
fi

echo "##################################################"


# /bin/bash

echo "##################################################"
echo "##################################################"

echo "Elasticsearch stopping ..."



jps | grep "Elasticsearch" | awk '{print $1}' | xargs kill -9
echo "ElasticSearch has stopped!"

echo "##################################################"


# /bin/bash

echo "##################################################"
echo "##################################################"

echo "Logstash stopping ..."



jps | grep "Logstash" | awk '{print $1}' | xargs kill -9
echo "Logstash has stopped!"

echo "##################################################"


# /bin/bash

echo "##################################################"
echo "##################################################"

echo "Kafka stopping ..."

cd /usr/local/elk/kafka_2.12-2.3.0/bin

./kafka-server-stop.sh

jps | grep "Kafka" | awk '{print $1}' | xargs kill -9
echo "Kafka has stopped!"

echo "##################################################"


#!/bin/sh

echo "##################################################"
echo "##################################################"

echo "zookeeper stopping ..."

SIGNAL=${SIGNAL:-TERM}
PIDS=$(ps ax | grep java | grep -i QuorumPeerMain | grep -v grep | awk '{print $1}')

if [ -z "$PIDS" ]; then
  echo "No zookeeper server to stop"
  exit 1
else
  kill -s $SIGNAL $PIDS
  echo "Zookeeper has stopped!"
fi

echo "##################################################"