First Look at ELK: Notes on Using Elasticsearch
2017/2/13
Note: since I keep learning new things in practice, this article is updated from time to time. If you run into a problem, refresh the page to check whether there is a newer version.
一、Installation

1、JDK and environment variables

JDK 1.7 or later is supported; JDK 1.8 is recommended. Set JAVA_HOME in your environment variables.

2、Install

There are two ways to download; caching the rpm package in a local yum repository is recommended.

1) Use the rpm directly

wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.4.0/elasticsearch-2.4.0.rpm

2) Use a yum repository

[root@vm220 ~]# vim /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-2.x]
name=Elasticsearch repository for 2.x packages
baseurl=https://packages.elastic.co/elasticsearch/2.x/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1

[root@vm220 ~]# yum install elasticsearch

3) Start the service

[root@vm220 ~]# chkconfig elasticsearch on
[root@vm220 ~]# service elasticsearch start

3、Adjust the configuration

[root@vm220 ~]# mkdir -p /data/elasticsearch
[root@vm220 ~]# chown elasticsearch:elasticsearch /data/elasticsearch
[root@vm220 ~]# grep ^[^#] /etc/elasticsearch/elasticsearch.yml
cluster.name: es-test
node.name: node-1
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
[root@vm220 ~]# service elasticsearch restart

二、Using the REST API

1、What can it do?

Check your cluster, node, and index health, status, and statistics
Administer your cluster, node, and index data and metadata
Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others

The general shape of a curl call against the API:

curl -X<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>

2、Administration

1) Health status

[root@vm220 ~]# curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1473669150 16:32:30  es-test green           1         1      0   0    0    0        0             0                  -                100.0%

The health status is one of green, yellow, or red:
Green means everything is good (the cluster is fully functional);
yellow means all data is available but some replicas are not yet allocated (the cluster is still fully functional);
red means some data is not available, for whatever reason.

2) List nodes

[root@vm220 ~]# curl 'localhost:9200/_cat/nodes?v'
host      ip        heap.percent ram.percent load node.role master name
127.0.0.1 127.0.0.1            6          16 0.00 d         *      node-1

3) List indices

[root@vm220 ~]# curl 'localhost:9200/_cat/indices?v'
health status index pri rep docs.count docs.deleted store.size pri.store.size

3、CRUD operations

1) Create an index

[root@vm220 ~]# curl -XPUT 'localhost:9200/customer?pretty'
{
  "acknowledged" : true
}

This creates an index named "customer" and asks for a pretty-printed (JSON) response.

List the indices again:

[root@vm220 ~]# curl 'localhost:9200/_cat/indices?v'
health status index    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer   5   1          0            0       650b           650b

Compare this with the earlier output. Note that the health here is yellow: we currently have only one ES node, so no replicas can be allocated and there is no high availability.

2) Create a doc in the index above

Index: customer, type: "external", ID: 1

[root@vm220 ~]# curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
  "name": "John Doe"
}'

3) Get the doc

[root@vm220 ~]# curl -XGET 'localhost:9200/customer/external/1?pretty'

4) Reindex the doc: a PUT to an existing ID replaces the document wholesale

[root@vm220 ~]# curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
  "name": "Kelly Doe"
}'

5) Create a doc without specifying an ID; a random one is generated (note the POST)

[root@vm220 ~]# curl -XPOST 'localhost:9200/customer/external?pretty' -d '
{
  "name": "Calvin Kern"
}'

6) Delete the index

[root@vm220 ~]# curl -XDELETE 'localhost:9200/customer?pretty'

7) Update data in a doc (if you just deleted the index above, recreate it and doc 1 first)

[root@vm220 ~]# curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
  "doc": { "name": "Eric Mood" }
}'

Update and add a field at the same time:

[root@vm220 ~]# curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
  "doc": { "name": "Eric Mood", "age": 110 }
}'

8) Deleting the whole index is rather blunt; how do we delete just one doc in it?

[root@vm220 ~]# curl -XDELETE 'localhost:9200/customer/external/1?pretty'
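A GET for the same ID should now confirm the doc is gone (output from a 2.x node, abridged):

[root@vm220 ~]# curl -XGET 'localhost:9200/customer/external/1?pretty'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "found" : false
}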
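As an aside, the update API can also create the doc when it does not exist yet. A minimal sketch, assuming the doc_as_upsert flag of the 2.x update API (the ID 3 here is arbitrary, purely for illustration):

[root@vm220 ~]# curl -XPOST 'localhost:9200/customer/external/3/_update?pretty' -d '
{
  "doc": { "name": "Jane Doe" },
  "doc_as_upsert": true
}'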
4、Browsing the data

1) Import some test data:

wget https://github.com/bly2k/files/blob/master/accounts.zip?raw=true -O accounts.zip
unzip accounts.zip
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'

2) search

Fetch all documents:

Option one:

curl 'localhost:9200/bank/_search?q=*&pretty'

Option two:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} }
}'

Fetch specific documents:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}'

Filter documents:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'

Aggregate documents:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": { "field": "state" }
    }
  }
}'
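The search body also supports paging and sorting. A minimal sketch against the same bank index, using the standard from/size/sort options (the cutoff of 10 is arbitrary):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } },
  "from": 10,
  "size": 10
}'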
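Aggregations can be nested as well; for instance, the average balance per state, a sketch in the spirit of the official tutorial (the sub-aggregation name average_balance is freely chosen):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": { "field": "state" },
      "aggs": {
        "average_balance": {
          "avg": { "field": "balance" }
        }
      }
    }
  }
}'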
Note: for the full story, please refer to the official docs.

三、Clustering

1、Add two more ES nodes:

vm218: 10.50.200.218
vm219: 10.50.200.219

Together with the existing node:

vm220: 10.50.200.220

they form a three-node ES cluster.

[root@vm218 ~]# yum install elasticsearch
[root@vm218 ~]# chkconfig elasticsearch on

2、Adjust the cluster configuration

[root@vm218 ~]# cp /etc/elasticsearch/elasticsearch.yml{,.bak}
[root@vm218 ~]# vim /etc/elasticsearch/elasticsearch.yml
cluster.name: es-cluster-test
node.name: node-vm218
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["10.50.200.218", "10.50.200.219", "10.50.200.220"]
discovery.zen.minimum_master_nodes: 3

[root@vm218 ~]# mkdir -p /data/elasticsearch
[root@vm218 ~]# chown elasticsearch:elasticsearch /data/elasticsearch

(Side note: the usual recommendation for discovery.zen.minimum_master_nodes is (master-eligible nodes / 2) + 1, which would be 2 for this three-node cluster; with the value set to 3, the cluster cannot elect a master once any single node is down.)

3、Start the service

[root@vm218 ~]# service elasticsearch restart

4、Check the cluster status

[root@vm218 ~]# curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster         status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1474192163 17:49:23  es-cluster-test green           3         3     12   6    0    0        0             0                  -                100.0%

[root@vm218 ~]# curl 'localhost:9200/_cat/nodes?v'
host          ip            heap.percent ram.percent load node.role master name
10.50.200.218 10.50.200.218            3          61 0.26 d         *      node-vm218
10.50.200.220 10.50.200.220            4          85 0.10 d         m      node-vm220
10.50.200.219 10.50.200.219            3          62 0.06 d         m      node-vm219

[root@vm218 ~]# curl 'localhost:9200/_cat/indices?v'
health status index               pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana               1   1          2            0       15kb          6.2kb
green  open   filebeat-2016.09.18   5   1       6255            0      4.9mb          1.8mb

5、The corresponding logstash configuration can be adjusted so that the output writes to multiple ES nodes

An example:

[root@vm220 ~]# cat /etc/logstash/conf.d/filebeat.conf
input {
    beats {
        port => "5044"
    }
}
filter {
    if [type] =~ "NginxAccess" {
        grok {
            patterns_dir => ["/etc/logstash/patterns.d"]
            match => { "message" => "%{NGINXACCESS}" }
        }
        date {
            match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
            remove_field => [ "timestamp" ]
        }
    }
}
output {
    if [type] =~ "NginxAccess" {
        elasticsearch {
            hosts => ["10.50.200.218:9200", "10.50.200.219:9200", "10.50.200.220:9200"]
            manage_template => false
            index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
            document_type => "%{[@metadata][type]}"
        }
    }
}
[root@vm220 ~]# service logstash restart

四、Summary and FAQ

1、Data flow

-------------------------------------------------------------------------
|---------client--------|----------server-------------------------------|
                                        /  elasticsearch(vm218)
log_files -> filebeat --> logstash --> -   elasticsearch(vm219)  -> kibana
                                        \  elasticsearch(vm220)
-------------------------------------------------------------------------

2、About the cluster name

If you change the cluster name, a new data directory is created under the new name, so the existing data effectively drops out of the cluster (it still sits on disk under the old directory). For example:

[root@vm220 ~]# ls /data/elasticsearch/
es-cluster-test  es-test

In the listing above, es-test is the old directory and es-cluster-test is the new cluster name; look inside and you will find that the previously created indices all live under es-test.

3、A few details about custom templates

1) Why are values tokenized? (Compare with whether the analyzed box is checked for the index pattern in kibana.)

A field of type string is, by default, run through the standard analyzer and tokenized at index time. In properties, a field may be defined like this by default:

"properties": {
  "host": { "type": "string" }
}

If host="test-me-vm49", it is automatically tokenized into "test", "me", "vm49". Much of the time, though, values taken from logs do not need this kind of splitting. How do we fix that? Adjust the definition to:

"properties": {
  "host": { "type": "string", "index": "not_analyzed" }
}

Now this field is not tokenized and the original string is used as-is. (You can watch the analyzer do the splitting yourself; see the _analyze sketch at the end of this FAQ.)

In fact, in log-analysis scenarios tokenizing is often unnecessary across the board, and dynamic_templates can handle that in one place:

"dynamic_templates": [
  {
    "template1": {
      "mapping": {
        "index": "not_analyzed",
        "type": "string"
      },
      "match_mapping_type": "string",
      "match": "*"
    }
  }
],

2) Do not index a given field at all (compare with whether the indexed box is checked for the index pattern in kibana). For example:

"properties": {
  "host": { "type": "string", "index": "no" }
}

4、How to make ES use a custom template

1) Define a template file

In this example: /tmp/elasticsearch.template.filebeat.json

The template below was put together from the default template shipped in /etc/filebeat/filebeat.template.json plus the JSON view of the log entries as shown on the kibana page.

Purpose: to show how to explicitly set index to "no" or "not_analyzed" for individual fields, while everything else ("{dynamic_type}") is covered by dynamic_templates.

[root@vm220 ~]# cat /tmp/elasticsearch.template.filebeat.json
{
  "template": "filebeat-*",
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "norms": {
          "enabled": false
        }
      },
      "dynamic_templates": [
        {
          "template1": {
            "mapping": {
              "doc_values": true,
              "ignore_above": 1024,
              "index": "not_analyzed",
              "type": "{dynamic_type}"
            },
            "match": "*"
          }
        }
      ],
      "properties": {
        "message": { "type": "string", "index": "no" },
        "@version": { "type": "string", "index": "no" },
        "@timestamp": { "type": "date" },
        "type": { "type": "string", "index": "not_analyzed" },
        "input_type": { "type": "string", "index": "no" },
        "beat": {
          "properties": {
            "hostname": { "type": "string", "index": "not_analyzed", "doc_values": "true" },
            "name": { "type": "string", "index": "not_analyzed", "doc_values": "true" }
          }
        },
        "source": { "type": "string", "index": "no" },
        "offset": { "type": "long", "index": "no" },
        "count": { "type": "long", "index": "no" },
        "host": { "type": "string", "index": "not_analyzed" },
        "tags": { "type": "string", "index": "not_analyzed" },
        "bytes": { "type": "long", "index": "not_analyzed" },
        "geoip": {
          "properties": {
            "location": { "type": "geo_point", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}

2) Load the template into ES

[root@vm220 ~]# curl -XPUT 'http://localhost:9200/_template/filebeat?pretty' -d@/tmp/elasticsearch.template.filebeat.json
{
  "acknowledged" : true
}

Once loaded, verify it:

[root@vm220 ~]# curl 'http://localhost:9200/_template/filebeat?pretty'

3) List and clean up the old indices

List:

[root@vm220 ~]# curl 'localhost:9200/_cat/indices?v'
health status index                                         pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana                                         1   1          6            0     86.6kb         44.6kb
green  open   filebeat-nginxaccess-www.work.com-2016.09.19    5   1        838            0        1mb        531.2kb
green  open   filebeat-nginxaccess-www.test.com-2016.09.19    5   1        838            0   1019.4kb        504.3kb

Delete:

[root@vm220 ~]# curl -XDELETE 'http://localhost:9200/filebeat-*?pretty'
{
  "acknowledged" : true
}

Check again:

[root@vm220 ~]# curl 'localhost:9200/_cat/indices?v'
health status index                                         pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana                                         1   1          6            0     86.6kb         44.6kb
green  open   filebeat-nginxaccess-www.work.com-2016.09.19    5   1         36            0    188.4kb         94.2kb
green  open   filebeat-nginxaccess-www.test.com-2016.09.19    5   1         36            0    204.7kb        102.3kb

(The filebeat indices reappear immediately because logstash keeps shipping new events; the point is that the recreated indices are now built with the new template.)

4) Over in the kibana configuration

Delete the index patterns, then hit "refresh field list" to refresh.
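As a footnote to item 3-1 above: you can watch the standard analyzer split a value with the _analyze API. A quick sketch (the analyzer/text request parameters are the 2.x form; the hostname is the same made-up example used above):

[root@vm220 ~]# curl 'localhost:9200/_analyze?analyzer=standard&text=test-me-vm49&pretty'

The response should list the tokens "test", "me" and "vm49".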
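And to confirm that the freshly created indices really picked up the custom template, you can pull their mapping and check that, for example, host now shows "index": "not_analyzed". A sketch, assuming wildcards are accepted in the index position as elsewhere in the API:

[root@vm220 ~]# curl 'localhost:9200/filebeat-*/_mapping?pretty'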
ZYXW、References

1、Official docs
https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-repositories.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-dir-layout.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/_exploring_your_cluster.html
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-template.html#load-template-shell
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/default-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
2、Elasticsearch mapping rules
http://www.cnblogs.com/bmaker/p/5463888.html
3、ELK guide in Chinese
http://kibana.logstash.es/content/logstash/plugins/output/elasticsearch.html
4、A data-type redundancy approach for Elasticsearch
http://blog.csdn.net/cnweike/article/details/38397707
5、Configuring a default index mapping (_default_) in Logstash
http://blog.csdn.net/xifeijian/article/details/50823494