1 Preface

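This article is part of my study-notes series on the Geek Time (极客时间) course "Elasticsearch核心技术与实战" (Elasticsearch Core Technology and Practice).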

2 Bucket & Metric Aggregation

  • Metric: a set of statistical methods
  • Bucket: a group of documents that satisfy a condition


2.1 Aggregation Syntax

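Aggregation is part of Search. In general, it is recommended to set the request's size to 0, so that only the aggregation results are returned and not the matching documents.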
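As a minimal sketch, the request body can be written as a Python dict that mirrors the JSON sent to `POST employees/_search`. `agg_request` is a hypothetical helper name for illustration, not part of any client library.

```python
import json


def agg_request(aggs: dict) -> dict:
    """Wrap named aggregations in a search body that skips the hits (hypothetical helper)."""
    return {
        "size": 0,     # return no documents in hits, only aggregation results
        "aggs": aggs,  # {"<agg name>": {"<agg type>": {...options}}}
    }


body = agg_request({"min_salary": {"min": {"field": "salary"}}})
print(json.dumps(body, indent=2))
```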


2.2 An Example: Salary Statistics

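The query computes the maximum, minimum and average salary; each aggregation specifies the function and the field.

In the response, hits.total is 20, but because size=0 the documents themselves are not included in the search results. The aggregations section holds the results of the three aggregations.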
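A sketch of that request as a Python dict, together with the values the three aggregations should produce, computed locally from the 20 salaries in the employees data of section 2.4. The aggregation names (`max_salary` and so on) are illustrative choices.

```python
from statistics import mean

# Salaries of the 20 employees indexed in section 2.4 (_id 1..20)
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

# Three single-value metric aggregations sent in one search request
body = {
    "size": 0,
    "aggs": {
        "max_salary": {"max": {"field": "salary"}},
        "min_salary": {"min": {"field": "salary"}},
        "avg_salary": {"avg": {"field": "salary"}},
    },
}

# Local stand-in for the "aggregations" part of the response
aggregations = {
    "max_salary": {"value": max(SALARIES)},
    "min_salary": {"value": min(SALARIES)},
    "avg_salary": {"value": mean(SALARIES)},
}
print(len(SALARIES), aggregations)
```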

2.3 Metric Aggregation

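Single-value analysis: outputs only one result

  • min, max, avg, sum
  • cardinality (similar to SQL COUNT(DISTINCT ...))

Multi-value analysis: outputs multiple results

  • stats, extended stats
  • percentiles, percentile ranks
  • top hits (the top-ranked matching documents)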
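To illustrate a single-value metric, here is a sketch of a cardinality aggregation on `job.keyword` plus the exact distinct count computed locally from the section 2.4 data. The name `distinct_jobs` is arbitrary, and `job.keyword` assumes job has a keyword sub-field. Elasticsearch's cardinality is approximate (HyperLogLog++), so on large data it may differ slightly from an exact COUNT(DISTINCT).

```python
# job values of the 20 employees indexed in section 2.4 (_id 1..20)
JOBS = [
    "Product Manager", "Dev Manager", "Web Designer", "Web Designer", "QA",
    "QA", "QA", "Java Programmer", "Java Programmer", "Java Programmer",
    "Java Programmer", "Java Programmer", "Java Programmer", "Javascript Programmer",
    "Java Programmer", "Javascript Programmer", "Javascript Programmer",
    "Javascript Programmer", "DBA", "DBA",
]

# Single-value metric: number of distinct job.keyword values
body = {"size": 0, "aggs": {"distinct_jobs": {"cardinality": {"field": "job.keyword"}}}}

# Exact distinct count computed locally; ES approximates this value
distinct_jobs = len(set(JOBS))
print(distinct_jobs)
```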

2.4 Metric Aggregation Demo

Define the index:

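The original index definition is not recoverable, so this is an assumed mapping consistent with the fields used later: job as analyzed text with a keyword sub-field (so both `job` and `job.keyword` exist), gender and name as keyword, age and salary as integer. Treat it as a sketch, not the course's exact mapping.

```python
import json

# Assumed body for PUT /employees (hypothetical reconstruction)
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "age": {"type": "integer"},
            "gender": {"type": "keyword"},
            "salary": {"type": "integer"},
            "job": {
                "type": "text",                              # analyzed into tokens
                "fields": {"keyword": {"type": "keyword"}},  # exact value as job.keyword
            },
        }
    }
}
print(json.dumps(mapping, indent=2))
```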

Prepare the data:

PUT /employees/_bulk
{ "index" : {  "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : {  "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : {  "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : {  "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : {  "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : {  "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : {  "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : {  "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : {  "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : {  "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : {  "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : {  "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : {  "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : {  "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : {  "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : {  "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : {  "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : {  "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

Find the lowest salary:


In the response, hits.total is the total number of matching documents, and aggregations returns the lowest salary.

Likewise, find the highest salary:


Each request above computes a single value. To get several values, put several aggregations in one request:


You can also use a single aggregation (such as stats) that outputs several values at once:

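A sketch of a stats aggregation on salary, with the five values it reports (count, min, max, avg, sum) computed locally from the section 2.4 data. The name `stats_salary` is an illustrative choice.

```python
# Salaries of the 20 employees indexed in section 2.4 (_id 1..20)
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

# One multi-value metric aggregation instead of several single-value ones
body = {"size": 0, "aggs": {"stats_salary": {"stats": {"field": "salary"}}}}

# Local stand-in for the five values in the stats_salary result
stats_salary = {
    "count": len(SALARIES),
    "min": min(SALARIES),
    "max": max(SALARIES),
    "avg": sum(SALARIES) / len(SALARIES),
    "sum": sum(SALARIES),
}
print(stats_salary)
```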

3 Bucket

Bucket aggregations assign documents to different buckets according to a rule, which classifies them. Common bucket aggregations in ES include:

  1. Terms
  2. Numeric (and date) types
  • Range, Date Range
  • Histogram / Date Histogram

Nesting is supported: buckets can be split further into sub-buckets.


3.1 Terms Aggregation

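A terms aggregation only works on fields whose values are available for aggregation:

  • keyword fields support it by default through doc_values
  • text fields need fielddata enabled in the mapping, and the buckets are then built from the analyzed (tokenized) terms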

Demo

  • Aggregate on job and on job.keyword
  • Run a terms aggregation on gender
  • Specify the bucket size

3.2 demo


The returned buckets list each key together with its document count (doc_count).
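A sketch of a terms aggregation on `job.keyword` and a local approximation of its buckets from the section 2.4 data. It assumes the default ordering (doc_count descending, ties broken by key ascending) and the default of 10 buckets; the helper `terms_buckets` is hypothetical.

```python
from collections import Counter

# job values of the 20 employees indexed in section 2.4 (_id 1..20)
JOBS = [
    "Product Manager", "Dev Manager", "Web Designer", "Web Designer", "QA",
    "QA", "QA", "Java Programmer", "Java Programmer", "Java Programmer",
    "Java Programmer", "Java Programmer", "Java Programmer", "Javascript Programmer",
    "Java Programmer", "Javascript Programmer", "Javascript Programmer",
    "Javascript Programmer", "DBA", "DBA",
]

body = {"size": 0, "aggs": {"jobs": {"terms": {"field": "job.keyword"}}}}


def terms_buckets(values, size=10):
    """Group equal values; order by doc_count desc, then key asc; keep `size` buckets."""
    ordered = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


buckets = terms_buckets(JOBS)
for bucket in buckets:
    print(bucket)
```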

Running a terms aggregation on the text field fails:


Enable fielddata on the text field so that it supports terms aggregation:
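A sketch of the mapping update body for `PUT employees/_mapping`, written as a Python dict. It assumes job is already a text field; fielddata is built on the JVM heap on first use, so enabling it on a high-cardinality text field can use a lot of memory.

```python
import json

# Body for PUT employees/_mapping: keep job as text but turn on fielddata
body = {
    "properties": {
        "job": {
            "type": "text",
            "fielddata": True,  # serialized as JSON true
        }
    }
}
print(json.dumps(body, indent=2))
```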


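The results now differ from the earlier ones: the text field is tokenized by its analyzer before the terms aggregation runs, so each bucket is a single word, whereas a keyword field is not tokenized and each whole value forms one bucket.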
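To make the difference concrete, this sketch compares buckets built from tokens with buckets built from whole values, using the section 2.4 job data. The `analyze` function is only a rough stand-in for the standard analyzer (split on non-alphanumerics, lowercase), not the real one.

```python
import re
from collections import Counter

# job values of the 20 employees indexed in section 2.4 (_id 1..20)
JOBS = [
    "Product Manager", "Dev Manager", "Web Designer", "Web Designer", "QA",
    "QA", "QA", "Java Programmer", "Java Programmer", "Java Programmer",
    "Java Programmer", "Java Programmer", "Java Programmer", "Javascript Programmer",
    "Java Programmer", "Javascript Programmer", "Javascript Programmer",
    "Javascript Programmer", "DBA", "DBA",
]


def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


# terms on "job" (text + fielddata): a document counts once per distinct token
text_buckets = Counter(token for job in JOBS for token in set(analyze(job)))
# terms on "job.keyword": the whole value is one term
keyword_buckets = Counter(JOBS)

print(text_buckets.most_common(3))
print(keyword_buckets.most_common(3))
```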



Specify the bucket size:

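A sketch of a terms aggregation limited to 2 buckets, using the section 2.4 job data. Documents in buckets that are cut off are not lost from the totals: the response reports them as `sum_other_doc_count`, which the local computation mirrors. The size value 2 is an illustrative choice.

```python
from collections import Counter

# job values of the 20 employees indexed in section 2.4 (_id 1..20)
JOBS = [
    "Product Manager", "Dev Manager", "Web Designer", "Web Designer", "QA",
    "QA", "QA", "Java Programmer", "Java Programmer", "Java Programmer",
    "Java Programmer", "Java Programmer", "Java Programmer", "Javascript Programmer",
    "Java Programmer", "Javascript Programmer", "Javascript Programmer",
    "Javascript Programmer", "DBA", "DBA",
]

body = {"size": 0, "aggs": {"jobs": {"terms": {"field": "job.keyword", "size": 2}}}}

size = body["aggs"]["jobs"]["terms"]["size"]
ordered = sorted(Counter(JOBS).items(), key=lambda kv: (-kv[1], kv[0]))
buckets = [{"key": k, "doc_count": c} for k, c in ordered[:size]]
# Documents that fell into buckets not returned
sum_other_doc_count = sum(c for _, c in ordered[size:])
print(buckets, sum_other_doc_count)
```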

Use size in a sub-aggregation: the details of the 3 oldest employees in each job type:


First bucket by job.keyword, then define a top_hits sub-aggregation with size=3, sorting the hits by age in descending order.
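A sketch of that request as a Python dict, plus a local equivalent over the section 2.4 data: group by job, sort each group by age descending, keep 3. The sub-aggregation name `old_employee` is an illustrative choice.

```python
from collections import defaultdict

# (name, age, job) of the 20 employees indexed in section 2.4
EMPLOYEES = [
    ("Emma", 32, "Product Manager"), ("Underwood", 41, "Dev Manager"),
    ("Tran", 25, "Web Designer"), ("Rivera", 26, "Web Designer"),
    ("Rose", 25, "QA"), ("Lucy", 31, "QA"), ("Byrd", 27, "QA"),
    ("Foster", 27, "Java Programmer"), ("Gregory", 32, "Java Programmer"),
    ("Bryant", 20, "Java Programmer"), ("Jenny", 36, "Java Programmer"),
    ("Mcdonald", 31, "Java Programmer"), ("Jonthna", 30, "Java Programmer"),
    ("Marshall", 32, "Javascript Programmer"), ("King", 33, "Java Programmer"),
    ("Mccarthy", 21, "Javascript Programmer"), ("Goodwin", 25, "Javascript Programmer"),
    ("Catherine", 29, "Javascript Programmer"), ("Boone", 30, "DBA"), ("Kathy", 29, "DBA"),
]

# terms bucket per job, with a top_hits sub-aggregation inside each bucket
body = {
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {"field": "job.keyword"},
            "aggs": {
                "old_employee": {
                    "top_hits": {"size": 3, "sort": [{"age": {"order": "desc"}}]}
                }
            },
        }
    },
}

hit_size = body["aggs"]["jobs"]["aggs"]["old_employee"]["top_hits"]["size"]
by_job = defaultdict(list)
for name, age, job in EMPLOYEES:
    by_job[job].append((name, age))

# Per bucket: sort by age descending and keep the first hit_size names
oldest = {
    job: [name for name, _ in sorted(people, key=lambda p: p[1], reverse=True)[:hit_size]]
    for job, people in by_job.items()
}
print(oldest["Java Programmer"])
```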

3.3 Optimizing Terms Aggregation Performance

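When it applies: terms aggregations run frequently, their latency matters, and the index is continuously being written to.

With eager_global_ordinals enabled on the field, the global ordinals that terms aggregations rely on are rebuilt at refresh time as new documents are written, rather than lazily at the first search after a refresh, which moves that cost off the query path.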
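A sketch of a mapping update that enables eager global ordinals, written as a Python dict for `PUT employees/_mapping`. It assumes job is text with a keyword sub-field as used in the demos above; adapt it to the actual field definition of your index.

```python
import json

# Body for PUT employees/_mapping (assumes job is text with a keyword sub-field)
body = {
    "properties": {
        "job": {
            "type": "text",
            "fields": {
                # build global ordinals for job.keyword at refresh, not at search time
                "keyword": {"type": "keyword", "eager_global_ordinals": True}
            },
        }
    }
}
print(json.dumps(body, indent=2))
```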


4 Range & Histogram Aggregation

  • Bucket documents by numeric ranges
  • In a Range aggregation, you can define custom keys
  • Demo:
    • Bucket by salary range
    • Bucket by salary interval (Histogram)

4.1 demo

Bucket by salary range:


As shown above, you can specify a key for each range; if you don't, ES generates a default key automatically.
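A sketch of a range aggregation on salary with three ranges (the boundaries are illustrative), plus a local count over the section 2.4 data. In a range, `from` is inclusive and `to` is exclusive, so the salaries of exactly 20000 land in the last bucket despite its ">20000" key. The placeholder labels for ranges without a key are made up here; ES formats its own default keys.

```python
# Salaries of the 20 employees indexed in section 2.4 (_id 1..20)
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

RANGES = [
    {"to": 10000},
    {"from": 10000, "to": 20000},
    {"key": ">20000", "from": 20000},  # custom key
]
body = {"size": 0, "aggs": {"salary_range": {"range": {"field": "salary", "ranges": RANGES}}}}


def range_buckets(values, ranges):
    """Count values per range: "from" is inclusive, "to" is exclusive."""
    buckets = []
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        # Placeholder label when no key is given (not ES's exact default format)
        key = r.get("key", f"{r.get('from', '*')}-{r.get('to', '*')}")
        buckets.append({"key": key, "doc_count": sum(low <= v < high for v in values)})
    return buckets


buckets = range_buckets(SALARIES, RANGES)
print(buckets)
```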

Demo 2: salary histogram from 0 to 100,000, with one bucket per 5,000:


min and max set the overall range, and interval sets the width of each bucket.
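A sketch of the histogram request, assuming min and max are passed through `extended_bounds` (which also makes ES return empty buckets across that range), with a local bucketing of the section 2.4 salaries. Each value goes to the bucket whose key is the value rounded down to a multiple of the interval; the local list assumes the last bucket key equals max.

```python
from collections import Counter

# Salaries of the 20 employees indexed in section 2.4 (_id 1..20)
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

body = {
    "size": 0,
    "aggs": {
        "salary_histogram": {
            "histogram": {
                "field": "salary",
                "interval": 5000,
                "extended_bounds": {"min": 0, "max": 100000},
            }
        }
    },
}

opts = body["aggs"]["salary_histogram"]["histogram"]
interval = opts["interval"]
low, high = opts["extended_bounds"]["min"], opts["extended_bounds"]["max"]

# Bucket key = value rounded down to a multiple of interval
counts = Counter((s // interval) * interval for s in SALARIES)
# Include empty buckets for every key from low to high
buckets = [{"key": k, "doc_count": counts.get(k, 0)} for k in range(low, high + 1, interval)]
print([b for b in buckets if b["doc_count"] > 0])
```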

5 Bucket + Metric Aggregation

 Bucket aggregations can be analyzed further by adding sub-aggregations, which can be either:

  • Bucket
  • Metric

Demo

  • Bucket by job type and compute salary statistics
  • Bucket by job type, then by gender, and compute salary statistics

5.1 demo

Nested aggregation 1: bucket by job type and compute salary statistics for each bucket:

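A sketch of a bucket aggregation with a metric sub-aggregation, and the per-bucket stats computed locally from the section 2.4 data. The names `jobs` and `salary_stats` are illustrative choices.

```python
from collections import defaultdict

# (job, salary) of the 20 employees indexed in section 2.4
ROWS = [
    ("Product Manager", 35000), ("Dev Manager", 50000), ("Web Designer", 18000),
    ("Web Designer", 22000), ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000), ("Java Programmer", 9000),
    ("Java Programmer", 38000), ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Javascript Programmer", 25000), ("Java Programmer", 28000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 20000), ("DBA", 30000), ("DBA", 20000),
]

# terms buckets on job, each with a stats sub-aggregation on salary
body = {
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {"field": "job.keyword"},
            "aggs": {"salary_stats": {"stats": {"field": "salary"}}},
        }
    },
}


def stats(values):
    """The five values a stats aggregation reports for one bucket."""
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": sum(values) / len(values), "sum": sum(values)}


salaries_by_job = defaultdict(list)
for job, salary in ROWS:
    salaries_by_job[job].append(salary)

per_job = {job: stats(values) for job, values in salaries_by_job.items()}
print(per_job["Java Programmer"])
```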

Multi-level nesting: bucket by job type, then by gender, and compute salary statistics for each bucket:

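A sketch of two bucket levels (job, then gender) with a stats sub-aggregation at the innermost level, plus a local equivalent over the section 2.4 data. The aggregation names are illustrative, and the local result keeps only count, min, max and avg for brevity.

```python
from collections import defaultdict

# (job, gender, salary) of the 20 employees indexed in section 2.4
ROWS = [
    ("Product Manager", "female", 35000), ("Dev Manager", "male", 50000),
    ("Web Designer", "male", 18000), ("Web Designer", "female", 22000),
    ("QA", "female", 18000), ("QA", "female", 25000), ("QA", "male", 20000),
    ("Java Programmer", "male", 20000), ("Java Programmer", "male", 22000),
    ("Java Programmer", "male", 9000), ("Java Programmer", "female", 38000),
    ("Java Programmer", "male", 32000), ("Java Programmer", "female", 30000),
    ("Javascript Programmer", "male", 25000), ("Java Programmer", "male", 28000),
    ("Javascript Programmer", "male", 16000), ("Javascript Programmer", "male", 16000),
    ("Javascript Programmer", "female", 20000), ("DBA", "male", 30000),
    ("DBA", "female", 20000),
]

body = {
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {"field": "job.keyword"},
            "aggs": {
                "gender": {
                    "terms": {"field": "gender"},
                    "aggs": {"salary_stats": {"stats": {"field": "salary"}}},
                }
            },
        }
    },
}

# Two bucket levels: job -> gender -> list of salaries
tree = defaultdict(lambda: defaultdict(list))
for job, gender, salary in ROWS:
    tree[job][gender].append(salary)

# Metric computed at the innermost level
result = {
    job: {
        gender: {"count": len(s), "min": min(s), "max": max(s), "avg": sum(s) / len(s)}
        for gender, s in genders.items()
    }
    for job, genders in tree.items()
}
print(result["Java Programmer"])
```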