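1.Introduction
Aggregations are used to run statistical analysis over the data stored in Elasticsearch. There are three main types.
(1).Metric
Metric aggregations fall into two groups. Single-value aggregations return one result each and mainly include min, max, avg, sum, and cardinality (there is no aggregation named count; a document count is available through value_count or through the count field of stats). Multi-value aggregations return several results, for example stats, extended_stats, percentiles, percentile_ranks, and top_hits.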

(2).Bucket
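Bucket aggregations assign documents to buckets according to a rule so that each group can be analyzed separately, similar to GROUP BY in SQL. They mainly include terms, range, date_range, histogram, and date_histogram.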
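
To make the GROUP BY analogy concrete, the following Python sketch mimics what a terms aggregation on the job field does, using the job and salary values of the seven sample documents listed in section (4). It runs locally and does not call Elasticsearch; the names employees and terms_buckets are illustrative assumptions, not part of any Elasticsearch API.

```python
from collections import defaultdict

# (job, salary) pairs copied from the sample documents in section (4).
employees = [
    ("Java engineer", 30000.0),
    ("Java engineer", 20000.0),
    ("Technical director", 50000.0),
    ("Vue engineer", 18000.0),
    ("Vue engineer", 28000.0),
    ("Java engineer", 29000.0),
    ("Java engineer", 15000.0),
]


def terms_buckets(rows):
    """Group salaries by job, one bucket per distinct job value."""
    buckets = defaultdict(list)
    for job, salary in rows:
        buckets[job].append(salary)
    return dict(buckets)


buckets = terms_buckets(employees)
# doc_count per bucket, comparable to SELECT job, COUNT(*) ... GROUP BY job.
doc_counts = {job: len(salaries) for job, salaries in buckets.items()}
print(doc_counts)
```

Each bucket can then be analyzed on its own, for example by computing a metric such as avg over the salaries it holds.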

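(3).Pipeline
Pipeline aggregations run a further analysis on the results of other aggregations and can be chained. Their results are written into the existing aggregation output, and they fall into two groups depending on where the result appears. Sibling aggregations place the result at the same level as the aggregation they read from; they mainly include max_bucket, min_bucket, avg_bucket, sum_bucket, stats_bucket, extended_stats_bucket, and percentiles_bucket. Parent aggregations embed the result inside the buckets of the existing aggregation; they mainly include derivative, moving average, and cumulative sum (cumulative_sum).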
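
The difference between sibling and parent output is easier to see on a small result. The Python sketch below is a local illustration only, not Elasticsearch code: it builds a salary histogram from the seven sample salaries in section (4) with an assumed interval of 10000, then adds a running total inside each bucket (parent placement, in the spirit of cumulative_sum) and a maximum next to the bucket list (sibling placement, in the spirit of max_bucket). The field names cumulative_count and max_count are hypothetical.

```python
# Salaries copied from the sample documents in section (4).
salaries = [30000.0, 20000.0, 50000.0, 18000.0, 28000.0, 29000.0, 15000.0]
interval = 10000  # assumed histogram interval, chosen for illustration

# Bucket step: histogram on salary; empty buckets between min and max are kept.
counts = {}
for s in salaries:
    key = int(s // interval) * interval
    counts[key] = counts.get(key, 0) + 1
keys = range(min(counts), max(counts) + interval, interval)
buckets = [{"key": k, "doc_count": counts.get(k, 0)} for k in keys]

# Parent placement: the running total is written inside each bucket.
running = 0
for b in buckets:
    running += b["doc_count"]
    b["cumulative_count"] = running

# Sibling placement: the result sits next to the bucket list, not inside it.
top = max(b["doc_count"] for b in buckets)
result = {
    "salary_histogram": {"buckets": buckets},
    "max_count": {
        "value": top,
        "keys": [b["key"] for b in buckets if b["doc_count"] == top],
    },
}
print(result)
```

In an actual request, a pipeline aggregation names its input through a buckets_path parameter rather than operating on Python lists.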

(4).Preparing the documents
Open Kibana Dev Tools, then create the index, define the mapping, and add the documents.

PUT /employee
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

PUT /employee/_mapping
{
  "properties": {
    "name": {
      "type": "text"
    },
    "age": {
      "type": "integer"
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd"
    },
    "job": {
      "type": "keyword"
    },
    "salary": {
      "type": "float"
    }
  }
}

POST /employee/_doc
{
  "name": "James Harden",
  "job": "Java engineer",
  "age": 31,
  "salary": 30000.00,
  "birthday": "1991-01-01"
}

POST /employee/_doc
{
  "name": "Stephen Curry",
  "job": "Java engineer",
  "age": 27,
  "salary": 20000.00,
  "birthday": "1995-08-06"
}

POST /employee/_doc
{
  "name": "LeBron James",
  "job": "Technical director",
  "age": 35,
  "salary": 50000.00,
  "birthday": "1987-12-25"
}

POST /employee/_doc
{
  "name": "Damian Lillard",
  "job": "Vue engineer",
  "age": 25,
  "salary": 18000.00,
  "birthday": "1996-10-01"
}

POST /employee/_doc
{
  "name": "Kevin Durant",
  "job": "Vue engineer",
  "age": 30,
  "salary": 28000.00,
  "birthday": "1992-05-01"
}

POST /employee/_doc
{
  "name": "Chirs Paul",
  "job": "Java engineer",
  "age": 33,
  "salary": 29000.00,
  "birthday": "1988-12-02"
}

POST /employee/_doc
{
  "name": "Jason Tatum",
  "job": "Java engineer",
  "age": 24,
  "salary": 15000.00,
  "birthday": "1997-08-02"
}
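
As an alternative to one request per document, the same records could be loaded in a single call to the bulk API. The Python sketch below only builds the newline-delimited body for POST /employee/_bulk and prints it; it does not send anything. The helper name bulk_body is an assumption for illustration, and the documents are copied from the requests above.

```python
import json

# Sample documents copied from the requests above.
employees = [
    {"name": "James Harden", "job": "Java engineer", "age": 31, "salary": 30000.00, "birthday": "1991-01-01"},
    {"name": "Stephen Curry", "job": "Java engineer", "age": 27, "salary": 20000.00, "birthday": "1995-08-06"},
    {"name": "LeBron James", "job": "Technical director", "age": 35, "salary": 50000.00, "birthday": "1987-12-25"},
    {"name": "Damian Lillard", "job": "Vue engineer", "age": 25, "salary": 18000.00, "birthday": "1996-10-01"},
    {"name": "Kevin Durant", "job": "Vue engineer", "age": 30, "salary": 28000.00, "birthday": "1992-05-01"},
    {"name": "Chirs Paul", "job": "Java engineer", "age": 33, "salary": 29000.00, "birthday": "1988-12-02"},
    {"name": "Jason Tatum", "job": "Java engineer", "age": 24, "salary": 15000.00, "birthday": "1997-08-02"},
]


def bulk_body(docs):
    """Build an NDJSON bulk body: one action line, then one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # index action with an auto-generated id
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(employees)
print("POST /employee/_bulk")
print(body, end="")
```

The printed text can be pasted into Dev Tools as a single request.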

2.Single-value queries
(1).min, max, avg, sum
min, max, avg, and sum work like the SQL functions of the same names. For example, an avg query on the salary field:

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "salary_avg": {
      "avg": {
        "field": "salary"
      }
    }
  }
}

Response:
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_avg" : {
      "value" : 26250.0
    }
  }
}
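
The arithmetic behind avg can be checked by hand. The Python sketch below recomputes the average from the seven salaries listed in section (4). Those seven values sum to 190000, whereas the responses in this article report 8 hits and, in the stats output, a sum of 210000. The difference is consistent with the index having held one additional document with salary 20000 when the responses were captured; that extra document is an inference from the numbers, not something the article states.

```python
# Salaries of the seven documents listed in section (4).
listed = [30000.0, 20000.0, 50000.0, 18000.0, 28000.0, 29000.0, 15000.0]


def avg(values):
    """Arithmetic mean, which is what the avg aggregation returns."""
    return sum(values) / len(values)


listed_sum = sum(listed)
print(len(listed), listed_sum)  # 7 190000.0

# Count and sum as reported by the stats response in this article.
reported_count, reported_sum = 8, 210000.0

# Salary of the one document that would explain the difference (assumption).
missing = reported_sum - listed_sum
print(missing)  # 20000.0

avg_with_extra = avg(listed + [missing])
print(avg_with_extra)  # 26250.0, the value shown in the response
```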

(2).cardinality
Counts the number of distinct values, similar to COUNT(DISTINCT ...) in SQL. For example, a cardinality query on the job field:

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "job_distinct": {
      "cardinality": {
        "field": "job"
      }
    }
  }
}

Response:
{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "job_distinct" : {
      "value" : 3
    }
  }
}
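
For a data set this small the result can be reproduced exactly with a set. The Python sketch below counts the distinct job values of the sample documents from section (4); it is a local illustration, not how Elasticsearch computes the value. Elasticsearch's cardinality aggregation is approximate (it is based on the HyperLogLog++ algorithm), so on fields with many distinct values its result may deviate slightly from an exact count like this one.

```python
# job values copied from the sample documents in section (4).
jobs = [
    "Java engineer",
    "Java engineer",
    "Technical director",
    "Vue engineer",
    "Vue engineer",
    "Java engineer",
    "Java engineer",
]

# Exact distinct count, comparable to SELECT COUNT(DISTINCT job).
distinct_jobs = set(jobs)
print(len(distinct_jobs))  # 3
```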

3.Multi-value queries
(1).stats
Returns a set of statistics for a numeric field: min, max, avg, sum, and count. For example, a stats query on the salary field:

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "salary_stats": {
      "stats": {
        "field": "salary"
      }
    }
  }
}

Response:
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_stats" : {
      "count" : 8,
      "min" : 15000.0,
      "max" : 50000.0,
      "avg" : 26250.0,
      "sum" : 210000.0
    }
  }
}

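(2).extended_stats
An extension of stats that returns additional statistics, such as the sum of squares (sum_of_squares), variance (variance), standard deviation (std_deviation), and standard deviation bounds (std_deviation_bounds). For example, an extended_stats query on the salary field:

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "salary_extended_stats": {
      "extended_stats": {
        "field": "salary"
      }
    }
  }
}

Response: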
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_extended_stats" : {
      "count" : 8,
      "min" : 15000.0,
      "max" : 50000.0,
      "avg" : 26250.0,
      "sum" : 210000.0,
      "sum_of_squares" : 6.374E9,
      "variance" : 1.076875E8,
      "std_deviation" : 10377.25879025863,
      "std_deviation_bounds" : {
        "upper" : 47004.51758051726,
        "lower" : 5495.4824194827415
      }
    }
  }
}
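
The relationship between these fields can be sketched in a few lines of Python. The function below is a local illustration, not Elasticsearch's implementation: it derives the population variance from the sum of squares and places the bounds two standard deviations either side of the mean, which matches the numbers in the response. The input is an assumption: the seven salaries listed in section (4) plus one extra value of 20000, inferred from the reported count of 8 and sum of 210000.

```python
import math

# Seven listed salaries plus one assumed extra 20000 (see the lead-in).
salaries = [15000.0, 18000.0, 20000.0, 20000.0, 28000.0, 29000.0, 30000.0, 50000.0]


def extended_stats(values, sigma=2.0):
    """Recompute the extended_stats fields for a list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Population variance: mean of the squares minus the square of the mean.
    variance = sum_of_squares / count - avg * avg
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


stats = extended_stats(salaries)
print(stats)
```

The first five keys are the same values that the plain stats aggregation returns.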

(3).percentiles
Percentile statistics, typically used to examine how the data is distributed. For example, a percentiles query on the salary field:

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "salary_percentiles": {
      "percentiles": {
        "field": "salary"
      }
    }
  }
}

Response:
{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_percentiles" : {
      "values" : {
        "1.0" : 15000.0,
        "5.0" : 15000.0,
        "25.0" : 19000.0,
        "50.0" : 24000.0,
        "75.0" : 29500.0,
        "95.0" : 50000.0,
        "99.0" : 50000.0
      }
    }
  }
}

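What a percentile means: university entrance exam scores are often reported as percentiles. Suppose a candidate's raw score in Chinese on the entrance exam is 54. From that number alone it is hard to tell how the candidate did relative to the other students who took the same exam. If the raw score of 54 happens to correspond to the 70th percentile, however, we know that about 70% of the students scored lower and about 30% scored higher.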
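
As a rough illustration of how such values can be derived, the Python sketch below interpolates linearly between sorted values, treating each value as the midpoint of an equal slice of the distribution. This is a simplification and not Elasticsearch's implementation: the percentiles aggregation is approximate and uses the TDigest algorithm by default. The input is also an assumption: the seven salaries listed in section (4) plus one extra 20000, inferred from the count of 8 and sum of 210000 in the stats response. With that input the sketch happens to reproduce the values in the response above.

```python
# Seven listed salaries plus one assumed extra 20000 (see the lead-in).
salaries = [15000.0, 18000.0, 20000.0, 20000.0, 28000.0, 29000.0, 30000.0, 50000.0]


def percentile(values, q):
    """Midpoint-interpolated percentile for q in [0, 100] (simplified sketch)."""
    data = sorted(values)
    n = len(data)
    # Value i is treated as sitting at position i + 0.5 out of n.
    pos = q / 100.0 * n - 0.5
    # Clamp to the first and last value at the extremes.
    pos = min(max(pos, 0.0), n - 1.0)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac


percents = [1, 5, 25, 50, 75, 95, 99]
values = {str(float(q)): percentile(salaries, q) for q in percents}
print(values)
```

On larger data sets the approximate algorithm may give results that differ from an exact interpolation like this one.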

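(4).percentile_ranks
The inverse of percentiles: it returns the percentile at which a given value falls. For example, the percentile ranks of the salary values 28000 and 30000:

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "salary_percentile_ranks": {
      "percentile_ranks": {
        "field": "salary",
        "values": [28000, 30000]
      }
    }
  }
}

Response: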
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_percentile_ranks" : {
      "values" : {
        "28000.0" : 61.111111111111114,
        "30000.0" : 75.59523809523809
      }
    }
  }
}

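(5).top_hits
Typically used after bucketing to fetch the top matching documents in each bucket, that is, the detailed records. For example, fetching the 3 documents with the highest salary:

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "salary_top_hits": {
      "top_hits": {
        "size": 3,
        "sort": [{
          "salary": {
            "order": "desc"
          }
        }]
      }
    }
  }
}

Response: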
{
  "took" : 363,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_top_hits" : {
      "hits" : {
        "total" : {
          "value" : 7,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "employee",
            "_type" : "_doc",
            "_id" : "w4l1hnsBEsHOdz1YM8pq",
            "_score" : null,
            "_source" : {
              "name" : "LeBron James",
              "job" : "Technical director",
              "age" : 35,
              "salary" : 50000.0,
              "birthday" : "1987-12-25"
            },
            "sort" : [
              50000.0
            ]
          },
          {
            "_index" : "employee",
            "_type" : "_doc",
            "_id" : "wYl0hnsBEsHOdz1Y4cqT",
            "_score" : null,
            "_source" : {
              "name" : "James Harden",
              "job" : "Java engineer",
              "age" : 31,
              "salary" : 30000.0,
              "birthday" : "1991-01-01"
            },
            "sort" : [
              30000.0
            ]
          },
          {
            "_index" : "employee",
            "_type" : "_doc",
            "_id" : "xol1hnsBEsHOdz1YXcqt",
            "_score" : null,
            "_source" : {
              "name" : "Chirs Paul",
              "job" : "Java engineer",
              "age" : 33,
              "salary" : 29000.0,
              "birthday" : "1988-12-02"
            },
            "sort" : [
              29000.0
            ]
          }
        ]
      }
    }
  }
}
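
Since top_hits is usually applied inside buckets, a local Python sketch may help show both uses. It sorts the seven sample documents from section (4) by salary to get the overall top 3, as in the request above, and then takes the single highest-paid record per job, which is what nesting top_hits under a terms aggregation on job would be expected to return. It does not call Elasticsearch, and the names top_hits and top_per_job are illustrative assumptions.

```python
# (name, job, salary) copied from the sample documents in section (4).
employees = [
    ("James Harden", "Java engineer", 30000.0),
    ("Stephen Curry", "Java engineer", 20000.0),
    ("LeBron James", "Technical director", 50000.0),
    ("Damian Lillard", "Vue engineer", 18000.0),
    ("Kevin Durant", "Vue engineer", 28000.0),
    ("Chirs Paul", "Java engineer", 29000.0),
    ("Jason Tatum", "Java engineer", 15000.0),
]


def top_hits(rows, size):
    """Return the `size` rows with the highest salary, in descending order."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:size]


# Overall top 3, matching the request above.
overall = [name for name, _, _ in top_hits(employees, 3)]
print(overall)

# Bucket by job first, then keep the top record of each bucket.
by_job = {}
for row in employees:
    by_job.setdefault(row[1], []).append(row)
top_per_job = {job: top_hits(rows, 1)[0][0] for job, rows in by_job.items()}
print(top_per_job)
```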