关于常用聚合函数,ES提供了很多,具体查看官方文档,本文在ES 聚合查询的基础上,相关测试数据也在ES 聚合查询中.
1、range聚合
1.1 统计各个价格范围的食品销售情况,代码如下:
GET food/_search?size=0
{
"aggs": {
"price_range": {
"range": {
"field": "Price",
"ranges": [
{
"from": 0,
"to": 100
},
{
"from": 100,
"to": 200
},
{
"from": 200,
"to": 300
},
{
"from": 300,
"to": 400
}
]
}
}
}
}
1.2 统计每个月的食品销售情况
GET food/_search?size=0
{
"aggs": {
"price_range": {
"range": {
"field": "CreateTime",
"ranges": [
{
"from": "2022-05-01 00:00:00",
"to": "2022-06-01 00:00:00"
},
{
"from": "2022-06-01 00:00:00",
"to": "2022-07-01 00:00:00"
},
{
"from": "2022-07-01 00:00:00",
"to": "2022-08-01 00:00:00"
},
{
"from": "2022-08-01 00:00:00",
"to": "2022-09-01 00:00:00"
}
]
}
}
}
}
2、Histogram 柱状图统计 官方文档
2.1 统计各个价位区间的食品销售数量 间隔是100效果和1.1类似
GET food/_search?size=0
{
"aggs": {
"price_histogram": {
"histogram": {
"field": "Price",
"interval": 100
}
}
}
}
2.2 统计各个价位区间的食品销售数量 间隔是100 要求过滤掉所有区间能销售量为0的桶结果
GET food/_search?size=0
{
"aggs": {
"price_histogram": {
"histogram": {
"field": "Price",
"interval": 100,
"min_doc_count": 1
}
}
}
}
2.3 统计各个价位区间的食品销售数量 间隔是100 如果区间内存在空值,统一用250替代
这里需要新增一条价格为空的数据方便演示,代码如下:
PUT food/_doc/8
{
"CreateTime":"2022-04-10 13:11:11",
"Desc":"猕猴桃 对身体很有好处",
"Level":"高级水果",
"Name":"猕猴桃",
"Price":"",
"Tags":["性价比","水果","保健"],
"Type":"水果"
}
查询代码如下:
GET food/_search?size=0
{
"aggs": {
"price_histogram": {
"histogram": {
"field": "Price",
"interval": 100,
"missing": 250
}
}
}
}
搜索结果如下:
{
"took" : 106,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"price_histogram" : {
"buckets" : [
{
"key" : 0.0,
"doc_count" : 4
},
{
"key" : 100.0,
"doc_count" : 2
},
{
"key" : 200.0,
"doc_count" : 1
},
{
"key" : 300.0,
"doc_count" : 1
}
]
}
}
}
注意原先结果里面200-300区间是没有数据的,这个时候插入了一条价格为空的数据,且制定了miss条件为250,es会将所有的价格为空的值用250替换,所以结果中200-300范围的count为1.
2.4 key关键字
这里key关键字的用法只是改变了桶聚合值得展示形式通过key value形式展示,这里不在赘述.
3、Date-Histogram 官方文档
3.1 按照日期进行聚合,统计每个月所有食品得销量
GET food/_search?size=0
{
"aggs": {
"adate_histogram": {
"date_histogram": {
"field": "CreateTime",
"calendar_interval": "month", //每隔一个月进行统计
"format": "yyyy-MM", //日期展示按照 年月展示,
"min_doc_count": 1 //过滤掉count为0的数据
}
}
}
}
这里用calendar_interval做时间间隔,但是需要注意其支持的单位如下:minute=>
1m,hour=>1h,
day=>
1d,
week=>
1w,
month=>
1M,
quarter=>
1q,
year=>
1y 最小支持到分钟,最大支持到年.
3.2 按照时间进行聚合,统计没毫秒所有食品得销量
GET food/_search?size=0
{
"aggs": {
"adate_histogram": {
"date_histogram": {
"field": "CreateTime",
"fixed_interval": "1ms", //每隔1毫秒进行统计
"min_doc_count": 1 //过滤掉count为0的数据
}
}
}
}
这里用fixed_interval做时间间隔,但是需要注意其支持的单位如下:ms,s,m,h,d 最小支持到毫秒,最大支持到天.
这里有个严重的问题,使用毫秒进行分桶时,会造成es检索出大量数据,造成es卡死,写入收到严重影响,所以要慎用,使用前必须用query或者filter等等进行时间限制
3.3 统计今年一年内的每个月食品的销售情况
这里注意,上面的按照月份统计,如果1月份没有数据,es进行分桶时并不会展示1月份的数据,那么显然不符合需求,所以需要让1月份以0显示出来,代码如下
GET food/_search?size=0
{
"aggs": {
"adate_histogram": {
"date_histogram": {
"field": "CreateTime",
"calendar_interval": "month", //每隔一个月进行统计
"min_doc_count": 0,
"extended_bounds": {
"min": "2022-01-01 00:00:00",
"max": "2022-12-31 00:00:00"
}
}
}
}
}
搜索结果如下:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"adate_histogram" : {
"buckets" : [
{
"key_as_string" : "2022-01-01 00:00:00",
"key" : 1640995200000,
"doc_count" : 0
},
{
"key_as_string" : "2022-02-01 00:00:00",
"key" : 1643673600000,
"doc_count" : 0
},
{
"key_as_string" : "2022-03-01 00:00:00",
"key" : 1646092800000,
"doc_count" : 0
},
{
"key_as_string" : "2022-04-01 00:00:00",
"key" : 1648771200000,
"doc_count" : 1
},
{
"key_as_string" : "2022-05-01 00:00:00",
"key" : 1651363200000,
"doc_count" : 0
},
{
"key_as_string" : "2022-06-01 00:00:00",
"key" : 1654041600000,
"doc_count" : 3
},
{
"key_as_string" : "2022-07-01 00:00:00",
"key" : 1656633600000,
"doc_count" : 4
},
{
"key_as_string" : "2022-08-01 00:00:00",
"key" : 1659312000000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-01 00:00:00",
"key" : 1661990400000,
"doc_count" : 0
},
{
"key_as_string" : "2022-10-01 00:00:00",
"key" : 1664582400000,
"doc_count" : 0
},
{
"key_as_string" : "2022-11-01 00:00:00",
"key" : 1667260800000,
"doc_count" : 0
},
{
"key_as_string" : "2022-12-01 00:00:00",
"key" : 1669852800000,
"doc_count" : 0
}
]
}
}
}
这里结果就是按照1月份到12月份,按照每个月份进行数量的统计.
注意:这里extended_bounds和min_doc_count的参数的混合使用,当使用extended_bounds进行间隔空白填充时,min_doc_count必须为0,上面说了min_doc_count是为了过滤count为0的风筒,如果min_doc_count为1就会过滤掉extended_bounds产生的空白填充,这就自相矛盾了.
3.4 统计今年一年内的每个月食品的销售情况,并按每个月的销售数量进行排序
GET food/_search?size=0
{
"aggs": {
"adate_histogram": {
"date_histogram": {
"field": "CreateTime",
"calendar_interval": "month", //每隔一个月进行统计
"min_doc_count": 0,
"extended_bounds": {
"min": "2022-01-01 00:00:00",
"max": "2022-12-31 00:00:00"
},
"order": {
"_count": "asc"
}
}
}
}
}
3.5 统计今年每个月的食物的销售数量,并且计算每个月相对于上一个月的累计值(商品价格的叠加)
GET food/_search?size=0
{
"aggs": {
"date_histogram": {
"date_histogram": {
"field": "CreateTime",
"calendar_interval": "month", //每隔一个月进行统计
"min_doc_count": 0,
"extended_bounds": {
"min": "2022-01-01 00:00:00",
"max": "2022-12-31 00:00:00"
}
},
"aggs": {
"sum_agg": {
"sum": {
"field": "Price"
}
},
"cumulative_sum_agg":{
"cumulative_sum": {
"buckets_path": "sum_agg"
}
}
}
}
}
}
搜索结果如下:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"date_histogram" : {
"buckets" : [
{
"key_as_string" : "2022-01-01 00:00:00",
"key" : 1640995200000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 0.0
}
},
{
"key_as_string" : "2022-02-01 00:00:00",
"key" : 1643673600000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 0.0
}
},
{
"key_as_string" : "2022-03-01 00:00:00",
"key" : 1646092800000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 0.0
}
},
{
"key_as_string" : "2022-04-01 00:00:00",
"key" : 1648771200000,
"doc_count" : 1,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 0.0
}
},
{
"key_as_string" : "2022-05-01 00:00:00",
"key" : 1651363200000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 0.0
}
},
{
"key_as_string" : "2022-06-01 00:00:00",
"key" : 1654041600000,
"doc_count" : 3,
"sum_agg" : {
"value" : 89.32999992370605
},
"cumulative_sum_agg" : {
"value" : 89.32999992370605
}
},
{
"key_as_string" : "2022-07-01 00:00:00",
"key" : 1656633600000,
"doc_count" : 4,
"sum_agg" : {
"value" : 511.43998622894287
},
"cumulative_sum_agg" : {
"value" : 600.7699861526489
}
},
{
"key_as_string" : "2022-08-01 00:00:00",
"key" : 1659312000000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 600.7699861526489
}
},
{
"key_as_string" : "2022-09-01 00:00:00",
"key" : 1661990400000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 600.7699861526489
}
},
{
"key_as_string" : "2022-10-01 00:00:00",
"key" : 1664582400000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 600.7699861526489
}
},
{
"key_as_string" : "2022-11-01 00:00:00",
"key" : 1667260800000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 600.7699861526489
}
},
{
"key_as_string" : "2022-12-01 00:00:00",
"key" : 1669852800000,
"doc_count" : 0,
"sum_agg" : {
"value" : 0.0
},
"cumulative_sum_agg" : {
"value" : 600.7699861526489
}
}
]
}
}
}
从结果就可以看出,在计算出每个月销量的同时,计算出了每个月的销售额,并且通过cumulative_sum计算除了当前月份和前面所有月份的累计销售额.
4、Auto-interval date histogram 官方文档
自动直方图,自动直方图会按照指定的桶数量去计算interval,在某些场景下使用还是用方便的,比如统计今年每个月的食物的销售情况,就可以指定桶数量为180,代码如下:
GET food/_search?size=0
{
"aggs": {
"auto_date_histogram_aggs": {
"auto_date_histogram": {
"field": "CreateTime",
"buckets":"12",
"format":"yyyy-MM-dd "
}
}
}
}
结果如下:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"auto_date_histogram_aggs" : {
"buckets" : [
{
"key_as_string" : "2022-04-01",
"key" : 1648771200000,
"doc_count" : 1
},
{
"key_as_string" : "2022-05-01",
"key" : 1651363200000,
"doc_count" : 0
},
{
"key_as_string" : "2022-06-01",
"key" : 1654041600000,
"doc_count" : 3
},
{
"key_as_string" : "2022-07-01",
"key" : 1656633600000,
"doc_count" : 4
}
],
"interval" : "1M"
}
}
}
注意结果中Interval为1M,就是es根据桶数量自动推算出来的.
4、Percentiles 饼图统计
指定百分比计算值的范围,分别统计百分之20、百分之40、百分之60、百分之80、百分之99的商品的价格在什么值
GET food/_search?size=0
{
"aggs": {
"percentiles_agg": {
"percentiles": {
"field": "Price",
"percents": [
20,
40,
60,
80,
99
]
}
}
}
}
结果如下:
{
"took" : 29,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"percentiles_agg" : {
"values" : {
"20.0" : 11.109999656677246,
"40.0" : 28.309999942779555,
"60.0" : 89.91000061035156,
"80.0" : 120.10999908447278,
"99.0" : 300.1099853515625
}
}
}
}
结果显示百分之20的商品价格在11以内,百分之40的价格在28以内,百分之99的价格在300以内.
常用于计算接口的可靠性,假设接口相应在100ms以内,算合格,那么这里的百分之99对应的值,必须在100以内,才算达标,以此类推.
5、Percentile ranks 饼图统计
这个和Percentiles相反,两者都是饼图统计的一种,它可以计算指定范围所占的百分比,而Percentiles指定百分比计算范围
GET food/_search?size=0
{
"aggs": {
"percentile_ranks_agg": {
"percentile_ranks": {
"field": "Price",
"values": [
100,
200,
300,
400
]
}
}
}
}
结果如下:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"percentile_ranks_agg" : {
"values" : {
"100.0" : 71.33613394088104,
"200.0" : 85.69857243009302,
"300.0" : 100.0,
"400.0" : 100.0
}
}
}
}
这里就计算出100以内占百分之71.....