目录

目标

ES版本信息

官方文档

实战

新增测试数据

聚合函数

聚合前过滤数据

去重聚合

Percentiles百分比

Percentile ranks计算百分位数等级

top_hits


目标

掌握Metrics aggregations(度量聚合,也作指标聚合 )语法,通过本文列举的各种案例举一反三。具体会涉及以下内容:

  • 常用的聚合函数,如:min()、max()、avg()、sum()等;
  • 聚合前过滤数据;
  • 去重聚合;
  • 详细分析Percentiles(百分比);
  • 详细分析Percentile ranks(计算百分位数等级);
  • 按照范围聚合。
  • top_hits(相当于MySQL8.0版本中的窗口函数。)

ES版本信息

7.17.5


官方文档

Metrics aggregations

es 聚合原理 es聚合语法_es百分比聚合

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-aggregations-metrics.html


实战

新增测试数据

PUT /library_db
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}

PUT /library_db/_bulk
{"index":{"_id":"1"}}
{"id":1,"type":"玄幻","name":"诛仙","words_num":120,"chapter_num":600,"completion_time":"2000-09-01","author":"萧鼎","prices":32.12}
{"index":{"_id":"2"}}
{"id":2,"type":"玄幻","name":"诛仙前传:蛮荒行","words_num":30,"chapter_num":67,"completion_time":"2020-09-01","author":"萧鼎","prices":23.12}
{"index":{"_id":"3"}}
{"id":3,"type":"武侠","name":"天龙八部","words_num":80,"chapter_num":120,"completion_time":"1995-09-01","author":"金庸","prices":52.1}
{"index":{"_id":"4"}}
{"id":4,"type":"武侠","name":"射雕英雄传","words_num":67,"chapter_num":95,"completion_time":"1998-01-01","author":"金庸","prices":4.12}
{"index":{"_id":"5"}}
{"id":5,"type":"武侠","name":"神雕侠侣","words_num":75,"chapter_num":76,"completion_time":"2000-01-01","author":"金庸","prices":32.8}
{"index":{"_id":"6"}}
{"id":5,"type":"武侠","name":"倚天屠龙记","words_num":83,"chapter_num":130,"completion_time":"2003-01-01","author":"金庸","prices":100.12}
{"index":{"_id":"7"}}
{"id":7,"type":"玄幻","name":"凡人修仙传","words_num":600,"chapter_num":3000,"completion_time":"2018-01-01","author":"忘语","prices":120.12}
{"index":{"_id":"8"}}
{"id":8,"type":"玄幻","name":"魔天记","words_num":159,"chapter_num":400,"completion_time":"2019-01-01","author":"忘语","prices":11.12}
{"index":{"_id":"9"}}
{"id":9,"type":"都市异能","name":"黄金瞳","words_num":220,"chapter_num":400,"completion_time":"2019-01-01","author":"打眼","prices":74.5}
{"index":{"_id":"10"}}
{"id":10,"type":"玄幻","name":"将夜","words_num":210,"chapter_num":600,"completion_time":"2014-01-01","author":"血红","prices":32.0}
{"index":{"_id":"11"}}
{"id":11,"type":"军事","name":"亮剑","words_num":120,"chapter_num":100,"completion_time":"2012-01-01","author":"都梁","prices":15.0}

聚合函数

需求:在图书馆索引中,求小说单价最大值、单价最小值、单价平均值、单价总和值、小说总数量值。

方法一:使用min()、max()、avg()、sum()等函数进行查询。

#求平均值
#注意:这里的size表示针对该索引展示0条文档。
GET /library_db/_search
{
  "size":0,
  "aggs": {
    "avg_prices": {
      "avg": {
        "field": "prices"
      }
    }
  }
}

#求和
GET /library_db/_search
{
"size":0,
  "aggs": {
    "sum_prices": {
      "sum": {
        "field": "prices"
      }
    }
  }
}

#求最大值
GET /library_db/_search
{
"size":0,
  "aggs": {
    "max_prices": {
      "max": {
        "field": "prices"
      }
    }
  }
}

#求最小值
GET /library_db/_search
{
"size":0,
  "aggs": {
    "min_prices": {
      "min": {
        "field": "prices"
      }
    }
  }
}

#求书本数量
GET /library_db/_search
{
"size":0,
  "aggs": {
    "count_prices": {
      "value_count": {
        "field": "id"
      }
    }
  }
}

方法二:使用stats方法统计所有聚合值。

#求书本数量
GET /library_db/_search
{
  "size":0,
  "aggs": {
    "stats_prices": {
      "stats": {
        "field": "prices"
      }
    }
  }
}

聚合前过滤数据

需求:求价格低于50元的武侠小说数量。

GET /library_db/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "prices": {
              "lt": 50
            }
          }
        },
        {
          "term": {
            "type.keyword": {
              "value": "武侠"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "num": {
      "value_count": {
        "field": "id"
      }
    }
  }
}

去重聚合

需求:求图书馆中的小说类别数量。

GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "num": {
      "cardinality": {
        "field": "type.keyword"
      }
    }
  }
}

Percentiles百分比

Percentiles功能

计算某百分比对应的阈值。如:80表示80%,假如80对应书本价格为45.5,表示图书馆中书本价格低于45.5元的占了80%的比例。

Percentiles注意事项

  • 百分比范围:[100,0];
  • 可以不指定百分比区间,百分比区间默认值为:[ 1, 5, 25, 50, 75, 95, 99 ];
  • 匹配的字段必须是数字类型。

需求:求图书馆中书本价格对应的各个百分比。

GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "prices_percentiles": {
      "percentiles": {
        "field": "prices",
        "percents": [
          10,
          20,
          30,
          40,
          50,
          60,
          70,
          80,
          90,
          100
        ]
      }
    }
  }
}

Percentile ranks计算百分位数等级

Percentile ranks功能

计算指定阈值对应的百分比等级。如:书本价格为45.5对应80.0,表示图书馆中书本价格低于45.5元的占了80%的比例。

Percentile ranks注意事项

匹配的字段必须是数字类型。

需求:求价格大于或等于50、110、130的书本所占的百分比。

GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "prices_ranks": {
      "percentile_ranks": {
        "field": "prices",   
        "values": [ 50,110,130 ]
      }
    }
  }
}

top_hits

需求:求图书馆中每个类型的小说中字数最多的2本小说。

GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "type_count": {
      "terms": {
        "field": "type.keyword"
      },
      "aggs": {
        "type_count": {
          "top_hits": {
            "size": 2,
            "sort": [
              {
                "words_num": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}