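Contents
- Microservice frameworks
- Spring Cloud microservice architecture
- 23 Processing search results
- 23.2 Pagination
- 23.2.1 Pagination
- 23.2.2 The deep pagination problem
- 23.2.3 Solutions to deep pagination
- 23.2.4 Summary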
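23 Processing search results
23.2 Pagination
23.2.1 Pagination
By default, Elasticsearch returns only the top 10 hits. To retrieve more, you have to change the pagination parameters.
Elasticsearch controls which page of results is returned through the from and size parameters.
This is why the earlier match_all queries showed only 10 documents: when the request does not set them, Elasticsearch applies default pagination values (from = 0, size = 10).
Syntax: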
GET /hotel/_search
{
"query": {
"match_all": {}
},
"from": 990, // offset of the first hit to return, default 0
"size": 10, // number of documents to return
"sort": [
{"price": "asc"}
]
}
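Since from is the offset of the first hit, it can be derived from a page number when every page has the same size. Below is a minimal Python sketch, assuming 1-based page numbers. The helper name page_request is hypothetical; it only builds the request body shown above as a dict and does not talk to Elasticsearch.

```python
def page_request(page, size=10):
    """Build a from/size search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * size,   # offset of the first hit on this page
        "size": size,                # number of hits per page
        "sort": [{"price": "asc"}],
    }


body = page_request(page=100, size=10)
print(body["from"], body["size"])  # 990 10
```

Page 100 with 10 hits per page gives from = 990, which matches the syntax example above.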
Try it:
# Paginated query
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"price": "asc"
}
],
"from": 0,
"size": 3
}
This requests the first page, with three documents per page.
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 201,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "hotel",
"_type" : "_doc",
"_id" : "197837109",
"_score" : null,
"_source" : {
"address" : "布吉镇深惠路龙珠商城",
"brand" : "如家",
"business" : "布吉/深圳东站",
"city" : "深圳",
"id" : 197837109,
"location" : "22.602482, 114.123284",
"name" : "如家酒店·neo(深圳龙岗大道布吉地铁站店)",
"pic" : "https://m.tuniucdn.com/fb2/t1/G6/M00/25/58/Cii-TF3PFZOIA7jwAAKInGFN4xgAAEVbAGeP4AAAoi0485_w200_h200_c1_t0.jpg",
"price" : 127,
"score" : 43,
"starName" : "二钻"
},
"sort" : [
127
]
},
{
"_index" : "hotel",
"_type" : "_doc",
"_id" : "2316304",
"_score" : null,
"_source" : {
"address" : "龙岗街道龙岗墟社区龙平东路62号",
"brand" : "如家",
"business" : "龙岗中心区/大运新城",
"city" : "深圳",
"id" : 2316304,
"location" : "22.730828, 114.278337",
"name" : "如家酒店(深圳双龙地铁站店)",
"pic" : "https://m.tuniucdn.com/fb3/s1/2n9c/4AzEoQ44awd1D2g95a6XDtJf3dkw_w200_h200_c1_t0.jpg",
"price" : 135,
"score" : 45,
"starName" : "二钻"
},
"sort" : [
135
]
},
{
"_index" : "hotel",
"_type" : "_doc",
"_id" : "1630005459",
"_score" : null,
"_source" : {
"address" : "罗湖区宝安南路2078号深港豪苑(与红桂路交汇处)",
"brand" : "7天酒店",
"business" : "",
"city" : "深圳",
"id" : 1630005459,
"location" : "22.550341, 114.10965",
"name" : "7天连锁酒店(深圳地王大厦红桂路店)(原红桂路店)",
"pic" : "https://m.tuniucdn.com/fb2/t1/G2/M00/EA/18/Cii-T1k1KaGIIkQVAAD4fD_T3FcAALTtABiCJ8AAPiU164_w200_h200_c1_t0.jpg",
"price" : 143,
"score" : 39,
"starName" : "二钻"
},
"sort" : [
143
]
}
]
}
}
Now the second page:
# Paginated query
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"price": "asc"
}
],
"from": 3,
"size": 3
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 201,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "hotel",
"_type" : "_doc",
"_id" : "541619",
"_score" : null,
"_source" : {
"address" : "莘庄镇莘浜路172号",
"brand" : "如家",
"business" : "莘庄工业区",
"city" : "上海",
"id" : 541619,
"location" : "31.105797, 121.37755",
"name" : "如家酒店(上海莘庄地铁站龙之梦商业广场店)",
"pic" : "https://m.tuniucdn.com/fb3/s1/2n9c/3mKs3jETvJDj3dDdkRB9UyLLvPna_w200_h200_c1_t0.jpg",
"price" : 149,
"score" : 44,
"starName" : "二钻"
},
"sort" : [
149
]
},
{
"_index" : "hotel",
"_type" : "_doc",
"_id" : "1400304687",
"_score" : null,
"_source" : {
"address" : "龙岗大道横岗段4004号",
"brand" : "如家",
"business" : "龙岗中心区/大运新城",
"city" : "深圳",
"id" : 1400304687,
"location" : "22.642629, 114.202799",
"name" : "如家酒店(深圳横岗地铁站新马商贸城店)",
"pic" : "https://m.tuniucdn.com/fb2/t1/G6/M00/25/5A/Cii-TF3PFkiIb27dAAEqdDcKl3YAAEViQGVWY0AASqM960_w200_h200_c1_t0.jpg",
"price" : 149,
"score" : 43,
"starName" : "二钻"
},
"sort" : [
149
]
},
{
"_index" : "hotel",
"_type" : "_doc",
"_id" : "728415",
"_score" : null,
"_source" : {
"address" : "晒布路67号",
"brand" : "如家",
"business" : "东门商业区",
"city" : "深圳",
"id" : 728415,
"location" : "22.550183, 114.120771",
"name" : "如家酒店·neo(深圳东门步行街晒布地铁站店)",
"pic" : "https://m.tuniucdn.com/fb2/t1/G6/M00/25/57/Cii-U13PFNWISSnQAAEpTtoilsQAAEVWgEvur8AASlm647_w200_h200_c1_t0.jpg",
"price" : 152,
"score" : 46,
"starName" : "二钻"
},
"sort" : [
152
]
}
]
}
}
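As expected, the second page picks up where the first left off: the first page ended at price 143 and this one starts at 149.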
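23.2.2 The deep pagination problem
Elasticsearch is distributed: it runs as a cluster and an index is split across shards. This is what causes the deep pagination problem.
For example, to sort by price and fetch the page with from = 990, size = 10:
- First, every shard sorts its own documents and retrieves its top 1000.
- Then the results from all shards are gathered and re-sorted in memory to select the overall top 1000.
- Finally, the 10 documents starting at position 990 are taken from those 1000.
The deeper the page, that is, the larger from + size, the more memory and CPU the query consumes. Elasticsearch therefore caps from + size at 10000 by default (the index.max_result_window setting).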
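The three steps can be simulated in a few lines of Python. This is only a toy model under stated assumptions: three in-memory lists stand in for shards, prices are unique random integers, and the names shard_top and paged_search are hypothetical. It is not how Elasticsearch is implemented, but it shows why the work grows with from + size rather than with size.

```python
import heapq
import random


def shard_top(shard_docs, n):
    """Each shard sorts its own documents and returns its first n."""
    return sorted(shard_docs)[:n]


def paged_search(shards, from_, size):
    """Merge the per-shard candidates, then slice out the requested page."""
    window = from_ + size
    candidates = [shard_top(docs, window) for docs in shards]
    merged = list(heapq.merge(*candidates))[:window]  # global top `window`
    fetched = sum(len(c) for c in candidates)         # docs collected for the merge
    return merged[from_:window], fetched


random.seed(0)
prices = random.sample(range(100, 100_000), 6000)  # unique toy prices
shards = [prices[i::3] for i in range(3)]          # 3 toy shards, 2000 docs each

page, fetched = paged_search(shards, from_=990, size=10)
print(page == sorted(prices)[990:1000])  # True
print(fetched)                           # 3000
```

To return 10 documents, the merge step had to collect 3000 candidates (1000 from each of the three shards). With more shards or a larger from, that number grows accordingly.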
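Testing the limit: a request whose from + size equals 10000 still succeeds. Raising from by 1 makes the request fail with an error.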
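Note that the limit applies to from + size, not to from alone. The rule can be written as a one-line check. This is a sketch assuming the default limit of 10000; the function name within_result_window is hypothetical and only mirrors the rule, it is not Elasticsearch code.

```python
MAX_RESULT_WINDOW = 10_000  # assumed default of index.max_result_window


def within_result_window(from_, size, limit=MAX_RESULT_WINDOW):
    """Return True if a from/size request stays inside the result window."""
    return from_ + size <= limit


print(within_result_window(9990, 10))  # True
print(within_result_window(9991, 10))  # False
```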
What if you really do need to page that deep? Is there no way around the limit? There is.
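23.2.3 Solutions to deep pagination
For deep pagination, Elasticsearch offers two solutions (see the official documentation):
- search after: requires a sort. Each request continues from the sort values of the last hit on the previous page. This is the officially recommended approach.
- scroll: takes a snapshot of the sorted results and keeps it in memory, which makes it expensive. It is no longer officially recommended.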
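The idea behind search after can be illustrated with a small in-memory simulation in Python. It is a sketch under assumptions: the function search_after_page is hypothetical, the data is reduced to the price and id of the six hits shown earlier, and id is added as a second sort key so that documents with the same price have a well-defined order.

```python
def search_after_page(docs, size, after=None):
    """Return the next page and the sort values of its last hit.

    docs  : list of dicts with "price" and "id"
    after : [price, id] sort values of the previous page's last hit, or None
    """
    key = lambda d: (d["price"], d["id"])  # id breaks ties on price
    ordered = sorted(docs, key=key)
    if after is not None:
        # Keep only documents that sort strictly after the cursor.
        ordered = [d for d in ordered if key(d) > tuple(after)]
    page = ordered[:size]
    next_after = list(key(page[-1])) if page else None
    return page, next_after


# Prices and ids taken from the six hits shown earlier.
hotels = [
    {"id": 197837109, "price": 127}, {"id": 2316304, "price": 135},
    {"id": 1630005459, "price": 143}, {"id": 541619, "price": 149},
    {"id": 1400304687, "price": 149}, {"id": 728415, "price": 152},
]

page1, cursor = search_after_page(hotels, size=4)
page2, _ = search_after_page(hotels, size=4, after=cursor)
print(cursor)                    # [149, 541619]
print([d["id"] for d in page2])  # [1400304687, 728415]
```

Two hotels share the price 149, and the first page ends between them. With price as the only sort key, the cursor 149 could not tell them apart, which is why a unique tiebreaker field is used. In a real request the cursor would be passed as the search_after array together with the same sort clause, and from would be left out.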
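23.2.4 Summary
from + size:
- Pros: supports jumping to an arbitrary page
- Cons: suffers from the deep pagination problem; from + size is capped at 10000 by default
- Use cases: searches that allow jumping to arbitrary pages, as on Baidu, JD.com, Google, or Taobao
search after:
- Pros: no upper limit on how far you can page (the size of a single request must still not exceed 10000)
- Cons: can only move forward page by page; jumping to an arbitrary page is not supported
- Use cases: searches that do not need arbitrary page jumps, such as scrolling down to load more results on a phone
scroll:
- Pros: no upper limit on how far you can page (the size of a single request must still not exceed 10000)
- Cons: uses additional memory, and the results are not real-time because they come from a snapshot
- Use cases: retrieving or migrating very large volumes of data. No longer recommended since ES 7.1; search after is suggested instead.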