Using scroll in Elasticsearch

  • Implementing scroll in Java
  • Using scroll in Kibana


In some scenarios the only way to retrieve every matching document is to use scroll instead of an ordinary search.

Implementing scroll in Java

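Java scroll API reference: Using scrolls in Java. The method below calls client.search(...) with RequestOptions, which matches the Java High Level REST Client; INDEX, client, and logger are assumed to be fields of the enclosing class.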

public List<String> scroll(long lastTime,long nowTime,List<String> list){
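        // Keep the scroll context alive for 60 seconds. This is not the time needed to process the whole result set:
        // the expiry is refreshed by every scroll request, so it only has to cover processing of the current batch.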
        final Scroll scroll = new Scroll(TimeValue.timeValueSeconds(60));
        SearchRequest searchRequest = new SearchRequest(INDEX);
        searchRequest.scroll(scroll);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        QueryBuilder queryBuilder = QueryBuilders
                .boolQuery()
                // Match documents whose intercept_time lies between lastTime and nowTime (inclusive)
                .must(QueryBuilders.rangeQuery("intercept_time").gte(lastTime).lte(nowTime));
        searchSourceBuilder.query(queryBuilder);
        // Number of hits returned per batch
        searchSourceBuilder.size(10000);
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse;
        try {
            searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            logger.error("Failed to fetch data (initial search) ->", e);
            // Without an initial response there is nothing to scroll over
            return list;
        }
        String scrollId;
        do {
            for (SearchHit hit : searchResponse.getHits().getHits()) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                // Extract the field you need ("xxx" is a placeholder field name)
                list.add(String.valueOf(sourceAsMap.get("xxx")));
            }
            // Take the scroll id after each batch; the next request resumes from this cursor
            scrollId = searchResponse.getScrollId();
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
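            try {
                // Fetch the next batch
                searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                logger.error("Failed to fetch data (scroll) ->", e);
                // Stop here; otherwise the previous response would be processed again on every iteration
                break;
            }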
        } while (searchResponse.getHits().getHits().length != 0);
        // Clear the scroll context
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        // setScrollIds() can be used instead to pass several scroll ids at once
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = null;
        try {
            clearScrollResponse = client.clearScroll(clearScrollRequest,RequestOptions.DEFAULT);
        } catch (IOException e) {
            logger.warn("Failed to clear scroll ->", e);
        }
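        // isSucceeded() reports whether the scroll context was released
        boolean succeeded = clearScrollResponse != null && clearScrollResponse.isSucceeded();
        if (!succeeded) {
            logger.warn("Scroll context was not cleared; it will expire after the timeout");
        }
        return list;
}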

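Once all the data has been fetched, the scroll_id should be cleared manually. Elasticsearch does clean up expired scrolls automatically, but while a scroll_id is alive it consumes considerable resources to keep a snapshot of the current result set, and it also holds file descriptors. Clear it as soon as you are done.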
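One way to make sure the scroll context is always released, even when processing a batch throws, is to put the clear call in a finally block. The sketch below shows only that control flow in plain Java: ScrollSource, Page, and InMemorySource are hypothetical stand-ins for the Elasticsearch client calls and are not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.List;

class ScrollDrain {
    /** One batch of hits plus the scroll id to use for the next request. */
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical stand-in for the three client calls used in the method above. */
    interface ScrollSource {
        Page search();                 // initial search with a scroll timeout
        Page scroll(String scrollId);  // fetch the next batch
        void clear(String scrollId);   // release the search context
    }

    /** Reads every batch and always clears the last scroll id seen. */
    static List<String> drain(ScrollSource source) {
        List<String> out = new ArrayList<>();
        String scrollId = null;
        try {
            Page page = source.search();
            scrollId = page.scrollId;
            while (!page.hits.isEmpty()) {
                out.addAll(page.hits);
                page = source.scroll(scrollId);
                scrollId = page.scrollId;
            }
        } finally {
            // Runs on normal completion and when any call above throws
            if (scrollId != null) {
                source.clear(scrollId);
            }
        }
        return out;
    }

    /** In-memory fake that serves a fixed list in batches of batchSize. */
    static final class InMemorySource implements ScrollSource {
        private final List<String> docs;
        private final int batchSize;
        private int position = 0;
        int clearCalls = 0;

        InMemorySource(List<String> docs, int batchSize) {
            this.docs = docs;
            this.batchSize = batchSize;
        }

        private Page next() {
            int end = Math.min(position + batchSize, docs.size());
            List<String> hits = new ArrayList<>(docs.subList(position, end));
            position = end;
            return new Page("fake-scroll-id", hits);
        }

        @Override public Page search() { return next(); }
        @Override public Page scroll(String scrollId) { return next(); }
        @Override public void clear(String scrollId) { clearCalls++; }
    }

    public static void main(String[] args) {
        InMemorySource source = new InMemorySource(List.of("a", "b", "c", "d", "e"), 2);
        System.out.println(drain(source));     // [a, b, c, d, e]
        System.out.println(source.clearCalls); // 1
    }
}
```

With the real client, search(), scroll(), and clear() would wrap client.search, client.scroll, and client.clearScroll respectively.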

Using scroll in Kibana

Reference: Search APIs » Request Body Search » Scroll.

For the first request, the scroll parameter tells Elasticsearch how long it should keep the "search context" alive.

POST /ip_lib/doc/_search?scroll=1m
{
  "size": 10000
}

Response:

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoCgAAAAABUiHVFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIinRZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSIdgWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiHWFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIh2RZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSIqAWV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiHXFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIinhZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSIp8WV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiKhFldyTUFnUHhHUkRDb1RnNDA0b1hvNWc=",
  "took": 121,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 62357,
    "max_score": 1,
    "hits": [
      {
        ...
      },
      ...
      {
        ...
      }
    ]
  }
}

Pass the _scroll_id returned above to the following request to fetch the next batch:

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoCgAAAAABUiHVFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIinRZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSIdgWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiHWFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIh2RZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSIqAWV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiHXFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIinhZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSIp8WV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiKhFldyTUFnUHhHUkRDb1RnNDA0b1hvNWc="
}

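Note:
Either GET or POST can be used. The URL should not contain the index or type name; these were specified in the original search request and are carried by the scroll_id, so do not pass them again.
The response has the same shape as above.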
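For callers outside Kibana, the same follow-up request can be built with the JDK's java.net.http classes. This is a minimal sketch that only constructs the request and does not send it; the base URL http://localhost:9200 and the class and method names are assumptions for illustration. The JSON body is built by plain concatenation, which assumes the scroll id contains no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ScrollRequests {
    /** Builds the JSON body of the follow-up scroll request. */
    static String body(String scrollId, String keepAlive) {
        return "{\"scroll\":\"" + keepAlive + "\",\"scroll_id\":\"" + scrollId + "\"}";
    }

    /** Builds POST /_search/scroll; the URL carries no index or type name. */
    static HttpRequest nextBatch(String baseUrl, String scrollId, String keepAlive) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_search/scroll"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(scrollId, keepAlive)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = nextBatch("http://localhost:9200", "example-scroll-id", "1m");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("example-scroll-id", "1m"));
    }
}
```

Passing the keep-alive again on every request is what refreshes the timeout for the next batch.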

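If the next request arrives after the scroll timeout set by the previous request has elapsed, the search context no longer exists and no data is returned. The response is an error like this: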

{
  "error": {
    "root_cause": [
      {
        "type": "search_context_missing_exception",
        "reason": "No search context found for id [22159830]"
      },
      ...
      {
        "type": "search_context_missing_exception",
        "reason": "No search context found for id [22160029]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": -1,
        "index": null,
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [22159830]"
        }
      },
     ...     
      {
        "shard": -1,
        "index": null,
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [22160029]"
        }
      }
    ],
    "caused_by": {
      "type": "search_context_missing_exception",
      "reason": "No search context found for id [22160029]"
    }
  },
  "status": 404
}
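A client can treat this response as a signal that the scroll has expired and that the export has to be restarted with a fresh search. The check below is a rough sketch in plain Java; the class and method names are hypothetical, and matching the error type with a substring test is a shortcut that a real client would likely replace with proper JSON parsing.

```java
class ScrollErrors {
    /** Returns true when the response looks like an expired or cleared scroll context. */
    static boolean isScrollExpired(int httpStatus, String responseBody) {
        return httpStatus == 404
                && responseBody != null
                && responseBody.contains("search_context_missing_exception");
    }

    public static void main(String[] args) {
        String expired = "{\"error\":{\"type\":\"search_phase_execution_exception\","
                + "\"caused_by\":{\"type\":\"search_context_missing_exception\"}},\"status\":404}";
        System.out.println(isScrollExpired(404, expired));                    // true
        System.out.println(isScrollExpired(200, "{\"hits\":{\"hits\":[]}}")); // false
    }
}
```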

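With a little over 60,000 documents, repeating the second request within the timeout returns an empty hits list on the seventh run, because all the data has already been read: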

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoCgAAAAABUiNPFldyTUFnUHhHUkRDb1RnNDA0b1hvNWcAAAAAAVIihhZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSI04WV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiNNFldyTUFnUHhHUkRDb1RnNDA0b1hvNWcAAAAAAVIjURZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSIocWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiNQFldyTUFnUHhHUkRDb1RnNDA0b1hvNWcAAAAAAVIigxZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSIoQWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiKFFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHc=",
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 62357,
    "max_score": 1,
    "hits": []
  }
}
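The count follows from the numbers in the responses: 62357 hits at 10000 per batch need seven batches that contain data (the initial search plus six scroll requests), so the seventh scroll request is the first one to come back empty. A small sketch of that arithmetic, with hypothetical names:

```java
class ScrollBatches {
    /** Number of batches that contain at least one hit (ceiling division). */
    static long batchesWithData(long totalHits, long batchSize) {
        return (totalHits + batchSize - 1) / batchSize;
    }

    /** Index (1-based) of the first scroll request that returns no hits. */
    static long firstEmptyScroll(long totalHits, long batchSize) {
        // The initial search delivers the first batch, so the scroll requests
        // deliver the remaining ones, and the next scroll after those is empty.
        long scrollsWithData = Math.max(batchesWithData(totalHits, batchSize) - 1, 0);
        return scrollsWithData + 1;
    }

    public static void main(String[] args) {
        System.out.println(batchesWithData(62357, 10000));  // 7
        System.out.println(firstEmptyScroll(62357, 10000)); // 7
    }
}
```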

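When the scroll timeout is exceeded, the search context is removed automatically. Keeping scrolls open has a cost, however, so clear them explicitly once you are finished.
To clear a single scroll: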

DELETE /_search/scroll
{
     "scroll_id": "DnF1ZXJ5VGhlbkZldGNoCgAAAAABUiP8FktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIkwxZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSJAAWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiP9FktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIj_hZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSJMYWV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiP_FktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIkxRZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSJMQWV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiTCFldyTUFnUHhHUkRDb1RnNDA0b1hvNWc="
}

Multiple scroll IDs can be passed as an array:

DELETE /_search/scroll
{
    "scroll_id": ["DnF1ZXJ5VGhlbkZldGNoCgAAAAABUiP8FktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIkwxZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSJAAWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiP9FktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIj_hZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSJMYWV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiP_FktHNXlHQkNSUWlXcDBVZ1p3dmVjdHcAAAAAAVIkxRZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSJMQWV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiTCFldyTUFnUHhHUkRDb1RnNDA0b1hvNWc=","DnF1ZXJ5VGhlbkZldGNoCgAAAAABUiTHFldyTUFnUHhHUkRDb1RnNDA0b1hvNWcAAAAAAVIkBBZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSJMsWV3JNQWdQeEdSRENvVGc0MDRvWG81ZwAAAAABUiTKFldyTUFnUHhHUkRDb1RnNDA0b1hvNWcAAAAAAVIkyBZXck1BZ1B4R1JEQ29UZzQwNG9YbzVnAAAAAAFSJAEWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiTJFldyTUFnUHhHUkRDb1RnNDA0b1hvNWcAAAAAAVIkBRZLRzV5R0JDUlFpV3AwVWdad3ZlY3R3AAAAAAFSJAIWS0c1eUdCQ1JRaVdwMFVnWnd2ZWN0dwAAAAABUiQDFktHNXlHQkNSUWlXcDBVZ1p3dmVjdHc="]
}

All search contexts can be cleared with the _all parameter:

DELETE /_search/scroll/_all
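Outside Kibana, the clear request with one or more scroll ids can be assembled with the JDK's java.net.http classes. The sketch below only builds the request and does not send it; the base URL, the class name, and the method names are assumptions for illustration, and the ids are assumed to need no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

class ClearScrollRequests {
    /** Builds the JSON body {"scroll_id":["id1","id2",...]}. */
    static String body(List<String> scrollIds) {
        return scrollIds.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(",", "{\"scroll_id\":[", "]}"));
    }

    /** Builds DELETE /_search/scroll with the ids in the request body. */
    static HttpRequest clear(String baseUrl, List<String> scrollIds) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_search/scroll"))
                .header("Content-Type", "application/json")
                .method("DELETE", HttpRequest.BodyPublishers.ofString(body(scrollIds)))
                .build();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("id-1", "id-2");
        System.out.println(body(ids)); // {"scroll_id":["id-1","id-2"]}
        HttpRequest request = clear("http://localhost:9200", ids);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The array form also works for a single id, so one helper can cover both the single and the multiple case.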