Bulk-updating data in Elasticsearch with updateByQuery

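There are generally two ways to bulk-update data:

  1. Query ES for the ids of all records that match the update condition, then bulk-update them by id through the Bulk.Builder interface
  2. Use the updateByQuery interface to perform the bulk update directly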

The first approach needs an extra query against ES before it can update anything. Let's walk through each approach in detail.
First, create an index: PUT http://localhost:9200/user

{
	"mappings":{
		"_doc":{
			"properties":{
				"uid":{
					"type":"keyword"
				},
				"username":{
					"type":"text"
				},
				"age":{
					"type":"integer"
				},
				"like":{
					"type":"keyword"
				}
			}
		}
	}
}

Then insert a few test documents; a search returns the following 4 records:

http://localhost:9200/user/_search
{
    "hits":{
        "total":4,
        "max_score":1,
        "hits":[
            {
                "_index":"user",
                "_type":"_doc",
                "_id":"MV2uAXgB4vfr7WLWaK3Y",
                "_score":1,
                "_source":{
                    "uid":4,
                    "username":"小红",
                    "age":20,
                    "like":"篮球"
                }
            },
            {
                "_index":"user",
                "_type":"_doc",
                "_id":"MF2uAXgB4vfr7WLWK63s",
                "_score":1,
                "_source":{
                    "uid":3,
                    "username":"小明",
                    "age":20,
                    "like":"篮球"
                }
            },
            {
                "_index":"user",
                "_type":"_doc",
                "_id":"Ll2tAXgB4vfr7WLWcq1P",
                "_score":1,
                "_source":{
                    "uid":1,
                    "username":"张三",
                    "age":18,
                    "like":"乒乓球"
                }
            },
            {
                "_index":"user",
                "_type":"_doc",
                "_id":"L12tAXgB4vfr7WLW162C",
                "_score":1,
                "_source":{
                    "uid":2,
                    "username":"李四",
                    "age":19,
                    "like":"羽毛球"
                }
            }
        ]
    }
}

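With the test data in place, the requirement is to change "like" to "网球" (tennis) for every record with age=20.
1. The first approach: query the ES ids first, then update with Bulk.Builder: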

@Test
    public void testUpdateByQuery1() throws Exception{
        HttpClientConfig.Builder builder = new HttpClientConfig.Builder("http://localhost:9200/");
        JestClientFactory jestClientFactory = new JestClientFactory();
        jestClientFactory.setHttpClientConfig(builder.build());
        // Obtain the JestClient instance
        JestClient jestClient = jestClientFactory.getObject();
        // Build the query condition
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", 20);
        // fetchSource(false) skips the document body, so only the ids are returned
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(termQueryBuilder).fetchSource(false);

        Search.Builder search = new Search.Builder(searchSourceBuilder.toString()).addIndex("user").addType("_doc");

        SearchResult result = jestClient.execute(search.build());
        System.out.println(result);
    }

The printed output is:

Result: {"took":45,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"user","_type":"_doc","_id":"MV2uAXgB4vfr7WLWaK3Y","_score":1.0},{"_index":"user","_type":"_doc","_id":"MF2uAXgB4vfr7WLWK63s","_score":1.0}]}}, isSucceeded: true, response code: 200, error message: null

The query returned the two records with age=20, and only the ES ids came back. With the ids in hand, we can bulk-update through Bulk.Builder:

@Test
    public void testUpdateByQuery1() throws Exception{
        HttpClientConfig.Builder builder = new HttpClientConfig.Builder("http://localhost:9200/");
        JestClientFactory jestClientFactory = new JestClientFactory();
        jestClientFactory.setHttpClientConfig(builder.build());
        // Obtain the JestClient instance
        JestClient jestClient = jestClientFactory.getObject();
        // Build the query condition
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", 20);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(termQueryBuilder).fetchSource(false);

        Search.Builder search = new Search.Builder(searchSourceBuilder.toString()).addIndex("user").addType("_doc");

        SearchResult result = jestClient.execute(search.build());
        List<String> esIdList = result.getHits(Map.class).stream().map(hit -> hit.id).collect(Collectors.toList());

        Bulk.Builder bulk = new Bulk.Builder();
        esIdList.forEach(esId -> {
            Map<String,Object> beanMap = new HashMap<>();
            beanMap.put("like", "网球"); // change like to "网球"

            Map<String,Object> doc = new HashMap<>();
            doc.put("doc", beanMap);
            
            Update.Builder update = new Update.Builder(doc).id(esId); // the id of the document to update
            bulk.addAction(update.build());
        });
        bulk.defaultIndex("user").defaultType("_doc"); // the index name and type to update

        BulkResult result1 = jestClient.execute(bulk.build());
        System.out.println(result1);
    }

After it runs, query ES again: http://localhost:9200/user/_search

{
  "hits": {
    "total": 4,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MV2uAXgB4vfr7WLWaK3Y",
        "_score": 1.0,
        "_source": {
          "uid": 4,
          "username": "小红",
          "age": 20,
          "like": "网球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MF2uAXgB4vfr7WLWK63s",
        "_score": 1.0,
        "_source": {
          "uid": 3,
          "username": "小明",
          "age": 20,
          "like": "网球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "Ll2tAXgB4vfr7WLWcq1P",
        "_score": 1.0,
        "_source": {
          "uid": 1,
          "username": "张三",
          "age": 18,
          "like": "乒乓球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "L12tAXgB4vfr7WLW162C",
        "_score": 1.0,
        "_source": {
          "uid": 2,
          "username": "李四",
          "age": 19,
          "like": "羽毛球"
        }
      }
    ]
  }
}

The like field of the two age=20 records has changed from "篮球" (basketball) to "网球" (tennis).
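
For reference, a bulk update like the one in approach 1 goes over the wire as newline-delimited JSON: one action line followed by one partial-document line per id. The sketch below builds such a body by hand with plain Java, without Jest. The class name BulkUpdateBody and its helpers are hypothetical, and the exact body Jest generates may differ in detail (for example by including extra metadata fields); the body is assumed to be posted to /user/_doc/_bulk so that index and type come from the URL.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Jest or Elasticsearch.
public class BulkUpdateBody {

    // Minimal JSON string quoting: escapes backslash and double quote only
    // (assumption: the inputs contain no control characters).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the NDJSON body: an "update" action line plus a "doc" line per id.
    // Every line, including the last one, ends with a newline as the Bulk API requires.
    static String build(List<String> ids, String field, String value) {
        StringBuilder body = new StringBuilder();
        for (String id : ids) {
            body.append("{\"update\":{\"_id\":").append(quote(id)).append("}}\n");
            body.append("{\"doc\":{").append(quote(field)).append(":")
                .append(quote(value)).append("}}\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("MV2uAXgB4vfr7WLWaK3Y", "MF2uAXgB4vfr7WLWK63s");
        System.out.print(build(ids, "like", "网球"));
    }
}
```

Seeing the body makes the cost of this approach concrete: the request grows by two lines for every matching document, on top of the search that was needed to collect the ids.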

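2. Update directly with the updateByQuery interface
According to the official update_by_query documentation, the update has to be expressed as a script, and the request must have the following format: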

{
  "script": {
    "source": "ctx._source.count++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}

So we need to build our update statement in the same format. In Java code, XContentBuilder can be used to construct the JSON object:

@Test
    public void testUpdateByQuery2() throws Exception {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().filter(QueryBuilders.termQuery("age", 20));
        XContentBuilder xContentBuilder = jsonBuilder()
                .startObject()
                    .field("query", queryBuilder)
                .startObject("script")
                    .field("lang","painless")
                    .field("source","ctx._source.like='网球'")
                .endObject().endObject()
                ;
        xContentBuilder.flush();
        final String payload = ((ByteArrayOutputStream) xContentBuilder.getOutputStream()).toString("UTF-8");
        System.out.println(payload);
    }

The output is:

{
    "query":{
        "bool":{
            "filter":[
                {
                    "term":{
                        "age":{
                            "value":20,
                            "boost":1
                        }
                    }
                }
            ],
            "adjust_pure_negative":true,
            "boost":1
        }
    },
    "script":{
        "lang":"painless",
        "source":"ctx._source.like='网球'"
    }
}

The JSON built this way matches the update_by_query format, so next we use the updateByQuery API to perform the update:

@Test
    public void testUpdateByQuery2() throws Exception {
        HttpClientConfig.Builder builder = new HttpClientConfig.Builder("http://localhost:9200/");
        JestClientFactory jestClientFactory = new JestClientFactory();
        jestClientFactory.setHttpClientConfig(builder.build());
        // Obtain the JestClient instance
        JestClient jestClient = jestClientFactory.getObject();

        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().filter(QueryBuilders.termQuery("age", 20));
        XContentBuilder xContentBuilder = jsonBuilder()
                .startObject()
                    .field("query", queryBuilder)
                .startObject("script")
                    .field("lang","painless")
                    .field("source","ctx._source.like='网球1'")
                .endObject().endObject()
                ;
        xContentBuilder.flush();
        final String payload = ((ByteArrayOutputStream) xContentBuilder.getOutputStream()).toString("UTF-8");
        System.out.println(payload);

        UpdateByQuery.Builder updateByQuery = new UpdateByQuery.Builder(payload)
                .addIndex("user")
                .addType("_doc")
                .setParameter("conflicts","proceed")
                .setParameter("wait_for_completion",false);

        jestClient.execute(updateByQuery.build());
    }

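Query the index again: the like field of the age=20 records has been changed to "网球1". (Because the request was sent with wait_for_completion=false, the update runs as a background task, so the change may take a moment to become visible.)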

{
  "hits": {
    "total": 4,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MV2uAXgB4vfr7WLWaK3Y",
        "_score": 1.0,
        "_source": {
          "uid": 4,
          "like": "网球1",
          "age": 20,
          "username": "小红"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MF2uAXgB4vfr7WLWK63s",
        "_score": 1.0,
        "_source": {
          "uid": 3,
          "like": "网球1",
          "age": 20,
          "username": "小明"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "Ll2tAXgB4vfr7WLWcq1P",
        "_score": 1.0,
        "_source": {
          "uid": 1,
          "username": "张三",
          "age": 18,
          "like": "乒乓球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "L12tAXgB4vfr7WLW162C",
        "_score": 1.0,
        "_source": {
          "uid": 2,
          "username": "李四",
          "age": 19,
          "like": "羽毛球"
        }
      }
    ]
  }
}

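The second approach relies on a script, and the update succeeds, but it has a flaw: the new field value is hard-coded in the script, so the code is not reusable. Next, we will rework and improve this approach.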
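
One possible direction, offered only as a sketch and not necessarily the rework this article goes on to make, is to pass the new value through the script's params object instead of embedding it in the script source. The example below builds such a request body with plain Java; the class name ParamScriptPayload and its helpers are hypothetical, and the query uses the short form of the term query rather than the bool/filter wrapper shown earlier.

```java
// Hypothetical helper, not part of Jest or Elasticsearch.
public class ParamScriptPayload {

    // Minimal JSON string quoting: escapes backslash and double quote only
    // (assumption: the inputs contain no control characters).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds an update_by_query body whose script source stays constant;
    // only the value under "params" changes from call to call.
    static String build(String termField, int termValue, String newLike) {
        return "{"
            + "\"query\":{\"term\":{" + quote(termField) + ":" + termValue + "}},"
            + "\"script\":{"
            + "\"lang\":\"painless\","
            + "\"source\":\"ctx._source.like = params.like\","
            + "\"params\":{\"like\":" + quote(newLike) + "}"
            + "}}";
    }

    public static void main(String[] args) {
        System.out.println(build("age", 20, "网球"));
    }
}
```

Because the script source no longer changes with the value, the same payload builder can be reused for different values, and a value containing quote characters cannot break the script text.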