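Bulk-updating data in Elasticsearch with updateByQuery
There are generally two ways to update documents in bulk:
- Query Elasticsearch for the ids of all documents that match the update condition, then update them in bulk by id through the Bulk.Builder interface.
- Use the updateByQuery interface to perform the bulk update directly.
The first approach needs an extra query against Elasticsearch before the update. The sections below walk through each approach in detail.
First, create an index: PUT http://localhost:9200/user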
{
  "mappings": {
    "_doc": {
      "properties": {
        "uid": {
          "type": "keyword"
        },
        "username": {
          "type": "text"
        },
        "age": {
          "type": "integer"
        },
        "like": {
          "type": "keyword"
        }
      }
    }
  }
}
Then insert a few test documents. Searching the index returns the following four documents:
GET http://localhost:9200/user/_search
{
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MV2uAXgB4vfr7WLWaK3Y",
        "_score": 1,
        "_source": {
          "uid": 4,
          "username": "小红",
          "age": 20,
          "like": "篮球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MF2uAXgB4vfr7WLWK63s",
        "_score": 1,
        "_source": {
          "uid": 3,
          "username": "小明",
          "age": 20,
          "like": "篮球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "Ll2tAXgB4vfr7WLWcq1P",
        "_score": 1,
        "_source": {
          "uid": 1,
          "username": "张三",
          "age": 18,
          "like": "乒乓球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "L12tAXgB4vfr7WLW162C",
        "_score": 1,
        "_source": {
          "uid": 2,
          "username": "李四",
          "age": 19,
          "like": "羽毛球"
        }
      }
    ]
  }
}
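The step that inserts the test documents is not shown. One way to do it is a single _bulk request; the sketch below only builds the newline-delimited JSON body for the four documents listed above. The class name SeedUsers and its helper methods are hypothetical, the values are assumed to contain no characters that need JSON escaping, and the target endpoint (POST http://localhost:9200/user/_doc/_bulk with Content-Type application/x-ndjson) is an assumption based on the typed mapping used here.

```java
import java.util.Arrays;
import java.util.List;

final class SeedUsers {
    /** One test document; field names follow the mapping of the user index. */
    static final class User {
        final int uid;
        final String username;
        final int age;
        final String like;

        User(int uid, String username, int age, String like) {
            this.uid = uid;
            this.username = username;
            this.age = age;
            this.like = like;
        }
    }

    /** The four documents shown in the search result above. */
    static List<User> sampleUsers() {
        return Arrays.asList(
                new User(1, "张三", 18, "乒乓球"),
                new User(2, "李四", 19, "羽毛球"),
                new User(3, "小明", 20, "篮球"),
                new User(4, "小红", 20, "篮球"));
    }

    /**
     * Builds the NDJSON body: one action line and one source line per document.
     * The bulk format requires every line, including the last, to end with a newline.
     */
    static String bulkBody(List<User> users) {
        StringBuilder sb = new StringBuilder();
        for (User u : users) {
            sb.append("{\"index\":{}}\n"); // empty metadata: index and type come from the URL
            sb.append("{\"uid\":").append(u.uid)
              .append(",\"username\":\"").append(u.username)
              .append("\",\"age\":").append(u.age)
              .append(",\"like\":\"").append(u.like).append("\"}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(bulkBody(sampleUsers()));
    }
}
```

Because the action lines carry no _id, Elasticsearch generates the ids itself, which is consistent with the auto-generated ids in the search result.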
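With the test data in place, the requirement is to change "like" to "网球" for every document where age=20.
1. The first approach: query the Elasticsearch ids first, then update them with Bulk.Builder: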
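import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;

@Test
public void testUpdateByQuery1() throws Exception {
    HttpClientConfig.Builder builder = new HttpClientConfig.Builder("http://localhost:9200/");
    JestClientFactory jestClientFactory = new JestClientFactory();
    jestClientFactory.setHttpClientConfig(builder.build());
    // Obtain the JestClient instance
    JestClient jestClient = jestClientFactory.getObject();
    // Build the query condition
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", 20);
    // fetchSource(false) skips the document body, so each hit carries only metadata such as its id
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(termQueryBuilder).fetchSource(false);
    Search.Builder search = new Search.Builder(searchSourceBuilder.toString()).addIndex("user").addType("_doc");
    SearchResult result = jestClient.execute(search.build());
    System.out.println(result);
}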
The printed output is:
Result: {"took":45,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"user","_type":"_doc","_id":"MV2uAXgB4vfr7WLWaK3Y","_score":1.0},{"_index":"user","_type":"_doc","_id":"MF2uAXgB4vfr7WLWK63s","_score":1.0}]}}, isSucceeded: true, response code: 200, error message: null
The two documents matching age=20 were found, and only their Elasticsearch ids were returned. With the ids, Bulk.Builder can perform the bulk update:
// Additional imports beyond the previous snippet
import io.searchbox.core.Bulk;
import io.searchbox.core.BulkResult;
import io.searchbox.core.Update;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Test
public void testUpdateByQuery1() throws Exception {
    HttpClientConfig.Builder builder = new HttpClientConfig.Builder("http://localhost:9200/");
    JestClientFactory jestClientFactory = new JestClientFactory();
    jestClientFactory.setHttpClientConfig(builder.build());
    // Obtain the JestClient instance
    JestClient jestClient = jestClientFactory.getObject();
    // Build the query condition
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", 20);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(termQueryBuilder).fetchSource(false);
    Search.Builder search = new Search.Builder(searchSourceBuilder.toString()).addIndex("user").addType("_doc");
    SearchResult result = jestClient.execute(search.build());
    // Collect the id of every hit
    List<String> esIdList = result.getHits(Map.class).stream().map(hit -> hit.id).collect(Collectors.toList());
    Bulk.Builder bulk = new Bulk.Builder();
    esIdList.forEach(esId -> {
        Map<String, Object> beanMap = new HashMap<>();
        beanMap.put("like", "网球"); // Change like to 网球
        // A partial update wraps the changed fields in a "doc" object
        Map<String, Object> doc = new HashMap<>();
        doc.put("doc", beanMap);
        Update.Builder update = new Update.Builder(doc).id(esId); // The id to update
        bulk.addAction(update.build());
    });
    bulk.defaultIndex("user").defaultType("_doc"); // The index name and type to update
    BulkResult result1 = jestClient.execute(bulk.build());
    System.out.println(result1);
}
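For reference, the bulk update above corresponds roughly to a _bulk request whose body alternates an update action line with a partial-document line. The sketch below builds such a body by hand from a list of ids; BulkUpdateBody is a hypothetical helper, the exact bytes Jest sends may differ, and the values are assumed not to need JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

final class BulkUpdateBody {
    /**
     * Builds an NDJSON body with one partial update per id.
     * Each pair of lines is: {"update":{"_id":...}} followed by {"doc":{field:value}}.
     */
    static String build(List<String> ids, String field, String value) {
        StringBuilder sb = new StringBuilder();
        for (String id : ids) {
            sb.append("{\"update\":{\"_id\":\"").append(id).append("\"}}\n");
            sb.append("{\"doc\":{\"").append(field).append("\":\"").append(value).append("\"}}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two ids returned by the age=20 query above
        List<String> ids = Arrays.asList("MV2uAXgB4vfr7WLWaK3Y", "MF2uAXgB4vfr7WLWK63s");
        System.out.print(build(ids, "like", "网球"));
    }
}
```

Seeing the body makes the cost of this approach visible: the request grows by two lines per matching document, on top of the search that collected the ids.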
After it runs, query the index again: GET http://localhost:9200/user/_search
{
  "hits": {
    "total": 4,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MV2uAXgB4vfr7WLWaK3Y",
        "_score": 1.0,
        "_source": {
          "uid": 4,
          "username": "小红",
          "age": 20,
          "like": "网球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MF2uAXgB4vfr7WLWK63s",
        "_score": 1.0,
        "_source": {
          "uid": 3,
          "username": "小明",
          "age": 20,
          "like": "网球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "Ll2tAXgB4vfr7WLWcq1P",
        "_score": 1.0,
        "_source": {
          "uid": 1,
          "username": "张三",
          "age": 18,
          "like": "乒乓球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "L12tAXgB4vfr7WLW162C",
        "_score": 1.0,
        "_source": {
          "uid": 2,
          "username": "李四",
          "age": 19,
          "like": "羽毛球"
        }
      }
    ]
  }
}
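The like field of the two age=20 documents has changed from "篮球" to "网球".
2. Update directly with the updateByQuery interface
According to the official update_by_query documentation, modifying the matched documents requires a script, and the request body has the following format: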
{
  "script": {
    "source": "ctx._source.count++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
So the update request has to follow this format. In Java code, XContentBuilder can be used to build the JSON object:
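// Additional imports (package names as in Elasticsearch 6.x)
import java.io.ByteArrayOutputStream;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.query.BoolQueryBuilder;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

@Test
public void testUpdateByQuery2() throws Exception {
    // Select the documents to update: age = 20
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().filter(QueryBuilders.termQuery("age", 20));
    // Assemble {"query": ..., "script": {"lang": ..., "source": ...}}
    XContentBuilder xContentBuilder = jsonBuilder()
            .startObject()
                .field("query", queryBuilder)
                .startObject("script")
                    .field("lang", "painless")
                    .field("source", "ctx._source.like='网球'")
                .endObject()
            .endObject();
    // Flush so the underlying stream holds the complete JSON before reading it
    xContentBuilder.flush();
    final String payload = ((ByteArrayOutputStream) xContentBuilder.getOutputStream()).toString("UTF-8");
    System.out.println(payload);
}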
The output, formatted for readability, is:
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "age": {
              "value": 20,
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.like='网球'"
  }
}
The JSON built this way matches the update_by_query format. Next, use the updateByQuery API to perform the update:
// Additional import
import io.searchbox.core.UpdateByQuery;

@Test
public void testUpdateByQuery2() throws Exception {
    HttpClientConfig.Builder builder = new HttpClientConfig.Builder("http://localhost:9200/");
    JestClientFactory jestClientFactory = new JestClientFactory();
    jestClientFactory.setHttpClientConfig(builder.build());
    // Obtain the JestClient instance
    JestClient jestClient = jestClientFactory.getObject();
    // Select the documents to update: age = 20
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery().filter(QueryBuilders.termQuery("age", 20));
    XContentBuilder xContentBuilder = jsonBuilder()
            .startObject()
                .field("query", queryBuilder)
                .startObject("script")
                    .field("lang", "painless")
                    .field("source", "ctx._source.like='网球1'")
                .endObject()
            .endObject();
    xContentBuilder.flush();
    final String payload = ((ByteArrayOutputStream) xContentBuilder.getOutputStream()).toString("UTF-8");
    System.out.println(payload);
    UpdateByQuery.Builder updateByQuery = new UpdateByQuery.Builder(payload)
            .addIndex("user")
            .addType("_doc")
            // Keep going when a document hits a version conflict instead of aborting
            .setParameter("conflicts", "proceed")
            // Return immediately; the update keeps running as a background task
            .setParameter("wait_for_completion", false);
    jestClient.execute(updateByQuery.build());
}
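With wait_for_completion set to false, Elasticsearch answers with a task id of the form {"task":"<node_id>:<task_number>"} instead of the final result, and the task can be checked through the tasks API at GET /_tasks/<task_id>. The sketch below extracts the id from such a response and builds the status URL. TaskStatusUrl is a hypothetical helper, the sample task id is made up for illustration, and the regular expression stands in for a proper JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaskStatusUrl {
    // Matches the "task" field of the asynchronous update_by_query response
    private static final Pattern TASK = Pattern.compile("\"task\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the task id found in the response body, or null when there is none. */
    static String taskId(String responseJson) {
        Matcher m = TASK.matcher(responseJson);
        return m.find() ? m.group(1) : null;
    }

    /** Builds the tasks API URL used to check whether the update has finished. */
    static String statusUrl(String baseUrl, String taskId) {
        return baseUrl + "/_tasks/" + taskId;
    }

    public static void main(String[] args) {
        // Hypothetical response body; a real id is assigned by the cluster
        String response = "{\"task\":\"r1A2WoRbTwKZ516z6NEs5A:36619\"}";
        System.out.println(statusUrl("http://localhost:9200", taskId(response)));
    }
}
```

The tasks API response contains a completed flag, so a caller can poll that URL until it reports true before reading the updated documents.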
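Query the index again once the update has finished (with wait_for_completion=false the request returns before the update completes). The like field of the age=20 documents has been changed to "网球1":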
{
  "hits": {
    "total": 4,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MV2uAXgB4vfr7WLWaK3Y",
        "_score": 1.0,
        "_source": {
          "uid": 4,
          "like": "网球1",
          "age": 20,
          "username": "小红"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "MF2uAXgB4vfr7WLWK63s",
        "_score": 1.0,
        "_source": {
          "uid": 3,
          "like": "网球1",
          "age": 20,
          "username": "小明"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "Ll2tAXgB4vfr7WLWcq1P",
        "_score": 1.0,
        "_source": {
          "uid": 1,
          "username": "张三",
          "age": 18,
          "like": "乒乓球"
        }
      },
      {
        "_index": "user",
        "_type": "_doc",
        "_id": "L12tAXgB4vfr7WLW162C",
        "_score": 1.0,
        "_source": {
          "uid": 2,
          "username": "李四",
          "age": 19,
          "like": "羽毛球"
        }
      }
    ]
  }
}
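The second approach uses a script and the update succeeds, but it has a flaw: the new field value is hard-coded in the script, so the code is not reusable. The next step is to refactor this approach.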
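One common direction for such a refactoring is to pass the field name and value through the script's params object, so the script source stays constant. The sketch below builds that request body as a plain string. It is only an illustration under stated assumptions, not necessarily the refactoring this article goes on to use: ParamScriptPayload is a hypothetical helper, the new value is assumed to be a string, and the inputs are assumed not to need JSON escaping.

```java
final class ParamScriptPayload {
    /**
     * Builds an update_by_query body that sets targetField to newValue
     * on every document where queryField equals queryValue.
     * The script reads both the field name and the value from params.
     */
    static String build(String queryField, int queryValue, String targetField, String newValue) {
        return "{"
                + "\"query\":{\"term\":{\"" + queryField + "\":" + queryValue + "}},"
                + "\"script\":{\"lang\":\"painless\","
                + "\"source\":\"ctx._source[params.field] = params.value\","
                + "\"params\":{\"field\":\"" + targetField + "\",\"value\":\"" + newValue + "\"}}"
                + "}";
    }

    public static void main(String[] args) {
        // Same update as above: set like to 网球 where age = 20
        System.out.println(build("age", 20, "like", "网球"));
    }
}
```

Keeping the script source fixed also lets Elasticsearch reuse the compiled script across requests instead of compiling a new one for every distinct value.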