ES bulk insert fails with an out-of-memory error; ES bulk update fails

Reposted

flyingsmiling 2024-03-27 08:36:35

Tags: ES bulk insert out-of-memory error, java, elasticsearch, updateRequest, field. Categories: Architecture, Back-end development

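Introduction

Last week I finished setting up the ES search service. Over the past two days, while integrating our business system with it, I ran into the error "The number of object passed must be even but was [1]". This post records how I resolved it.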

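Background

Per the system requirements, we first sync all existing user data to ES in one full pass. After that, whenever a user updates their information in the system, the change is synced to ES incrementally. The incremental sync uses the method for adding or modifying a single document; for the code, see "Integrating the Rest Client with a Spring Boot Application".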

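Why a partial update is needed

An incremental update changes only a few fields, so the entity I pass in contains just those fields plus the id that uniquely identifies the document. The method mentioned above, however, performs a full update in ES: the properties stored under that id are replaced by the properties of the entity passed in, and every field that was not passed is lost. A separate partial-update method is therefore needed.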
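
The difference between the two behaviors can be illustrated without a cluster. The following is a minimal sketch that uses plain Java maps as a stand-in for a stored document. The class and method names (ReplaceVsMergeSketch, indexWhole, updatePartial) are hypothetical, and the merge is shallow, so this only mimics the semantics described above rather than the Elasticsearch implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a stored document; not Elasticsearch code.
final class ReplaceVsMergeSketch {

    // Full-document indexing: the stored source is replaced by the incoming one.
    static Map<String, Object> indexWhole(Map<String, Object> stored, Map<String, Object> incoming) {
        return new HashMap<>(incoming);
    }

    // Partial update: incoming fields are merged into the stored source (shallow merge).
    static Map<String, Object> updatePartial(Map<String, Object> stored, Map<String, Object> incoming) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(incoming);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("id", "1");
        stored.put("name", "alice");
        stored.put("email", "alice@example.com");

        // The incremental change carries only the id and the modified field.
        Map<String, Object> change = new HashMap<>();
        change.put("id", "1");
        change.put("email", "new@example.com");

        System.out.println(indexWhole(stored, change));    // "name" is lost
        System.out.println(updatePartial(stored, change)); // "name" is kept
    }
}
```

With the full replacement the name field disappears, which is exactly the data loss that motivated the partial-update method.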

Partial update code, V1

/**
 * Update a document by id.
 *
 * @param indexName index name
 * @param type      mapping type
 * @param id        document id
 * @param json      fields to update, as a JSON string
 * @return true if the update succeeded, false otherwise
 */
public boolean updateById(String indexName, String type, String id, String json) {
    UpdateRequest updateRequest = new UpdateRequest();
    // Set the index name, type and id
    updateRequest.index(indexName).type(type).id(id);
    // Fields to update, passed as a JSON string (this is the line that throws)
    updateRequest.doc(json);
    // Number of retries if another operation changes the document between the get and index phases of the update
    updateRequest.retryOnConflict(3);
    updateRequest.fetchSource(true);
    try {
        UpdateResponse response = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        log.info("[EsClientConfig.updateById] [end] [update index by id result is {}]", JSON.toJSONString(response));
    } catch (IOException e) {
        log.error("[EsClientConfig.updateById] [error] [fail to update index,indexName is {},type is {},doc is {}]", indexName, type, json, e);
        return false;
    } catch (ElasticsearchException e) {
        // For example RestStatus.NOT_FOUND when the document does not exist; any other status is also a failure
        log.error("[EsClientConfig.updateById] [error] [update rejected,status is {}]", e.status(), e);
        return false;
    }
    return true;
}

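Running the code above produced the error quoted in the introduction, and it was thrown at the line updateRequest.doc(json). The cause is that newer versions of the ES client no longer interpret a lone String argument as JSON. The call resolves to the key/value varargs overload doc(Object... source), which expects alternating field names and values, and the source code checks that the number of arguments is even. A single argument fails that check, so the exception is thrown.

[Screenshot: the argument-count check in the Elasticsearch client source]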
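
To make the failure mode concrete, here is a minimal sketch that reproduces the same message with nothing but the JDK. DocArgsSketch and its doc method are hypothetical stand-ins for the varargs overload, simplified for illustration; this is not the Elasticsearch source.

```java
// Simplified stand-in for a key/value varargs overload; not the Elasticsearch source.
final class DocArgsSketch {

    // Accepts alternating field names and values, e.g. doc("name", "alice", "age", 30).
    static int doc(Object... source) {
        if (source.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "The number of object passed must be even but was [" + source.length + "]");
        }
        return source.length / 2; // number of field/value pairs
    }

    public static void main(String[] args) {
        System.out.println(doc("name", "alice")); // one pair, accepted
        try {
            // A lone JSON string becomes a one-element array, so the count is odd.
            doc("{\"name\":\"alice\"}");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The JSON string is never parsed here: it is simply counted as one argument, which is why the message reports [1].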

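With the cause known, the fix is straightforward: either pass the fields as a Map, or keep the JSON string and declare its format explicitly with XContentType.JSON.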

Partial update code, V2

The method for partially updating a single document is as follows:

/**
 * Update a document by id.
 *
 * @param indexName index name
 * @param type      mapping type
 * @param id        document id
 * @param map       fields to update
 * @return true if the update succeeded, false otherwise
 */
public boolean updateById(String indexName, String type, String id, Map<String, Object> map) {
    UpdateRequest updateRequest = new UpdateRequest();
    // Set the index name, type and id
    updateRequest.index(indexName).type(type).id(id);
    // Fields to update, passed as a Map
    updateRequest.doc(map);
    // Alternatively, pass the fields as a JSON string as in V1 and add XContentType.JSON
    //updateRequest.doc(JSON.toJSONString(map),XContentType.JSON);
    // Number of retries if another operation changes the document between the get and index phases of the update
    updateRequest.retryOnConflict(3);
    updateRequest.fetchSource(true);
    try {
        UpdateResponse response = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        log.info("[EsClientConfig.updateById] [end] [update index by id result is {}]", JSON.toJSONString(response));
    } catch (IOException e) {
        log.error("[EsClientConfig.updateById] [error] [fail to update index,indexName is {},type is {},doc is {}]", indexName, type, map, e);
        return false;
    } catch (ElasticsearchException e) {
        // For example RestStatus.NOT_FOUND when the document does not exist; any other status is also a failure
        log.error("[EsClientConfig.updateById] [error] [update rejected,status is {}]", e.status(), e);
        return false;
    }
    return true;
}
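
The retryOnConflict(3) setting used above is easier to reason about with a toy model. The following is a minimal sketch of version-based optimistic locking with a bounded number of retries. RetryOnConflictSketch, its methods, and the interference callback are all hypothetical; the real client performs this loop on the server side, so this only illustrates the idea.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of version-based optimistic locking; not Elasticsearch code.
final class RetryOnConflictSketch {

    static final class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) {
            super(message);
        }
    }

    private final AtomicLong version = new AtomicLong(1);

    long currentVersion() {
        return version.get();
    }

    // Simulates another writer modifying the document.
    void concurrentWrite() {
        version.incrementAndGet();
    }

    // Reads the version, lets `interference` run, then writes only if the version is unchanged.
    // One initial attempt plus up to retryOnConflict retries are made before giving up.
    long update(int retryOnConflict, Runnable interference) {
        for (int attempt = 0; attempt <= retryOnConflict; attempt++) {
            long seen = version.get();                    // "get" phase
            interference.run();                           // window in which others may write
            if (version.compareAndSet(seen, seen + 1)) {  // "index" phase
                return seen + 1;
            }
        }
        throw new VersionConflictException("gave up after " + retryOnConflict + " retries");
    }

    public static void main(String[] args) {
        RetryOnConflictSketch doc = new RetryOnConflictSketch();
        int[] remainingConflicts = {2}; // the first two attempts will collide
        long newVersion = doc.update(3, () -> {
            if (remainingConflicts[0]-- > 0) {
                doc.concurrentWrite();
            }
        });
        System.out.println("updated to version " + newVersion); // updated to version 4
    }
}
```

In this model two attempts collide with the simulated writer and the third succeeds, so a retry budget of 3 is enough; with a budget of 1 the same sequence would end in a conflict exception.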

The method for bulk-updating documents by id is as follows:

/**
 * Bulk-update documents in an index. A document that does not exist is created from the
 * corresponding map, so every map must contain the unique "id" key.
 *
 * @param indexName index name
 * @param type      mapping type
 * @param list      documents to update, one map per document
 */
public void bulkUpdate(String indexName, String type, List<Map<String, Object>> list) {
    BulkRequest bulkRequest = new BulkRequest();
    for (Map<String, Object> map : list) {
        UpdateRequest updateRequest = new UpdateRequest();
        // Set the index name, type and id
        updateRequest.index(indexName).type(type).id(map.get("id").toString());
        updateRequest.doc(map);
        // Create the document if it does not exist
        updateRequest.upsert(map);
        // Number of retries if another operation changes the document between the get and index phases of the update
        updateRequest.retryOnConflict(3);
        // Return the updated _source, excluding the listed fields
        String[] excludes = new String[]{"id"};
        updateRequest.fetchSource(new FetchSourceContext(true, null, excludes));
        bulkRequest.add(updateRequest);
    }
    try {
        BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        log.info("[EsClientConfig.bulkUpdate] [end] [bulk update index result is {}]", JSON.toJSONString(response));
        if (response.hasFailures()) {
            // Individual items can fail even when the bulk call itself succeeds
            log.error("[EsClientConfig.bulkUpdate] [error] [some items failed: {}]", response.buildFailureMessage());
        }
    } catch (IOException e) {
        log.error("[EsClientConfig.bulkUpdate] [error] [fail to bulk update index,indexName is {},type is {},doc is {}]", indexName, type, list, e);
    }
}
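
The method above places the entire list into a single BulkRequest. For a large full sync that can produce a very large request, and one common mitigation is to split the list into fixed-size batches and send one bulk request per batch. The following is a minimal sketch of the splitting step only. BulkBatchSketch, partition, and the batch size of 3 are hypothetical choices for illustration, and the call to bulkUpdate is indicated in a comment rather than executed.

```java
import java.util.ArrayList;
import java.util.List;

// Helper for splitting a large list into batches; the names are illustrative.
final class BulkBatchSketch {

    // Splits items into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            ids.add(i);
        }
        for (List<Integer> batch : partition(ids, 3)) {
            // In the real service each batch would be passed to bulkUpdate(indexName, type, batch).
            System.out.println(batch);
        }
    }
}
```

A suitable batch size depends on document size and cluster capacity, so it is best treated as a tunable setting rather than a constant.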

UpdateRequest in detail

Below is the core class diagram of the UpdateRequest object:

[Figure: UpdateRequest class diagram]

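Core property	Description
shardId	The shard on which the request is executed
index	Index name
type	Type name
id	Document ID
routing	Routing value, defaults to the value of id. The Elasticsearch shard routing algorithm is ( hashcode(routing) % primary_sharding_count (number of primary shards) )
script	Update the document with a script
fields	Fields of the document to return after the update; nothing is returned by default. Deprecated, replaced by fetchSourceContext
fetchSourceContext	Configuration for returning _source after the update when the document is hit; supports wildcard expressions for matching field names
version	Version number
versionType	Version type, either internal or external; defaults to internal
retryOnConflict	Elasticsearch uses version-based optimistic locking. This is the number of retries allowed after a version conflict; once retry_on_conflict is exceeded, an exception is thrown
refreshPolicy	Refresh policy. NONE: do not refresh after the request
upsertRequest	The document to index when the target document does not exist, similar to a saveOrUpdate operation
scriptedUpsert	Whether the script should also run when the document does not exist yet (upsert via script)
docAsUpsert	Whether to use saveOrUpdate mode with doc itself serving as the upsert document, instead of a separate IndexRequest upsertRequest
detectNoop	With detectNoop=true, if the update does not change the data, the returned result is noop, every field under _shards is 0 (the action was not executed on any shard), and _version does not change. With detectNoop=false, even if the data does not change, result is updated, the version number is incremented by 1, and _shards reports the outcome on each shard
doc	The partial document used for the update by default. There are basically three ways to update: script, upsert, and doc (ordinary update)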
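
The detectNoop row is the easiest one to misread, so here is a minimal sketch of the behavior it describes, using an in-memory map and a version counter. DetectNoopSketch and its methods are hypothetical; the sketch models only the result and version semantics from the table, not shard-level reporting.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the detectNoop behavior described in the table; not Elasticsearch code.
final class DetectNoopSketch {

    private final Map<String, Object> source = new HashMap<>();
    private long version = 1;

    DetectNoopSketch(Map<String, Object> initial) {
        source.putAll(initial);
    }

    long version() {
        return version;
    }

    // Merges partialDoc into the stored source and reports "noop" or "updated".
    String update(Map<String, Object> partialDoc, boolean detectNoop) {
        Map<String, Object> merged = new HashMap<>(source);
        merged.putAll(partialDoc);
        if (detectNoop && merged.equals(source)) {
            return "noop"; // nothing is written, the version stays the same
        }
        source.clear();
        source.putAll(merged);
        version++; // a write happened, even if the content is identical
        return "updated";
    }

    public static void main(String[] args) {
        Map<String, Object> initial = new HashMap<>();
        initial.put("name", "alice");
        DetectNoopSketch doc = new DetectNoopSketch(initial);

        System.out.println(doc.update(initial, true) + " v" + doc.version());  // noop v1
        System.out.println(doc.update(initial, false) + " v" + doc.version()); // updated v2
    }
}
```

Sending the same content twice leaves the version untouched when detection is on and bumps it when detection is off, which matches the two cases in the table.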