Bulk updating data in Elasticsearch with Java: bulk updating a single field

Reposted

技术极客侠 2024-06-24 18:25:58


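Contents

Version
Creating documents

Auto-generating a unique _id

7.0 and later
Before 7.0

Custom _id

Errors

Bulk insert

Deleting documents

Bulk delete

Updating documents

Single-document update

Full (overwrite) update
Partial update

Bulk update

Summary

Create
Update
Delete

Other errors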

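Version

This article is based on elasticsearch-7.6.1, so some APIs may differ from earlier versions. Earlier versions of Elasticsearch allowed several types in one index. Since 7.0, types are deprecated and an index has exactly one type by default: _doc. Presumably for backward compatibility, some type-related APIs still work, but they usually emit a Deprecation warning. Elasticsearch 8.x removes types entirely.

Creating documents

_index, _type (fixed in 7.x) and _id together uniquely identify a document, much as the database, table and primary key value uniquely identify a row in a relational database.

Because Elasticsearch is a distributed service, it does not offer auto-increment primary keys. Either we specify the ID ourselves or Elasticsearch generates one with its own algorithm.

Auto-generating a unique _id

As mentioned above, 7.0 and later support only the _doc type. In practice you can still write to another type name, but that triggers a Deprecation warning and is not recommended. The examples below start with creating a document whose ID is generated automatically.

7.0 and later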

POST  person/_doc
{
    "name":"test",
    "age":256,
    "sex":"男"
}

Response:

{
  "_index" : "person",
  "_type" : "_doc",
  "_id" : "F1YBNH8B6e0WzfSClmq1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Before 7.0

POST  person/doc
{
    "name":"test",
    "age":20,
    "sex":"男"
}

Because this cluster runs elasticsearch-7.6.1, running the request above in Kibana produces the warning below, although the request still succeeds.

#! Deprecation: [types removal] Specifying types in search requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id})

{
  "_index" : "person",
  "_type" : "doc",
  "_id" : "FFbwM38B6e0WzfSCzGrE",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

A search request that specifies a type also triggers a deprecation warning:

GET person/doc/_search
#! Deprecation: [types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead.

The document is still reachable through _doc:

GET person/_doc/FFbwM38B6e0WzfSCzGrE

{
  "_index" : "person",
  "_type" : "_doc",
  "_id" : "FFbwM38B6e0WzfSCzGrE",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "test",
    "age" : 256,
    "sex" : "男"
  }
}

Custom _id

The first three forms below also accept PUT.

Form 1

POST  person/_create/2
{
    "name":"test",
    "age":256,
    "sex":"男"
}

Response:

{
  "_index" : "person",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

If the document ID does not exist, the document is created.
If the document ID already exists, an error is returned.

Form 2

POST person/_doc/1/_create

Form 3

POST person/_doc/2?op_type=create

Form 4
With this form, the document is created if the ID does not exist and overwritten if it does.

POST  person/_doc/2
{ "name":"test"}

Errors

Calling any of the first three forms again with an ID that already exists returns the following error:

version conflict, document already exists (current version [1])

The call fails because the ID already exists. The HTTP status code is 409.

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[2]: version conflict, document already exists (current version [1])",
        "index_uuid": "rkGDBwl6SCuWSX2UbrVm7Q",
        "shard": "0",
        "index": "person"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[2]: version conflict, document already exists (current version [1])",
    "index_uuid": "rkGDBwl6SCuWSX2UbrVm7Q",
    "shard": "0",
    "index": "person"
  },
  "status": 409
}

[2]: version conflict, document already exists (current version [1])

The [2] at the start of the message is the custom ID.
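
For callers working from Java, the 409 case can be told apart from other failures by looking at the HTTP status of the create call. The following is a minimal sketch using the JDK's built-in java.net.http package (Java 11+). The class name CreateOnly, the Outcome enum and the base URL http://localhost:9200 are illustrative assumptions, not part of the original article. The code only builds the request and interprets a status code; it does not contact a cluster.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper for create-only indexing. */
final class CreateOnly {
    enum Outcome { CREATED, ALREADY_EXISTS, OTHER_ERROR }

    /** Builds PUT {baseUrl}/{index}/_create/{id}; the request is not sent here. */
    static HttpRequest createRequest(String baseUrl, String index, String id, String sourceJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_create/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(sourceJson))
                .build();
    }

    /** Maps the HTTP status of a create call to an outcome. */
    static Outcome interpret(int status) {
        if (status == 201) return Outcome.CREATED;        // document did not exist yet
        if (status == 409) return Outcome.ALREADY_EXISTS; // version_conflict_engine_exception
        return Outcome.OTHER_ERROR;
    }

    public static void main(String[] args) {
        // Assumed local cluster address; adjust for a real deployment.
        HttpRequest request = createRequest("http://localhost:9200", "person", "2",
                "{\"name\":\"test\",\"age\":256}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(interpret(409));
    }
}
```

To send the request for real, one option is HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and then passing statusCode() to interpret.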

Bulk insert

Bulk operation syntax:

POST _bulk
{"actionName":{"_index":"indexName","_type":"_doc", "_id":"id"}}
{"field1":"value1","field2":"value2"}

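actionName is the operation type:

create: creates the document; fails with a conflict error if the ID already exists
index: creates or replaces the document; an existing document with the same ID is replaced
delete: deletes the document
update: partial update

A single bulk request can operate on several different indices. If every action targets the same index, put the index in the URL and omit it from the action lines in the request body.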
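
The request body is newline-delimited JSON: one action line per operation, followed by a source line for every action except delete. As a rough illustration of that layout, here is a small Java sketch that assembles such a body for a single index given in the URL. The class BulkBodyBuilder and its method names are hypothetical, and the sketch assumes IDs are plain strings without quotes or backslashes and that the caller passes valid JSON for each document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for the newline-delimited body of a _bulk request. */
final class BulkBodyBuilder {
    private final List<String> lines = new ArrayList<>();

    /** index: creates the document or replaces an existing one. */
    BulkBodyBuilder index(String id, String sourceJson) {
        return add("index", id, sourceJson);
    }

    /** create: fails with a conflict if the ID already exists. */
    BulkBodyBuilder create(String id, String sourceJson) {
        return add("create", id, sourceJson);
    }

    /** update: the second line wraps the changed fields in "doc". */
    BulkBodyBuilder update(String id, String partialDocJson) {
        return add("update", id, "{\"doc\":" + partialDocJson + "}");
    }

    /** delete: the only action without a second line. */
    BulkBodyBuilder delete(String id) {
        return add("delete", id, null);
    }

    private BulkBodyBuilder add(String action, String id, String payload) {
        // Assumption: id needs no JSON escaping.
        lines.add("{\"" + action + "\":{\"_id\":\"" + id + "\"}}");
        if (payload != null) {
            lines.add(payload);
        }
        return this;
    }

    /** Every line, including the last one, is terminated by a newline. */
    String build() {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = new BulkBodyBuilder()
                .index("6", "{\"name\":\"a\",\"age\":25}")
                .update("6", "{\"city\":\"Linyi\"}")
                .delete("7")
                .build();
        System.out.print(body);
    }
}
```

The resulting string would be posted to person/_bulk. Keeping the index out of the action lines mirrors the shorter form described above.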

Form 1

POST person/_bulk
{"index":{"_id":"6"}}
{"name":"a","age":25,"sex":"男"}
{"index":{"_id":"6"}}
{"name":"b","age":25,"sex":"女"}

The document with ID 6 is written twice and both actions succeed. The stored document is the one from the last action.

"_source" : {
    "name" : "b",
    "age" : 25,
    "sex" : "女"
  }

Form 2

POST person/_bulk
{"create":{"_id":"7"}}
{"name":"a","age":25,"sex":"男"}
{"create":{"_id":"7"}}
{"name":"b","age":25,"sex":"女"}

The document with ID 7 is submitted twice: the first action succeeds and the second fails.

{
  "took" : 5,
  "errors" : true,
  "items" : [
    {
      "create" : {
        "_index" : "person",
        "_type" : "_doc",
        "_id" : "7",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 12,
        "_primary_term" : 3,
        "status" : 201
      }
    },
    {
      "create" : {
        "_index" : "person",
        "_type" : "_doc",
        "_id" : "7",
        "status" : 409,
        "error" : {
          "type" : "version_conflict_engine_exception",
          "reason" : "[7]: version conflict, document already exists (current version [3])",
          "index_uuid" : "QN2oJb_FSF-2ePQesU11aQ",
          "shard" : "0",
          "index" : "person"
        }
      }
    }
  ]
}

Deleting documents

DELETE  person/_doc/1

Likewise, in 7.0 and later, specifying some other type in a delete request triggers a warning:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the /{index}/_doc/{id} endpoint instead.

Bulk delete

Form 1

POST person/_bulk
{"delete":{"_id":1}}
{"delete":{"_id":2}}
{"delete":{"_id":3}}

or

POST _bulk
{"delete":{"_index":"person","_id":1}}
{"delete":{"_index":"person","_id":2}}
{"delete":{"_index":"person","_id":3}}

Form 2

Delete all documents that match a query:

POST person/_delete_by_query
{
  "query":{
    "term": {
      "age": {
        "value": 256
      }
    }
  }
}

Updating documents

Single-document update

Full (overwrite) update

POST person/_doc/2
{
    "name":"test2",
    "age":28,
    "sex":"男",
    "address":"山东"
}

Response:

{
  "_index" : "person",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 6,
  "_primary_term" : 1
}

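This operation is effectively an upsert: it creates the document when the ID does not exist and replaces it when it does. The result field in the response shows which one happened:

"result": "updated" means the document was updated
"result": "created" means the document was created

Partial update

If a document is large, sending the whole document just to change one field is wasteful. Updating only the age field of a user, for example, should not require transmitting every other field of that user, which only wastes bandwidth.

A partial update is not limited to existing fields; it can also add fields that did not exist before.
Syntax 1
Set the sex field of document 2 to 女: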

POST person/_update/2
{
  "doc":{
    "sex" :"女"
  }
}

Syntax 2
Add a phone number field:

POST /person/doc/2/_update
{
    "doc": {
        "phone": "12345678901"
    }
}

#! Deprecation: [types removal] Specifying types in document update requests is deprecated, use the endpoint /{index}/_update/{id} instead.

As the warning shows, specifying a type in the update path is deprecated.

After the operations above, the document looks like this:

{
  "_index" : "person",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 4,
  "_seq_no" : 8,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "test2",
    "age" : 28,
    "sex" : "女",
    "phone" : "11111111"
  }
}
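
From Java, a partial update against the typeless /{index}/_update/{id} endpoint comes down to posting a small body that wraps the changed field in "doc". The sketch below shows one way this might look with the JDK's java.net.http package (Java 11+). PartialUpdate, quote, docBody and the base URL http://localhost:9200 are illustrative assumptions. The escaping only covers backslashes and double quotes, so a real client would more likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that prepares a single-field partial update. */
final class PartialUpdate {
    /** Wraps a string in quotes, escaping backslashes and double quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Body of a partial update: just the changed field, wrapped in "doc". */
    static String docBody(String field, String value) {
        return "{\"doc\":{" + quote(field) + ":" + quote(value) + "}}";
    }

    /** Builds POST {baseUrl}/{index}/_update/{id}; the request is not sent here. */
    static HttpRequest updateRequest(String baseUrl, String index, String id,
                                     String field, String value) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_update/" + id))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(docBody(field, value)))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(docBody("sex", "female"));
        // Assumed local cluster address.
        HttpRequest request =
                updateRequest("http://localhost:9200", "person", "2", "sex", "female");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Only the named field travels over the wire, which is the bandwidth saving that motivates partial updates.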

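Bulk update

Bulk updates likewise come in two kinds: partial and full (overwrite).

Partial updates are the more common case, for example when adding a new field such as city.

Bulk overwrite update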

POST person/_bulk
{"index":{"_id":"6"}}
{"name":"a","age":25,"sex":"男", "city":"临沂"}

Bulk partial update

POST person/_bulk
{"update":{"_id":"6"}}
{"doc":{"city":"临沂"}}
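
When the same field has to be set on many documents, the bulk body can be generated from a map of document ID to new value. The following Java sketch illustrates the idea under a few assumptions: the class BulkFieldUpdate and its methods are hypothetical names, all values are strings, and the escaping handles only backslashes and double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that bulk-updates one field across many documents. */
final class BulkFieldUpdate {
    /** Wraps a string in quotes, escaping backslashes and double quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Emits one update action per ID; each sets the given field to its new value. */
    static String body(String field, Map<String, String> valueById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : valueById.entrySet()) {
            // Action line: which document to update.
            sb.append("{\"update\":{\"_id\":").append(quote(e.getKey())).append("}}\n");
            // Source line: only the changed field, wrapped in "doc".
            sb.append("{\"doc\":{").append(quote(field)).append(':')
              .append(quote(e.getValue())).append("}}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the actions in insertion order.
        Map<String, String> cities = new LinkedHashMap<>();
        cities.put("6", "Linyi");
        cities.put("7", "Jinan");
        System.out.print(body("city", cities));
    }
}
```

The generated text would be sent as the body of POST person/_bulk, typically with the Content-Type header set to application/x-ndjson. Each item in the response carries its own status, so a failure on one document does not by itself mean the others failed.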

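Summary

For 7.0 and later, the following APIs are recommended.

Create

Auto-generated ID
By REST convention, PUT is idempotent. The API below creates a new document on every call, so it must be a POST request.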

POST user/_doc

Custom ID
Either POST or PUT works.

POST user/_doc/1

POST  user/_create/2

POST  user/_doc/3/_create

Bulk insert

POST person/_bulk
{"index":{"_id":"6"}}
{"name":"a","age":25,"sex":"男"}

or:

POST _bulk
{"index":{"_index":"person","_id":"6"}}
{"name":"a","age":25,"sex":"男"}

Update

Full (overwrite) update

POST user/_doc/1

Partial update

POST  user/_update/1
{
	"doc":{
		"field_name" :"field_value"
	}
}

Bulk update (overwrite)

POST person/_bulk
{"index":{"_id":"6"}}
{"name":"a","age":25,"sex":"男"}

Bulk partial update

POST person/_bulk
{"update":{"_id":"6"}}
{"doc":{"sex":"女"}}

Delete

Delete a single document

DELETE  user/_doc/1

Bulk delete

POST user/_delete_by_query

POST _bulk
{"delete":{"_index":"person","_id":1}}

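Other errors

mapper_parsing_exception: failed to parse field [age] of type [long] in document with id '2'. Preview of field's value: 'q1'

If no field types are specified when the index is created, Elasticsearch infers the type of each field and generates the field-to-type mapping dynamically.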

GET   person/_mapping

{
  "person" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "phone" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "sex" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

The mapping shows that age has type long. Trying to set age to a non-numeric string fails, because Elasticsearch first tries to parse the string as a number.

POST person/_update/2
{
  "doc":{
    "age":"q1"
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [age] of type [long] in document with id '2'. Preview of field's value: 'q1'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [age] of type [long] in document with id '2'. Preview of field's value: 'q1'",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "For input string: \"q1\""
    }
  },
  "status": 400
}
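
A client can catch this kind of mistake before sending the update by checking that the value parses as a whole number. The sketch below is a deliberately simple pre-check in Java; the class LongFieldCheck and the method looksLikeLong are hypothetical names. It is only an approximation and may be stricter than Elasticsearch's own coercion rules, which the article does not describe in detail.

```java
/** Hypothetical client-side guard for fields mapped as long. */
final class LongFieldCheck {
    /** Returns true when the text can be read as a whole number ("25" can, "q1" cannot). */
    static boolean looksLikeLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            // Covers non-numeric text, empty strings and null.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeLong("25")); // a value the long field accepts
        System.out.println(looksLikeLong("q1")); // the value rejected in the example above
    }
}
```

Rejecting the value on the client avoids a round trip that would end in a 400 response.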

Reference: the official Elasticsearch documentation
