ES search: full table

Reposted

bugouhen 2024-11-14 12:40:41

Tags: ES search, full-table field data, API. Category: architecture, back-end development

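The GET API is one of the most commonly used Elasticsearch operations. It is typically used to check whether a document exists, or to read a document (the R in CRUD). Unlike search, GET is realtime: a document can be retrieved as soon as it has been indexed. Search, by contrast, only sees a document after the index has been refreshed. Using these APIs appropriately lets you work with Elasticsearch more flexibly.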

Example query

The GET API retrieves a JSON document from an index by its ID. For example:

curl -XGET 'http://localhost:9200/website/blog/123?pretty'

The command above fetches the document with ID 123 from the blog type of the website index. The response is:

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01"
  }
}

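The response contains the document itself plus some metadata:

_index: the index name
_type: the type
_id: the document ID
_version: the version number
found: whether the document was found
_source: the original JSON content of the document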
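Below is a minimal Python sketch that uses the standard json module to split the sample response above into its metadata and its `_source` body. The function name `parse_get_response` is hypothetical and is not part of any Elasticsearch client.

```python
import json

# The sample GET response from the article, as a raw JSON string.
RAW = """{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01"
  }
}"""


def parse_get_response(raw):
    """Split a GET API response body into (metadata, source).

    source is None when "found" is false.
    """
    body = json.loads(raw)
    # Metadata: every top-level key except the document body itself.
    meta = {k: v for k, v in body.items() if k != "_source"}
    source = body.get("_source") if body.get("found") else None
    return meta, source


meta, source = parse_get_response(RAW)
print(meta["_index"], meta["_id"], meta["_version"], meta["found"])  # website 123 1 True
print(source["title"])  # My first blog entry
```

Checking `found` before reading `_source` matters because a response for a missing document has no `_source` key at all.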

The API also accepts the HEAD method. This checks whether a document with the given ID exists without returning the document body:

curl -XHEAD -i 'http://localhost:9200/website/blog/123'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
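The same existence check can be sketched in Python with `urllib.request`. This assumes a node at `http://localhost:9200`, and that the API answers 200 when the document exists and 404 when it does not. The helper names (`build_head_request`, `exists_from_status`, `document_exists`) are hypothetical. Only `document_exists` actually sends a request.

```python
import urllib.error
import urllib.request


def build_head_request(base_url, index, doc_type, doc_id):
    """Build (but do not send) a HEAD request for /{index}/{type}/{id}."""
    url = f"{base_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(url, method="HEAD")


def exists_from_status(status):
    """Map the HEAD status code to a boolean: 200 = exists, 404 = missing."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise ValueError(f"unexpected status {status}")


def document_exists(base_url, index, doc_type, doc_id):
    """Send the HEAD request; needs a reachable Elasticsearch node."""
    req = build_head_request(base_url, index, doc_type, doc_id)
    try:
        with urllib.request.urlopen(req) as resp:
            return exists_from_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 404.
        return exists_from_status(err.code)


req = build_head_request("http://localhost:9200", "website", "blog", "123")
print(req.get_method(), req.full_url)  # HEAD http://localhost:9200/website/blog/123
```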

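Realtime

By default, the GET API is realtime and is not affected by the index refresh interval: as soon as a document has been indexed, it can be retrieved.

Realtime GET can be turned off for a single request by setting realtime=false.

It can also be disabled globally in the configuration file by setting action.get.realtime to false.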
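Because realtime is an ordinary query-string parameter, a client only needs to append it to the document URL. Below is a small sketch of a hypothetical URL builder, `get_url`. Python's `urlencode` would render `False` as `False`, so the builder converts booleans to the lowercase literals used in the curl examples here. The node-level `action.get.realtime` setting lives in the configuration file and cannot be set this way.

```python
from urllib.parse import urlencode


def get_url(base_url, index, doc_type, doc_id, **params):
    """Build a GET API URL with optional query-string parameters."""
    url = f"{base_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    query = {
        # urlencode would turn False into "False"; send lowercase literals.
        key: ("true" if value else "false") if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return f"{url}?{urlencode(query)}" if query else url


print(get_url("http://localhost:9200", "website", "blog", "123", realtime=False))
# http://localhost:9200/website/blog/123?realtime=false
```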

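Optional type

The type in a GET request is optional. Set it to _all to match across all types; the first document matching the ID is returned.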

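Source filtering

By default, GET returns the contents of the _source field, unless the fields parameter is used or _source has been disabled. Setting the _source parameter to false suppresses the source content: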

curl -XGET 'http://localhost:9200/website/blog/123?_source=false'
{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true
}

If you only need part of _source, filter it with _source_include (fields to include) or _source_exclude (fields to exclude). Both accept a comma-separated list of patterns, for example:

curl -XGET 'http://localhost:9200/website/blog/123?_source_include=title,date'
curl -XGET 'http://localhost:9200/website/blog/123?_source_exclude=date'

curl -XGET 'http://localhost:9200/website/blog/123?_source_include=*&_source_exclude=date'

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "_source": {
    "text": "Just trying this out...",
    "title": "My first blog entry"
  }
}
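Below is a rough client-side sketch of the include/exclude rule described above. A field is kept if it matches some include pattern (every field does when no include is given) and matches no exclude pattern. It is hypothetical: it handles top-level keys only and uses `fnmatch` as a stand-in for Elasticsearch's `*` wildcard matching. It is not the server's implementation. Later Elasticsearch versions spell these parameters `_source_includes` and `_source_excludes`.

```python
from fnmatch import fnmatchcase


def parse_patterns(param):
    """Split a comma-separated parameter value such as "title,date"."""
    return [p.strip() for p in param.split(",") if p.strip()]


def filter_source(source, include=None, exclude=None):
    """Approximate _source_include/_source_exclude on top-level keys only.

    fnmatchcase also understands ? and [...], which Elasticsearch
    patterns do not; only * is meant to be used here.
    """
    includes = parse_patterns(include) if include else ["*"]
    excludes = parse_patterns(exclude) if exclude else []
    return {
        key: value
        for key, value in source.items()
        if any(fnmatchcase(key, p) for p in includes)
        and not any(fnmatchcase(key, p) for p in excludes)
    }


doc = {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",
}
print(filter_source(doc, include="title,date"))
print(filter_source(doc, exclude="date"))
print(filter_source(doc, include="*", exclude="date"))
```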

Fields

The GET operation accepts a fields parameter that returns only the specified fields:

curl -XGET 'http://localhost:9200/website/blog/123?fields=title,text'
{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "fields": {
    "title": [
      "My first blog entry"
    ],
    "text": [
      "Just trying this out..."
    ]
  }
}

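If a requested field is not stored, its value is extracted from _source. This use of fields can be replaced by source filtering.

Field values fetched from the document itself are always returned as arrays, which is why title and text appear as one-element arrays above. Metadata fields such as _routing and _parent are never returned as arrays.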
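Because values in `fields` come back as arrays, client code often unwraps the single-valued ones. Below is a sketch of such a helper, the hypothetical `unwrap_fields`. It leaves multi-valued lists, and plain metadata values like `_routing`, unchanged.

```python
def unwrap_fields(fields):
    """Unwrap single-element arrays in a GET "fields" object.

    Multi-valued fields stay as lists; non-list values (for example
    metadata such as _routing) pass through unchanged.
    """
    return {
        name: value[0] if isinstance(value, list) and len(value) == 1 else value
        for name, value in fields.items()
    }


# The "fields" object from the response above.
fields = {
    "title": ["My first blog entry"],
    "text": ["Just trying this out..."],
}
print(unwrap_fields(fields))
```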

Generated fields

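Some fields are generated only at index time. If no refresh has happened since the document was indexed, GET reads the document from the transaction log (translog). Requesting such a generated field then raises an exception by default. Set ignore_errors_on_generated_fields=true to ignore these errors.

About the translog: Elasticsearch cannot update the Lucene structures every time a single document is indexed. Instead, changes are buffered and recorded in the translog first, and only become searchable after a refresh. A realtime GET reads documents that have not yet been refreshed from the translog.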
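The visible behaviour described above can be mimicked with a toy model. A realtime GET sees a write immediately; a non-realtime GET and a search only see it after a refresh, and `refresh=True` forces one first. `ToyShard` is entirely hypothetical and simplified. It is not how Elasticsearch stores data internally.

```python
class ToyShard:
    """Toy model of realtime GET vs. search visibility (not ES internals)."""

    def __init__(self):
        self.unrefreshed = {}  # indexed, recorded in the translog, not yet searchable
        self.refreshed = {}    # visible to search

    def index(self, doc_id, source):
        self.unrefreshed[doc_id] = source

    def refresh(self):
        # Make pending writes searchable. (In Elasticsearch the translog
        # itself is only trimmed on flush; this model just tracks what is
        # not yet refreshed.)
        self.refreshed.update(self.unrefreshed)
        self.unrefreshed.clear()

    def get(self, doc_id, realtime=True, refresh=False):
        if refresh:
            self.refresh()  # like refresh=true: refresh the shard first
        if realtime and doc_id in self.unrefreshed:
            return self.unrefreshed[doc_id]  # "served from the translog"
        return self.refreshed.get(doc_id)

    def search(self, predicate):
        # Search only ever sees refreshed documents.
        return [s for s in self.refreshed.values() if predicate(s)]


shard = ToyShard()
shard.index("123", {"title": "My first blog entry"})
print(shard.get("123"))                              # {'title': 'My first blog entry'}
print(shard.get("123", realtime=False))              # None
print(shard.search(lambda s: "blog" in s["title"]))  # []
shard.refresh()
print(shard.search(lambda s: "blog" in s["title"]))  # [{'title': 'My first blog entry'}]
```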

Returning only _source

Use /{index}/{type}/{id}/_source to return just the _source field. This avoids sending extra metadata that would waste network bandwidth:

curl -XGET 'http://localhost:9200/website/blog/123/_source'
{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2014/01/01"
}

Source filtering works here as well:

curl -XGET 'http://localhost:9200/website/blog/123/_source?_source_include=title,text,date'
{
  "date": "2014/01/01",
  "text": "Just trying this out...",
  "title": "My first blog entry"
}

The HEAD method is also supported here, to check whether the source exists:

curl -XHEAD -i 'http://localhost:9200/website/blog/123/_source'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

Routing

If a routing value was specified when the document was indexed, the same routing value must be provided when getting it:

curl -XGET 'http://localhost:9200/XXX/XXX/XXX?routing=XXX'

If the routing value is wrong, the request may go to a different shard, and the document will not be found.
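Routing decides the shard with the classic formula shard = hash(routing) % number_of_primary_shards. When no routing value is given, the document ID is used instead. Below is a toy illustration of why a wrong routing value misses the document. The hash is an assumption: Elasticsearch uses Murmur3, while this sketch substitutes `zlib.crc32`, so the shard numbers it produces will not match a real cluster. `ToyIndex` is hypothetical.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """shard = hash(routing) % number_of_primary_shards.

    crc32 stands in for Elasticsearch's Murmur3 hash so the sketch
    runs on the standard library alone.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


class ToyIndex:
    """Hypothetical in-memory index: one dict per primary shard."""

    def __init__(self, num_primary_shards):
        self.shards = [{} for _ in range(num_primary_shards)]

    def _shard(self, doc_id, routing):
        # Without an explicit routing value, the document ID is used.
        key = routing if routing is not None else doc_id
        return self.shards[shard_for(key, len(self.shards))]

    def index(self, doc_id, source, routing=None):
        self._shard(doc_id, routing)[doc_id] = source

    def get(self, doc_id, routing=None):
        return self._shard(doc_id, routing).get(doc_id)


idx = ToyIndex(5)
idx.index("123", {"title": "My first blog entry"}, routing="user1")
print(idx.get("123", routing="user1"))  # {'title': 'My first blog entry'}
print(idx.get("123"))  # None, unless "123" and "user1" happen to hash to the same shard
```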

Preference

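The preference parameter controls which copy of the shard (the primary or one of its replicas) executes the GET request. By default, the request is randomized across the copies. It can be set to:

_primary: the operation is executed only on the primary shard.
_local: the operation is executed on a shard copy allocated on the local node, if possible.
A custom (string) value: the same custom value always goes to the same shard copies. This helps avoid "jumping values" when different copies are in different refresh states. Typical values are a web session ID or a user name.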
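Below is a simplified sketch of how these preference values map to a choice of shard copy. `pick_copy` and the `(node_name, is_primary)` tuples are hypothetical. The stable hash used for custom values (`zlib.crc32`) is a stand-in, not Elasticsearch's actual selection logic.

```python
import random
import zlib


def pick_copy(copies, preference=None, local_node=None, rng=random):
    """Choose which copy of one shard serves a GET.

    copies: list of (node_name, is_primary) tuples for the primary and
    its replicas.
    """
    if preference == "_primary":
        return next(c for c in copies if c[1])
    if preference == "_local":
        local = [c for c in copies if c[0] == local_node]
        # Prefer a copy on the local node, otherwise fall back to any copy.
        return local[0] if local else rng.choice(copies)
    if preference:
        # Custom value: a stable hash sends the same value to the same copy.
        return copies[zlib.crc32(preference.encode("utf-8")) % len(copies)]
    return rng.choice(copies)  # default: randomized across copies


copies = [("node-a", True), ("node-b", False), ("node-c", False)]
print(pick_copy(copies, "_primary"))                      # ('node-a', True)
print(pick_copy(copies, "_local", local_node="node-c"))   # ('node-c', False)
print(pick_copy(copies, "session-42") == pick_copy(copies, "session-42"))  # True
```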

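Refresh

Setting the refresh parameter to true refreshes the relevant shard before the GET, making the document searchable. Use it only after careful thought: every refresh adds load to the system and can slow down indexing.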

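Distributed

The GET operation is hashed to a specific shard ID. It is then redirected to one copy in that shard's group, which returns the result; the group consists of the primary shard and its replicas. This means that the more replicas there are, the better GET throughput scales.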
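The effect of replicas on GET load can be shown with an idealized simulation. It assumes all copies are equally fast and that each request goes to a copy chosen uniformly at random. With more replicas, the busiest copy serves a smaller share of the requests. `simulate_gets` is a hypothetical helper, not an Elasticsearch API.

```python
import random
from collections import Counter


def simulate_gets(num_requests, num_replicas, seed=0):
    """Count how many GETs each copy of one shard serves.

    Copy 0 is the primary; copies 1..num_replicas are replicas.
    """
    rng = random.Random(seed)
    copies = list(range(1 + num_replicas))
    return Counter(rng.choice(copies) for _ in range(num_requests))


for replicas in (0, 1, 2):
    counts = simulate_gets(9000, replicas)
    print(f"replicas={replicas}: copies used={len(counts)}, busiest copy served {max(counts.values())}")
```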


51CTO Blog