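Preface
An earlier post introduced doc_values, which exist mainly to support sorting, aggregations, and scripting. They store data in a column-oriented layout, which makes sorting and aggregating more efficient. However, doc_values are not supported for text fields.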
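fielddata
The alternative for text fields is fielddata, which handles the field in memory. Elasticsearch reads the inverted index of each segment from disk, inverts the term-to-document relationship so that terms can be looked up by document, and keeps the result in the JVM heap.
Note: because of how it works, fielddata can take up a lot of heap space, which may trigger frequent full GCs and cause latency and stalls for users. This is why fielddata is disabled by default.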
Demo
First, create an index named emp:
PUT /emp/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}
Insert a document:
PUT /emp/_doc/1
{
  "name": "zhang san",
  "age": 18
}
Try a terms aggregation on age:
GET /emp/_search
{
  "aggs": {
    "age_bucket": {
      "terms": {
        "field": "age"
      }
    }
  }
}
OK, no problem.
Now try the same aggregation on name:
GET /emp/_search
{
  "aggs": {
    "name_bucket": {
      "terms": {
        "field": "name"
      }
    }
  }
}
This time the request fails:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [name] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "emp",
"node" : "ev2pyH4yRBGAVpXTGrXUzg",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [name] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
}
],
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [name] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [name] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
}
},
"status" : 400
}
Clearly, a text field cannot be aggregated on by default. Let's enable fielddata and try again.
Create a new index, emp2:
PUT /emp2/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fielddata": true
      },
      "age": {
        "type": "integer"
      }
    }
  }
}
Insert a document:
PUT /emp2/_doc/1
{
  "name": "zhang san",
  "age": 18
}
Run the aggregation:
GET /emp2/_search
{
  "aggs": {
    "name_bucket": {
      "terms": {
        "field": "name"
      }
    }
  }
}
This time it works.
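Creating a new index is not the only way to get here. As a sketch, assuming the emp index from the first step still exists and that a typeless (7.x-style) mapping is in use as in the requests above, fielddata can also be switched on for the existing name field through the update mapping API:

```
PUT /emp/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fielddata": true
    }
  }
}
```

After this, the failing aggregation on /emp/_search should be accepted as well, with the same memory caveats as for emp2.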
An alternative to fielddata
Aggregating on name now works, but as discussed above, enabling fielddata can put heavy pressure on the JVM heap, so it is not a wise choice. Is there an alternative?
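To get a feel for that heap cost, the requests below are one way to inspect it. This is a sketch that assumes the emp2 index and its name field from the demo; the numbers returned depend entirely on your data. The first two requests report how much heap the fielddata for name currently occupies, and the last one clears the fielddata cache for emp2:

```
GET /_cat/fielddata?v&fields=name

GET /emp2/_stats/fielddata?fields=name

POST /emp2/_cache/clear?fielddata=true
```

Clearing the cache only frees the memory temporarily; the next aggregation on name loads the fielddata again.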
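In fact, the answer was already given in the error message above.
We can use a keyword sub-field: name is used for full-text search, while name.keyword is used for aggregations and similar operations.
For example: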
PUT /emp3/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "age": {
        "type": "integer"
      }
    }
  }
}

PUT /emp3/_doc/1
{
  "name": "zhang san",
  "age": 18
}

GET /emp3/_search
{
  "aggs": {
    "name_bucket": {
      "terms": {
        "field": "name.keyword"
      }
    }
  }
}
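The same split applies to sorting. As a small sketch on the emp3 index above (the match text "zhang" is just an illustrative value), the query searches the analyzed name field while the sort uses the name.keyword sub-field, so fielddata is not needed:

```
GET /emp3/_search
{
  "query": {
    "match": {
      "name": "zhang"
    }
  },
  "sort": [
    {
      "name.keyword": "asc"
    }
  ]
}
```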
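Of course, a keyword field is not analyzed. But think carefully about why you would want to aggregate, sort, or script on a text field after it has been analyzed. On closer inspection, doing so usually makes no sense.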
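To see why, it helps to look at what the analyzed field actually contains. The request below is a sketch that assumes the emp2 index from the demo, whose name field uses the default standard analyzer since the mapping sets none:

```
GET /emp2/_analyze
{
  "field": "name",
  "text": "zhang san"
}
```

The response should list two separate tokens, zhang and san. Those tokens are what a fielddata-based terms aggregation on name buckets by, whereas name.keyword buckets by the whole value "zhang san".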