13.1 Metadata Overview

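Mapping meta-fields are the fields in a mapping that describe the document itself. They fall roughly into five groups: document identity metadata, document source metadata, indexing metadata, routing metadata, and custom metadata.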

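| Category | Meta-field | Description |
| --- | --- | --- |
| Document identity | _index | The index the document belongs to |
| Document identity | _id | The document's ID |
| Document identity | _type | The mapping type the document belongs to |
| Document identity | _uid | Composed of the _type and _id fields |
| Document source | _source | The original JSON string of the document |
| Document source | _size | The size of the whole _source field in bytes |
| Indexing | _all | Automatically combines the values of all fields |
| Indexing | _field_names | Indexes the name of every field |
| Routing | _parent | Defines parent-child relationships between documents; deprecated |
| Routing | _routing | Stores a document on a specific shard according to its routing value |
| Custom | _meta | Holds user-defined metadata |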

The sections below look at the important meta-fields in more detail.

13.2 _index


When performing queries across multiple indices, it is sometimes desirable to add query clauses that apply only to documents from certain indices. The _index field allows matching on the index a document was indexed into. Its value is accessible in term and terms queries, aggregations, scripts, and when sorting.
The _index field is exposed as a virtual field: it is not added to the Lucene index as a real field. This means that you can use _index in a term or terms query (or any query that is rewritten to a term query, such as match, query_string, or simple_query_string), but it does not support prefix, wildcard, regexp, or fuzzy queries.
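
As a rough illustration of the uses listed above, the following Python sketch builds a search body that filters, aggregates, and sorts on _index. The index names index_1 and index_2 and the helper build_index_search are placeholders introduced here, not names from this chapter; the resulting body would be sent to GET index_1,index_2/_search.

```python
import json


def build_index_search(index_names):
    """Build a search body that uses the _index meta-field in three ways.

    index_names is a list of (hypothetical) index names.
    """
    return {
        # terms query: keep only documents stored in the listed indices
        "query": {"terms": {"_index": index_names}},
        # terms aggregation: count hits per index
        "aggs": {"indices": {"terms": {"field": "_index", "size": 10}}},
        # sort hits by the name of the index they came from
        "sort": [{"_index": {"order": "asc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(build_index_search(["index_1", "index_2"]), indent=2))
```

Only term-style clauses are used here, in line with the restriction that prefix, wildcard, regexp, and fuzzy queries are not supported on _index.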

13.3 _type


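Deprecated in 6.0.0.
The name of the document's mapping type. It is indexed automatically and can be used in queries, aggregations, and sorting, or accessed from scripts.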

13.4 _id


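The document's ID, supplied when the document is indexed. In 6.0 and later the _id field is itself indexed and can be queried directly, as in the terms query below. It is also accessible in scripts, but it cannot be used for aggregations or sorting.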

PUT my_index

PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}

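PUT my_index/my_type/2?refresh=true
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ]
    }
  }
}

Search result (this capture shows only document 1, most likely because the second request was originally sent as my_index/my_type/2&refresh=true, which stores the document under the literal ID 2&refresh=true; with ?refresh=true both documents match):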

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "Document with ID 1"
        }
      }
    ]
  }
}

Pre-6.0 indices worked differently: because they supported multiple types, _type and _id were merged into a composite primary key called _uid, and _id itself was not indexed.

13.5 _uid

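Deprecated in 6.0.0. Now that types have been removed, documents are uniquely identified by their _id, and the _uid field is kept only as a view over the _id field for backward compatibility.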

13.6 _source

The _source field contains the original JSON document body that was passed at index time. The _source field itself is not indexed (and thus is not searchable), but it is stored so that it can be returned when executing fetch requests, like get or search.
The _source field is enabled by default; in other words, the original document is stored unless you say otherwise.

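If a field holds a very large amount of content (the full text of a novel, for example), or the application only needs to search on that field, get back the document ID, and then fetch the original text through some other channel, there is no need to keep the _source meta-field. Disabling _source makes Elasticsearch store only the inverted index and not the original field values. Keep in mind that features which rely on _source, such as the update and reindex APIs, are unavailable once it is disabled.
[Example] Disabling _source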

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "_source": {
        "enabled": false
      }
    }
  }
}
PUT my_index/my_type/1
{
  "text": "This is a document"
}

GET my_index/my_type/1

The response contains no _source data:

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "found": true
}

[Example] Including or excluding specific fields

DELETE my_index

PUT my_index
{
  "mappings": {
    "blog":{
      "_source":{
        "includes":["title","url"],
        "excludes":["content"]
      },
      "properties": {
        "title":{
          "type":"text"
        },
        "content":{
          "type":"text"
        },
        "url":{
          "type":"text"
        }
      }
    }  
  }
}

PUT my_index/blog/1
{
  "title":"yum源",
  "content":"CentOS更换国内yum源",
  "url":"http://url.cn/53788351"
}

PUT my_index/blog/2
{
  "title":"Ambari",
  "content":"CentOS7.x下的Ambari2.4源码编译",
  "url":"http://url.cn/53844169"
}

GET my_index/blog/1

The returned _source contains only title and url:

{
  "_index": "my_index",
  "_type": "blog",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "yum源",
    "url": "http://url.cn/53788351"
  }
}
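
To make the effect of includes and excludes concrete, here is a simplified Python sketch of the filtering applied before _source is stored. It is an approximation under stated assumptions: it handles top-level field names only (Elasticsearch also accepts dotted paths into nested objects), the function name filter_source is invented for this example, and the content value is a placeholder.

```python
from fnmatch import fnmatchcase


def filter_source(doc, includes=None, excludes=None):
    """Approximate _source includes/excludes for top-level fields only."""
    includes = includes or []
    excludes = excludes or []
    kept = {}
    for name, value in doc.items():
        # With a non-empty includes list, a field must match one of its patterns.
        if includes and not any(fnmatchcase(name, p) for p in includes):
            continue
        # A field matching any excludes pattern is dropped.
        if any(fnmatchcase(name, p) for p in excludes):
            continue
        kept[name] = value
    return kept


if __name__ == "__main__":
    doc = {
        "title": "Ambari",
        "content": "placeholder body text",
        "url": "http://url.cn/53844169",
    }
    print(filter_source(doc, includes=["title", "url"], excludes=["content"]))
```

The content field can still be searched because it is indexed; it is simply absent from the stored _source that get and search return.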

13.7 _size

The _size meta-field holds the size in bytes of the entire _source field. It is provided by the mapper-size plugin and, when enabled, indexes the size in bytes of the original _source field.
The plugin must be installed first, by running bin/elasticsearch-plugin install mapper-size:

[es@node1 ~]$ cd /opt/elasticsearch-6.1.1/
[es@node1 elasticsearch-6.1.1]$ bin/elasticsearch-plugin install mapper-size
-> Downloading mapper-size from elastic
[=================================================] 100%   
-> Installed mapper-size
[es@node1 elasticsearch-6.1.1]$

Then restart Elasticsearch; the mapper-size plugin takes effect only after the restart.

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "_size": {
        "enabled": true
      }
    }
  }
}
PUT my_index/my_type/1
{
  "text": "This is a document"
}

PUT my_index/my_type/2
{
  "text": "This is another document"
}

When searching, documents can be filtered on the _size meta-field:

GET my_index/_search
{
  "query": {
    "range": {
      "_size": { 
        "gt": 10
      }
    }
  }
}

Search result:

{
  "took": 148,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "This is another document"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a document"
        }
      }
    ]
  }
}
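
Both documents match because each request body is larger than 10 bytes. The Python sketch below shows the idea with helper names invented for this example. It assumes the body is sent as compact UTF-8 JSON; the value Elasticsearch actually records depends on the exact bytes of the request, including any whitespace.

```python
import json


def source_size_in_bytes(raw_source: str) -> int:
    """Return the UTF-8 byte length of a raw JSON request body."""
    return len(raw_source.encode("utf-8"))


def would_match_size_gt(raw_source: str, threshold: int) -> bool:
    """Mimic the range query {"_size": {"gt": threshold}} for one document."""
    return source_size_in_bytes(raw_source) > threshold


if __name__ == "__main__":
    for doc in ({"text": "This is a document"},
                {"text": "This is another document"}):
        # Compact separators are an assumption about how the body was sent.
        raw = json.dumps(doc, separators=(",", ":"))
        print(raw, source_size_in_bytes(raw), would_match_size_gt(raw, 10))
```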

Note: the mapper-size plugin can be removed with bin/elasticsearch-plugin remove mapper-size.

13.8 _all

Deprecated in 6.0.0.
_all may no longer be enabled for indices created in 6.0+; use a custom field and the mapping copy_to parameter instead (see section 14.6, copy-to).

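The _all field is a catch-all field that concatenates the values of all other fields into one string, separated by spaces. It is analyzed and indexed, but not stored. It is meant for the case where you want every document containing some keyword without naming a specific field to search.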

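According to the official documentation, the _all meta-field is disabled by default and can be turned on with "_all": {"enabled": true}. The test below tries exactly that: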

PUT myindex
{
  "mappings": {
    "mytype": {
      "_all": {"enabled": true},
      "properties": {
        "title": { 
          "type": "text"
        },
        "content": { 
          "type": "text"
        }
      }
    }
  }
}

Index creation fails, however, with the error "Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field."

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [mytype]: Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field."
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [mytype]: Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field."
    }
  },
  "status": 400
}

So in Elasticsearch 6.1 the _all meta-field really can no longer be enabled.
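
The error message points to copy_to as the replacement. A minimal Python sketch of such a mapping follows, reusing the mytype, title, and content names from the failed request. The catch-all field name full_text and the helper build_catch_all_mapping are assumptions made for this example.

```python
import json


def build_catch_all_mapping(type_name, text_fields, catch_all="full_text"):
    """Build a 6.x-style mapping whose text fields copy into one catch-all field.

    catch_all is a hypothetical field name chosen for this sketch.
    """
    properties = {
        # Each listed field is indexed as text and also copied into catch_all.
        name: {"type": "text", "copy_to": catch_all}
        for name in text_fields
    }
    # The catch-all field itself is an ordinary text field.
    properties[catch_all] = {"type": "text"}
    return {"mappings": {type_name: {"properties": properties}}}


if __name__ == "__main__":
    mapping = build_catch_all_mapping("mytype", ["title", "content"])
    print(json.dumps(mapping, indent=2))
```

A match query on full_text then plays the role that a query on _all used to play.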

13.9 _field_names

The _field_names field indexes the names of every field in a document that contains any value other than null. This field is used by the exists query to find documents that either have or don't have any non-null value for a particular field.

PUT my_index

PUT my_index/my_type/1
{
  "title": "This is a document"
}

PUT my_index/my_type/2?refresh=true
{
  "title": "This is another document",
  "body": "This document has a body"
}
GET my_index/_search
{
  "query": {
    "terms": {
      "_field_names": ["body"]
    }
  }
}

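Problem: this query should return the second document, yet it returns no hits. The cause was not identified when this was tested. A likely explanation is that, starting with Elasticsearch 6.1, _field_names only records the names of fields that have both doc_values and norms disabled, so a dynamically mapped text field such as body is no longer listed there; the exists query is the reliable way to run this check.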

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
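
Since the exists query is what _field_names was built to serve, the same check can be expressed with it directly. The Python sketch below builds such a body for the body field; the helper name build_exists_query is invented here, and the body would be sent to GET my_index/_search.

```python
import json


def build_exists_query(field_name):
    """Build a search body matching documents with a non-null value for field_name."""
    return {"query": {"exists": {"field": field_name}}}


if __name__ == "__main__":
    print(json.dumps(build_exists_query("body"), indent=2))
```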

13.10 _routing

A document is routed to a particular shard in an index using the following formula:

shard_num = hash(_routing) % num_primary_shards

The default value used for _routing is the document's _id.
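
The formula can be sketched in a few lines of Python. This is an illustration only: zlib.crc32 stands in for the hash function, which is an assumption (Elasticsearch uses its own hash internally, so real shard numbers will differ), and the function names are invented for this example.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards, with a stand-in hash."""
    # crc32 is NOT the hash Elasticsearch uses; it only makes the sketch runnable.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def shard_for_document(doc_id: str, num_primary_shards: int, routing=None) -> int:
    """Use the custom routing value if given, otherwise fall back to the _id."""
    value = routing if routing is not None else doc_id
    return pick_shard(value, num_primary_shards)


if __name__ == "__main__":
    # Two different documents with the same routing value share a shard.
    print(shard_for_document("1", 5, routing="user1"))
    print(shard_for_document("2", 5, routing="user1"))
    # Without a routing value, the _id decides the shard.
    print(shard_for_document("1", 5))
```

Because the result depends on num_primary_shards, the same routing value maps to a different shard if the number of primary shards changes.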

Custom routing patterns can be implemented by specifying a custom routing value for each document:

PUT my_index/my_type/1?routing=user1&refresh=true 
{
  "title": "This is a document"
}
GET my_index/my_type/1?routing=user1

Result:

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "_routing": "user1",
  "found": true,
  "_source": {
    "title": "This is a document"
  }
}

The value of the _routing field can be used in queries:

GET my_index/_search
{
  "query": {
    "terms": {
      "_routing": [ "user1" ] 
    }
  }
}

Search result:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_routing": "user1",
        "_source": {
          "title": "This is a document"
        }
      }
    ]
  }
}