Differences between match and term
- term
- match
- match_phrase
term
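Start with the definition of term. term means an exact match, that is, a precise query: the search value is not analyzed (split into tokens) before the search runs.
Let's illustrate with an example. First, store some data:
{
  "title": "love China",
  "content": "people very love China",
  "tags": ["China", "love"]
}
{
  "title": "love HuBei",
  "content": "people very love HuBei",
  "tags": ["HuBei", "love"]
}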
Now query with term:
GET test/_search
{
  "query": {
    "term": {
      "title": "love"
    }
  }
}
Both documents are returned:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.18232156,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": [
            "HuBei",
            "love"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      }
    ]
  }
}
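Every title containing the token love was matched. However, I only want an exact match on love China. Let's see whether the following query finds it:
GET test/_search
{
  "query": {
    "term": {
      "title": "love China"
    }
  }
}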
The query returns no data:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
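To see why the two term queries behave differently, here is a toy model in Python. It is not Elasticsearch code. It assumes that title is indexed as lowercase tokens split on non-alphanumeric characters, which is a rough approximation of the default analyzer discussed further below. The names `analyze`, `index` and `term_query` are hypothetical.

```python
import re

# Hypothetical helper: rough stand-in for the default (standard) analyzer.
# Assumption: split on non-alphanumeric characters, then lowercase.
def analyze(text):
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

# The two sample documents, keyed by the _id values seen in the responses.
docs = {
    "1": "love China",
    "2": "love HuBei",
}

# Toy inverted index: token -> set of document ids containing that token.
index = {}
for doc_id, title in docs.items():
    for token in analyze(title):
        index.setdefault(token, set()).add(doc_id)

def term_query(value):
    # term does NOT analyze the value: the whole string is looked up as one token.
    return sorted(index.get(value, set()))

print(term_query("love"))        # both documents
print(term_query("love China"))  # no token "love China" exists, so nothing
```

Under these assumptions the index only contains love, china and hubei. That is why "love" finds both documents and "love China" finds none.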
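Conceptually, term is an exact match that can only look up a single token, so the whole string love China is treated as one token. What if I want to match several terms with term-level queries? Use terms:
GET test/_search
{
  "query": {
    "terms": {
      "title": ["love", "China"]
    }
  }
}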
The result:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": [
            "HuBei",
            "love"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      }
    ]
  }
}
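Both documents are returned. Why? The values inside the terms array [ ] are combined with OR: a document matches if its field contains any one of them. Both titles contain the token love, so both match. The capitalized China matches nothing, for the reason explained below. To require all of the terms, combine several term queries in a bool must clause:
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "title": "love"
          }
        },
        {
          "term": {
            "title": "china"
          }
        }
      ]
    }
  }
}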
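The OR semantics of terms and the AND semantics of bool must can be modelled in a few lines. This is a sketch under the same toy assumptions as before: lowercase tokens split on non-alphanumerics. The names `terms_query` and `bool_must_terms` are hypothetical, not Elasticsearch APIs. In this model, the bool must version with lowercase china keeps only document 1, and using capitalized China makes it match nothing.

```python
import re

def analyze(text):
    # Hypothetical rough stand-in for the standard analyzer: split, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

docs = {"1": "love China", "2": "love HuBei"}
# Per-document token sets, i.e. what term-level queries compare against.
doc_tokens = {doc_id: set(analyze(title)) for doc_id, title in docs.items()}

def terms_query(values):
    # terms: match if the field contains ANY of the (unanalyzed) values.
    return sorted(d for d, toks in doc_tokens.items() if any(v in toks for v in values))

def bool_must_terms(values):
    # bool.must of several term clauses: EVERY (unanalyzed) value must be present.
    return sorted(d for d, toks in doc_tokens.items() if all(v in toks for v in values))

print(terms_query(["love", "China"]))      # both docs, matched via "love" alone
print(bool_must_terms(["love", "china"]))  # only doc 1
print(bool_must_terms(["love", "China"]))  # nothing: no token "China" exists
```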
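Notice that the query above uses lowercase china. If you search with the capitalized China instead, nothing is found. Why? The title field was analyzed (split into tokens) when the documents were stored, and here the default analyzer did that work. We can ask the analyzer how it tokenizes the text:
GET test/_analyze
{
  "text": "love China"
}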
The result:
{
  "tokens": [
    {
      "token": "love",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "china",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
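The analyzer produces two tokens, love and china. term only matches these tokens exactly as stored and does not transform the search value in any way, so a query for China (capital C) fails.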
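The _analyze output above can be approximated for simple ASCII text with a regular expression plus lowercasing. This is only a sketch under stated assumptions. The real standard tokenizer follows Unicode word-break rules (UAX #29) and assigns other token types too, for example <NUM> for numbers. This toy version always reports <ALPHANUM>, and `analyze` is a hypothetical name.

```python
import re

def analyze(text):
    """Approximate the standard analyzer's _analyze output for simple ASCII text.

    Assumption: tokens are runs of ASCII letters/digits, lowercased afterwards.
    Offsets refer to the original text; positions count tokens from 0.
    """
    tokens = []
    for position, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": m.group().lower(),  # lowercase step: "China" -> "china"
            "start_offset": m.start(),
            "end_offset": m.end(),
            "type": "<ALPHANUM>",        # toy simplification, see note above
            "position": position,
        })
    return tokens

for t in analyze("love China"):
    print(t)
```

For "love China" this gives the same tokens, offsets and positions as the response above. The stored token is china, so a term query for China has nothing to match.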
match
First, match on love China:
GET test/_search
{
  "query": {
    "match": {
      "title": "love China"
    }
  }
}
The result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": [
            "HuBei",
            "love"
          ]
        }
      }
    ]
  }
}
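Both documents are returned. Why? match analyzes the search text first and then matches the resulting tokens. The titles of the two documents produce the tokens love, china and hubei. Searching for love China yields love and china after analysis. By default these are combined with OR, so a document matches as soon as it contains any one of them. The document containing both tokens simply scores higher (1.3862944 versus 0.6931472). To require love and China to appear together, use match_phrase.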
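A toy model of match makes the difference from term concrete: the query text goes through the same analysis as the documents, and the tokens are then combined with OR. The real match query offers a similar switch through its `operator` parameter ("or" by default, or "and"). In this sketch the `operator` argument only imitates that. Ranking by the number of matched tokens is a crude stand-in for Elasticsearch's BM25 scoring, and `match_query` is a hypothetical name.

```python
import re

def analyze(text):
    # Hypothetical rough stand-in for the standard analyzer: split, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

docs = {"1": "love China", "2": "love HuBei"}
doc_tokens = {doc_id: set(analyze(title)) for doc_id, title in docs.items()}

def match_query(text, operator="or"):
    # Unlike term, match analyzes the query text first: "China" -> "china".
    query_tokens = analyze(text)
    combine = any if operator == "or" else all
    hits = []
    for doc_id, toks in doc_tokens.items():
        if combine(q in toks for q in query_tokens):
            # Crude relevance proxy: how many query tokens the document contains.
            matched = sum(q in toks for q in query_tokens)
            hits.append((doc_id, matched))
    # More matched tokens first; ties broken by id (real BM25 is far more involved).
    return sorted(hits, key=lambda h: (-h[1], h[0]))

print(match_query("love China"))                  # doc 1 (2 tokens), then doc 2 (1 token)
print(match_query("love China", operator="and"))  # only doc 1
```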
match_phrase
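match_phrase performs a phrase search. Every token of the analyzed query must appear in the document, in the same order and at adjacent positions.
GET test/_search
{
  "query": {
    "match_phrase": {
      "title": "love china"
    }
  }
}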
Only the love China document is returned. match_phrase analyzes the query text too, so the lowercase love china used here behaves the same as love China:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      }
    ]
  }
}
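The phrase rule relies on the token positions that the analyzer records (the `position` field in the _analyze output). Below is a minimal sketch using a toy positional index, under the same lowercase and split assumptions as before. It fixes the allowed gap at zero, which is match_phrase's default. The real query can relax adjacency through its `slop` parameter, which this sketch does not model. `positions` and `match_phrase` are hypothetical names.

```python
import re

def analyze(text):
    # Hypothetical rough stand-in for the standard analyzer: split, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

docs = {"1": "love China", "2": "love HuBei"}

# Toy positional index: doc id -> {token: [positions where it occurs]}.
positions = {}
for doc_id, title in docs.items():
    for pos, tok in enumerate(analyze(title)):
        positions.setdefault(doc_id, {}).setdefault(tok, []).append(pos)

def match_phrase(text):
    # Every query token must occur, in order, at consecutive positions (slop 0).
    query = analyze(text)
    hits = []
    for doc_id, tok_pos in positions.items():
        if not query or query[0] not in tok_pos:
            continue
        # Try each occurrence of the first token as the phrase start.
        for start in tok_pos[query[0]]:
            if all(start + i in tok_pos.get(q, []) for i, q in enumerate(query)):
                hits.append(doc_id)
                break
    return sorted(hits)

print(match_phrase("love china"))  # only doc 1
print(match_phrase("china love"))  # nothing: right tokens, wrong order
```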