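Elasticsearch range queries with a specified date format: querying by time range

Reposted

梦断蓝桥魂 2024-07-31 20:00:03

Tags: es range query with a specified date format, ES, ElasticSearch, elasticsearch, search engine. Category: architecture, backend development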

ElasticSearch

If you found this helpful, a like or a follow would be much appreciated.

Basic concepts

What is Elasticsearch?

Search vs. filtering

Differences between search and filtering

Full text queries

intervals query
match
match_bool_prefix query
match_phrase query
match_phrase_prefix query
multi_match query

Term-level queries

ids
term
terms
fuzzy
range
exists
prefix
wildcard
regexp

Date/time conversion in practice

bool query
boosting query
constant score query
dis max query
function score query

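Basic concepts

What is Elasticsearch?

Elasticsearch is a distributed search engine and data analytics engine.

Search: on-site search, information retrieval
Data analytics: which 10 merchants sold the most bread in the last 7 days? Which 3 news sections had the most visits in the last month?

Search vs. filtering

query context

A query has to compute a relevance score.

filter context

A filter only answers yes or no; no score is computed.

Differences

Filters look like queries, but they differ in scoring and in search performance. A filter does not compute a score for the matching terms; it only answers "does this document match?" with a plain yes or no.
Because no score is computed (and Elasticsearch can cache frequently used filters), a filter is usually faster than an equivalent query.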

Query and filter context

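Term-level queries

Term-level queries operate on the exact terms stored in the inverted index. They are typically used for structured data such as numbers, dates, and enums rather than full-text fields, and the query input is not analyzed.

ids

Query by an array of document IDs.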

{
  "query": {
    "ids" : {
      "values" : ["25328953660256334", "25328902598785991", "100"]
    }
  }
}

Term

Finds documents whose field contains an exact match for the query string.

{
  "query": {
    "term": {
      "songName.keyword": {
        "value": "7k - Ao Vivo",
        "boost": 1.0
      }
    }
  }
}

Exact match: only documents with songName = "7k - Ao Vivo" are returned; one character more or less and the document is missed.

Terms

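Finds documents whose field contains an exact match for any one of the given terms. The terms query takes a plain array of values; boost is set once for the whole query.

{
  "query": {
    "terms": {
      "songName.keyword": ["7k - Ao Vivo", "7k"],
      "boost": 1.0
    }
  }
}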

Fuzzy

Fuzzy query. The target is "Quem Mandou Chamar"; the input below misspells both "Mandou" (as "Mondou") and "Chamar" (as "Chamer").

{
  "query": {
    "fuzzy": {
      "songName.keyword": {
        "value": "Quem Mondou Chamer",
        "fuzziness": "AUTO",
        "max_expansions": 50,
        "prefix_length": 0,
        "transpositions": true
      }
    }
  }
}
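The edit-distance idea behind fuzziness can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain Levenshtein distance (Elasticsearch additionally counts a transposition as one edit when transpositions is true), and `auto_fuzziness` mirrors the documented default AUTO thresholds (0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5, 2 edits for longer terms). Both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits under the default AUTO setting (assumed thresholds 3 and 6)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


typed = "Quem Mondou Chamer"
indexed = "Quem Mandou Chamar"
print(levenshtein(typed, indexed), auto_fuzziness(typed))
```

Because the example queries a keyword field, the whole song name is a single term, so the two typos together must fit within the allowed distance.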

Range

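Range query. It applies to fields whose values have a natural order, such as numbers and dates (geo fields are searched with the dedicated geo queries instead).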

{
  "query": {
    "range": {
      "age": {
        "gte": 10,
        "lte": 20,
        "boost": 2.0
      }
    }
  }
}
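For date fields, the range query also accepts a format parameter that tells Elasticsearch how to parse the bounds, and a time_zone parameter for bounds that carry no offset. The sketch below builds such a request body in Python; the field name `createTime`, the helper name, and the default pattern and offset are assumptions chosen for illustration, not values from the original article.

```python
import json


def date_range_query(field, start, end,
                     fmt="yyyy-MM-dd HH:mm:ss", time_zone="+08:00"):
    """Build a range query whose bounds are parsed with an explicit date format."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": start,            # inclusive lower bound
                    "lt": end,               # exclusive upper bound
                    "format": fmt,           # how to parse the two bounds
                    "time_zone": time_zone,  # offset applied to the bounds
                }
            }
        }
    }


# Hypothetical field name and time window.
body = date_range_query("createTime",
                        "2024-07-01 00:00:00",
                        "2024-08-01 00:00:00")
print(json.dumps(body, indent=2))
```

Using gte with lt gives a half-open window, so consecutive windows do not overlap at the boundary.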

Exists

Matches documents that contain an indexed value for the field. The query below returns every document that has a songName value.

{
  "query": {
    "exists": {
      	"field": "songName"
    }
  }
}

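In a few cases a document is not returned even though the field is present in its source:

The field value is an empty array [] or null
The field mapping sets index: false
The value is longer than the field's ignore_above setting
The value is malformed and was dropped because ignore_malformed is set

Prefix

Prefix query: finds documents whose field contains a term starting with 7k.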

{
  "query": {
    "prefix": {
      	"songName": "7k"
     }
  }
}

Wildcard

Wildcard query

{
  "query": {
    "wildcard": {
      "songName": {
        "value": "7?",
        "boost": 1.0
      }
    }
  }
}

? matches exactly one character
* matches zero or more characters, including none
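The two wildcard operators can be mimicked in Python by translating the pattern into a regular expression. This is a rough sketch of the matching semantics only, assuming the pattern is compared against a whole indexed term; `wildcard_match` is a hypothetical helper, not part of any Elasticsearch client.

```python
import re


def wildcard_match(pattern: str, term: str) -> bool:
    """Return True if term matches a pattern that uses ? and * as wildcards."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # zero or more characters
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts), re.DOTALL)
    # fullmatch: the pattern has to cover the entire term
    return regex.fullmatch(term) is not None


print(wildcard_match("7?", "7k"), wildcard_match("7?", "7k2"))
```

So "7?" only matches two-character terms that start with 7, while "7*" would also match "7" itself and anything longer.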

Regexp

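Regular expression query…

Term-level queries

Compound queries

1. Bool Query

Boolean query

A bool query can be read like a SQL WHERE clause. It is built from four kinds of clauses, and the clause types are combined with AND: a document is returned only if it satisfies every clause type that is present.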

{
   "bool" : {
      "must" :     [],
      "filter":    [],
      "should" :   [],
      "must_not" : []
   }
}

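must: every clause must match, equivalent to AND; contributes to the score
filter: every clause must match, but the clauses run in filter context rather than query context, so they do not contribute to the score
must_not: no clause may match, equivalent to NOT
should: at least one clause should match, equivalent to OR (when the bool query also has must or filter clauses, should clauses become optional by default and only raise the score)
boost: changes the weight of a specific clause; the default is 1.0, and 2.0 doubles that clause's original score

Example

Find songs whose artist is exactly Luiz De Carvalho (term) and whose name roughly matches De Joelhos (match)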

{
	"query": {
		"bool": {
			"must": [
				{"term": {"artistNames.keyword": "Luiz De Carvalho"}},
				{"match": {"songName": "De Joelhos"}}
			]
		}
	}
}

{
    "took": 15,
    "timed_out": false,
    "_shards": {
        "total": 6,
        "successful": 6,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 33,
            "relation": "eq"
        },
        "max_score": 15.077974,
        "hits": [
            {
                "_index": "song",
                "_type": "_doc",
                "_id": "25335347327064482",
                "_score": 15.077974,
                "_source": {
                    "id": 25335347327064482,
                    "externalSongId": "spotify-3SeY1xxKaUR0I46wDmJN1Z",
                    "songName": "De Joelhos",
                    "artistNames": [
                        "Luiz De Carvalho"
                    ],
                    "albumName": "Meus Hinos Queridos, Vol. 3",
                    "albumId": 25335347325421019,
                    "genres": [],
                    "coverUrl": "https://i.scdn.co/image/ab67616d00001e02c7039f96c75d897f62e76331",
                    "duration": 155,
                    "source": "YOUTUBE",
                    "externalId": "youtube-QO4y0G2M0m4",
                    "playUrl": "https://www.youtube.com/watch?v=QO4y0G2M0m4",
                    "lyricIds": [],
                    "areas": [
                        "BR_LP"
                    ],
                    "status": "ONLINE",
                    "dead": false
                }
            },
            {
                "_index": "song",
                "_type": "_doc",
                "_id": "25335347327283747",
                "_score": 13.863211,
                "_source": {
                    "id": 25335347327283747,
                    "externalSongId": "spotify-0wTcgBVW9IbpgeUt1GCWC3",
                    "songName": "De Joelhos (Instrumental)",
                    "artistNames": [
                        "Luiz De Carvalho"
                    ],
                    "albumName": "Meus Hinos Queridos, Vol. 3",
                    "albumId": 25335347325421019,
                    "genres": [],
                    "coverUrl": "https://i.scdn.co/image/ab67616d00001e02c7039f96c75d897f62e76331",
                    "duration": 154,
                    "source": "YOUTUBE",
                    "externalId": "youtube-QO4y0G2M0m4",
                    "playUrl": "https://www.youtube.com/watch?v=QO4y0G2M0m4",
                    "lyricIds": [],
                    "areas": [
                        "BR_LP"
                    ],
                    "status": "ONLINE",
                    "dead": false
                }
            }
        ]
    }
}

2. Boosting Query

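A bool query with must + must_not first excludes the unwanted documents and then returns the matches. Sometimes, though, we only want to lower the score of documents that match certain conditions instead of dropping them; that is what the boosting query is for.

Query logic

Put what you want to match in positive
Put what you would rather not match in negative
Start from the positive query's score; if a document also matches negative, multiply its score by negative_boost

Example

Find songs by Luiz De Carvalho, but demote songs from the album Pobre Peregrino, Vol. 3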

{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "artistNames.keyword": "Luiz De Carvalho"
        }
      },
      "negative": {
        "term": {
          "albumName.keyword": "Pobre Peregrino, Vol. 3"
        }
      },
      "negative_boost": 0.5
    }
  }
}
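The scoring rule of the boosting query is small enough to write out. The sketch below assumes the positive score is already known and models only the demotion step; `boosting_score` is a hypothetical name used for illustration.

```python
def boosting_score(positive_score: float, matches_negative: bool,
                   negative_boost: float = 0.5) -> float:
    """Demote, rather than exclude, documents that match the negative query."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost is expected to lie between 0 and 1.0")
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# A song from the demoted album keeps half of its score with negative_boost = 0.5.
print(boosting_score(8.0, matches_negative=True))
print(boosting_score(8.0, matches_negative=False))
```

A document that matches negative still appears in the results; it just ranks lower.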

3. Constant Score Query

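Filter context has nothing to do with the score, but sometimes we still want filtered documents to carry a score that takes part in ranking. The constant score query covers this case: it runs a filter and still gives every hit a score.

Query logic

Queries inside filter do not contribute to the score, so constant score query assigns every matching document the same fixed score, equal to boost.

Example
Full-text match songs by Luiz De Carvalho with a fixed score of 10. from and size are top-level search parameters, so they sit next to query rather than inside it.

{
  "query": {
    "constant_score": {
      "filter": {
        "match": { "artistNames": "Luiz De Carvalho" }
      },
      "boost": 10
    }
  },
  "from": 0,
  "size": 2
}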

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 6,
        "successful": 6,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 450,
            "relation": "eq"
        },
        "max_score": 10,
        "hits": [
            {
                "_index": "song",
                "_type": "_doc",
                "_id": "25335347324886908",
                "_score": 10,
                "_source": {
                    "id": 25335347324886908,
                    "externalSongId": "spotify-4DnaSoC7rSXx5Ede2lcyDs",
                    "songName": "Em Teus Braços (Playback)",
                    "artistNames": [
                        "Luiz De Carvalho"
                    ],
                    "albumName": "Glória a Deus",
                    "albumId": 25335347327736112,
                    "genres": [],
                    "coverUrl": "https://i.scdn.co/image/ab67616d00001e02c1f27b64e40d66626140c95c",
                    "duration": 170,
                    "source": "YOUTUBE",
                    "externalId": "youtube-RLqBalnwL1E",
                    "playUrl": "https://www.youtube.com/watch?v=RLqBalnwL1E",
                    "lyricIds": [],
                    "areas": [
                        "BR_LP"
                    ],
                    "status": "ONLINE",
                    "dead": false
                }
            },
            {
                "_index": "song",
                "_type": "_doc",
                "_id": "25335347323947197",
                "_score": 10,
                "_source": {
                    "id": 25335347323947197,
                    "externalSongId": "spotify-75rbWrFOi72F8tRRCZ5HvF",
                    "songName": "Graças a Deus (Playback)",
                    "artistNames": [
                        "Luiz De Carvalho"
                    ],
                    "albumName": "Meu Tributo",
                    "albumId": 25335347324927656,
                    "genres": [],
                    "coverUrl": "https://i.scdn.co/image/ab67616d00001e022b3617ef166aeb92f0e49684",
                    "duration": 227,
                    "source": "YOUTUBE",
                    "externalId": "youtube-UrlLN-mzv_8",
                    "playUrl": "https://www.youtube.com/watch?v=UrlLN-mzv_8",
                    "lyricIds": [],
                    "areas": [
                        "BR_LP"
                    ],
                    "status": "ONLINE",
                    "dead": false
                }
            }
        ]
    }
}

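Every hit has a score of 10.

4. Dis Max Query

Disjunction max query: returns documents that match one or more of the query clauses. When a document matches several clauses, only the score of the best-matching clause is used as the document's score by default; the tie_breaker parameter controls how much the other matching clauses contribute.

tie_breaker

tie_breaker adjusts scores differently from boost:

tie_breaker = 0.0 (default): take the score of the best-matching clause only
tie_breaker = 1.0: take the sum of the scores of all matching clauses
0.0 < tie_breaker < 1.0: best clause score + (sum of the other clause scores * tie_breaker)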

Example

curl -X GET "localhost:9200/test/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "java beginner" }},
                { "match": { "content":  "java beginner" }}
            ],
            "tie_breaker": 0.7
        }
    }
}
'

Suppose there are 3 documents, with assumed per-field scores for the query "java beginner":

# Document 1
title: "i like java",                 # 0.4
content: "i am beginner"              # 0.4

# Document 2
title: "i like python",               # 0.0
content: "i am beginner",             # 0.4

# Document 3
title: "i like it",                   # 0.0
content: "i am java beginner"         # 0.7

With tie_breaker = 1

Document 1: 0.4 + (1 * 0.4) = 0.8
Document 2: 0.4 + (1 * 0.0) = 0.4
Document 3: 0.7 + (1 * 0.0) = 0.7

With tie_breaker = 0.5

Document 1: 0.4 + (0.5 * 0.4) = 0.6
Document 2: 0.4 + (0.5 * 0.0) = 0.4
Document 3: 0.7 + (0.5 * 0.0) = 0.7
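The arithmetic above can be reproduced with a short Python sketch. It assumes the per-clause scores are already known (the 0.4 and 0.7 values are the illustrative scores from the three documents, not real Elasticsearch output), and `dis_max_score` is a hypothetical helper name.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the other clause scores."""
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# (title score, content score) for the three example documents
docs = {
    "doc1": [0.4, 0.4],
    "doc2": [0.0, 0.4],
    "doc3": [0.0, 0.7],
}

for tie_breaker in (0.0, 0.5, 1.0):
    ranked = sorted(docs, key=lambda d: dis_max_score(docs[d], tie_breaker),
                    reverse=True)
    print(tie_breaker, ranked)
```

With tie_breaker at 0.5 document 3 still ranks first, but at 1.0 document 1 overtakes it because both of its fields contribute in full.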

5. Function Score Query

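In Elasticsearch, function_score is the query DSL construct for adjusting document scores. After the query runs, it re-scores every matching document with one or more functions, and the results are sorted by the final score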

GET /_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "boost": "5", 
      "functions": [
        {
          "filter": { "match": { "test": "bar" } },
          "random_score": {}, 
          "weight": 23
        },
        {
          "filter": { "match": { "test": "cat" } },
          "weight": 42
        }
      ],
      "max_boost": 42,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 42
    }
  }
}

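Put simply, function score query is a compound query with scoring logic built in: it provides a set of functions that change the score of matching documents

weight

A plain weight: the function score is multiplied by the given number
It is usually paired with a filter. A filter only selects the documents that meet the condition and does not score them, so each matching document's function score is 1, and weight turns that into any value you like

field_value_factor

Derives a score from the value of a field in the document
field: the field to read

Note: it must be a numeric field

factor: multiplies the field value before the modifier is applied (default 1); think of it as the field's weight
modifier: a mathematical transformation applied to the value; several options exist

none: no transformation
log: common logarithm
square: square
sqrt: square root
…

When searching for a video, we want videos with more likes to rank higher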

{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "一剪梅"
        }
      },
      "field_value_factor": {
        "field": "likes",
        "modifier": "sqrt",
        "factor": 0.1
      },
      "boost_mode": "sum"
    }
  }
}

new_score = old_score + sqrt(0.1 * likes)
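To make the effect of factor and the sqrt modifier concrete, here is a minimal Python sketch of this one configuration (modifier sqrt, boost_mode sum). The query scores and like counts are made-up inputs, and the function name is hypothetical.

```python
import math


def score_with_likes(query_score: float, likes: float, factor: float = 0.1) -> float:
    """field_value_factor with modifier=sqrt, combined via boost_mode=sum."""
    function_score = math.sqrt(factor * likes)  # sqrt(factor * field value)
    return query_score + function_score


# Same text relevance, very different like counts.
print(score_with_likes(2.0, likes=10))
print(score_with_likes(2.0, likes=1000))
```

The square root flattens the curve: 100 times more likes adds 10 times more score, so popularity helps without completely drowning out text relevance.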

random_score
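random_score takes a seed and a field; the same seed always yields the same ordering. This lets different users get different orderings while the same user keeps getting the same one (for example, by deriving the seed from the user ID)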

{
  "query": {"function_score": {
    "query": {"match": {
      "title": "一剪梅"
    }},
    
    "random_score": {
        "seed": 1,
        "field": "uuid"
    },
    "boost_mode": "replace"
  }}
}

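decay functions

Decay functions offer a more sophisticated formula. A field has an ideal value, and the further the actual value deviates from it (in either direction), the less the document meets expectations. Decay functions suit fields with an ordered range, such as numbers, dates, and geo locations

origin: the ideal value of the field; a document at the origin gets the full score (1.0)
offset: values within this distance of the origin also get the full score
scale: the decay distance; once a value is further than offset from the origin, its score starts to decay
decay: the score a document gets at a distance of offset + scale from the origin (default 0.5); how the score falls off before and after that point depends on the decay mode
DECAY_FUNCTION: the decay mode

linear: linear decay
exp: exponential decay
gauss: Gaussian decay

Say we are looking for an apartment and want it within 3 km of the office; beyond that, interest gradually drops, and beyond 10 km we are no longer really interested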

{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "公寓"
        }
      },
      "gauss": {
        "location": {
          "origin": { "lat": 40, "lon": 116 },
          "offset": "3km",
          "scale": "7km"
           }
         },
         "boost_mode": "sum"
    }
  }
}

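The best case is right next to the office. Anything within 3 km gets the full score, and at 10 km (offset + scale) the function score has dropped to the default decay of 0.5, which we treat as the edge of the acceptable range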
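The shape of the gauss curve can be checked numerically. The sketch below follows the Gaussian decay formula given in the Elasticsearch documentation, exp(-max(0, d - offset)^2 / (2 * sigma^2)) with sigma^2 = -scale^2 / (2 * ln(decay)), and takes the distance in kilometres as a plain number instead of computing it from coordinates; the function name is hypothetical.

```python
import math


def gauss_decay(distance_km: float, offset_km: float = 3.0,
                scale_km: float = 7.0, decay: float = 0.5) -> float:
    """Gaussian decay score for a document at the given distance from origin."""
    # Distances inside the offset count as zero, i.e. full score.
    effective = max(0.0, distance_km - offset_km)
    # sigma^2 is chosen so that the score equals `decay` at offset + scale.
    sigma_sq = -scale_km ** 2 / (2.0 * math.log(decay))
    return math.exp(-effective ** 2 / (2.0 * sigma_sq))


for km in (0, 3, 5, 10, 17):
    print(km, round(gauss_decay(km), 4))
```

Note that the score never reaches exactly zero with gauss; if a hard cut-off at 10 km is needed, a geo_distance filter in the query part is one way to add it.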

script_score

When the built-in scoring functions of function_score are not enough, a script can implement a custom scoring rule

GET /_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "message": "elasticsearch" }
      },
      "script_score": {
        "script": {
          "source": "Math.log(2 + doc['my-int'].value)"
        }
      }
    }
  }
}

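How function scores are combined

score_mode
score_mode decides how the scores produced by the individual functions are combined

multiply: multiply the scores (default)
sum: add the scores
avg: average the scores
first: use the score of the first function whose filter matches
max: use the highest function score
min: use the lowest function score

boost_mode
score_mode decides how the function scores are combined with each other; boost_mode decides how the combined function score is merged with the query's original score

multiply: multiply the two (default)
replace: the function score replaces the query score
sum: add the two
avg: average of the function score and the query score
max: the larger of the two
min: the smaller of the two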
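The two-stage combination can be sketched as two small Python functions. This is a simplified model: it assumes every function in the list applies to the document (so "first" is just the first element) and it uses an unweighted mean for avg, whereas Elasticsearch computes a weighted average when functions carry weights. Both function names are hypothetical.

```python
import math


def combine_function_scores(scores, score_mode="multiply"):
    """Stage 1: merge the scores of the individual functions."""
    if score_mode == "multiply":
        return math.prod(scores)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "avg":
        return sum(scores) / len(scores)  # unweighted simplification
    if score_mode == "first":
        return scores[0]
    if score_mode == "max":
        return max(scores)
    if score_mode == "min":
        return min(scores)
    raise ValueError(f"unknown score_mode: {score_mode}")


def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """Stage 2: merge the combined function score with the query score."""
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "replace":
        return function_score
    if boost_mode == "sum":
        return query_score + function_score
    if boost_mode == "avg":
        return (query_score + function_score) / 2
    if boost_mode == "max":
        return max(query_score, function_score)
    if boost_mode == "min":
        return min(query_score, function_score)
    raise ValueError(f"unknown boost_mode: {boost_mode}")


# Two functions with weights 23 and 42, score_mode max, boost_mode multiply.
combined = combine_function_scores([23, 42], "max")
print(apply_boost_mode(2.0, combined, "multiply"))
```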

function score query

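Full text queries

1. Intervals query

Intervals search

To give finer control over the distance and order in which query terms match within a text, Elasticsearch added the intervals query in 7.x. Rules can be used on their own or combined, and are applied to one specific text field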

curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
'

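Rules

match
prefix
wildcard
fuzzy
all_of: matches only if every rule in the group matches
any_of: matches if any one rule in the group matches

match

query
max_gaps
ordered

all_of

intervals: the group of rules that must all match
max_gaps: the maximum number of positions between matching terms

Defaults to -1, meaning no limit
0 means the terms must be adjacent

ordered: whether the intervals must appear in the given order

Defaults to false

filter: a filter rule

any_of

intervals: the group of rules of which one must match
filter: a filter rule

filter

This filter is not the filter clause of the bool query described earlier; it has its own syntax

after: the query interval comes after the filter interval
before: the query interval comes before the filter interval
containing: true when the query interval contains the filter interval

comparable to bool query (must + must)

not_containing: true when the query interval does not contain the filter interval

comparable to bool query (must + must_not)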

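Example

Song names (find Teu xxx Chamar or Teu xxx Playback)

Find songs where Teu comes before Chamar
Find songs where Teu comes before Playback

In the query below, the outer all_of sets ordered: true and max_gaps: 0, so its intervals must appear in order and be adjacent. The inner match sets max_gaps: 0 and ordered: false, so Teu and O must be adjacent but may appear in either order.

{
  "query": {
    "intervals" : {
      "songName" : {
        "all_of" : {
          "ordered" : true,
          "max_gaps": 0,
          "intervals" : [
            {
              "match" : {
                "query" : "Teu O",
                "max_gaps" : 0,
                "ordered" : false
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "Chamar" } },
                  { "match" : { "query" : "Playback" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}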

Song names where Playback comes after Teu

{
  "query": {
    "intervals" : {
      "songName" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "Playback",
                "max_gaps" : 0,
                "ordered" : true
              }
            }
          ],
          "filter": {
        	"after": {
        		"match": {
        			"query": "Teu"
        		}
        	}
        }
        }
      }
    }
  }
}

Exclude matches that contain Chamar (only intervals that do not contain Chamar qualify)

{
  "query": {
    "intervals" : {
      "songName" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "Teu",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                	{ "match" : { "query" : "Chamar" } },
                	{ "match" : { "query" : "Playback" } }
                ]
              }
            }
          ],
          "filter": {
        	"not_containing": {
        		"match": {
        			"query": "Chamar"
        		}
        	}
        }
        }
      }
    }
  }
}

intervals query

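2. match

query: the text to search for
analyzer: the analyzer used to tokenize the query text
fuzziness: the maximum edit distance (Levenshtein distance) allowed for fuzzy matching; defaults to 0, meaning no edits are allowed

0, 1, 2: the maximum allowed Levenshtein distance
AUTO: derives the allowed distance from the length of each term

fuzzy_transpositions: whether swapping two adjacent characters counts as a single edit; defaults to true; used together with fuzziness
max_expansions: the maximum number of terms the query expands to; defaults to 50
prefix_length: the number of leading characters that must match exactly, with no fuzzy matching
lenient: whether to tolerate type mismatches; when a field is numeric and the query passes a string, false (the default) raises an error, while true suppresses the error
operator: taking query = "周杰伦" as an example

OR: a document matches if it contains "周" or "杰" or "伦"
AND: a document must contain all of "周", "杰", and "伦"

minimum_should_match: the share of query terms a document must match to be returned

Taking 周杰伦 (three terms) with a 70% threshold: documents containing only 周 are not returned; at least two terms such as "周, 杰" or "伦, 杰" must match. The computed count is rounded down, so 60% of three terms would require only one match
It trims the result set by cutting long-tail, marginally relevant documents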

{
	"query": {
		"match": {
			"songName": {
				"query" : "O Teu Chamer",
				"fuzziness": 0,
				"max_expansions": 1,
				"prefix_length": 3,
				"operator": "OR"
			}
		}
	}
}
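How a percentage in minimum_should_match turns into a required number of terms can be sketched as follows. The sketch covers only non-negative percentages and relies on the documented behaviour that the computed number is rounded down; `required_matches` is a hypothetical helper name.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Number of query terms that must match for a non-negative percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("this sketch only handles percentages from 0 to 100")
    # Integer division rounds the computed count down.
    return num_terms * percent // 100


# A three-term query such as "周杰伦" split into 周 / 杰 / 伦
for percent in (60, 67, 70, 100):
    print(percent, required_matches(3, percent))
```

For a three-term query, 60% still rounds down to a single required term; 67% or more is needed before two terms become mandatory.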

match query

3. match_bool_prefix query

match_bool_prefix

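The match_bool_prefix query analyzes the input with the given analyzer into multiple terms, runs a term query for each term except the last, and runs a prefix query for the last term
The term queries accept several of the match parameters; the fuzzy-matching parameters have no effect on the final prefix query

operator
fuzziness
prefix_length
max_expansions
…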

Example

{
	"query": {
		"match_bool_prefix": {
			"songName": "O Teu Ch"
		}
	}
}

{
	"query": {
		"match_bool_prefix": {
			"songName": "Teu O Ch"
		}
	}
}

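Both queries above return the expected result. The input is analyzed into O, Teu, and Ch, order does not matter (operator = OR), the first two terms use term queries, and the final Ch uses a prefix query, so the query behaves like a bool query with two term clauses and one prefix clause under should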
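The equivalent bool query can be generated mechanically from the analyzed tokens. The Python sketch below assumes the tokens have already been produced by the analyzer (lower-cased here, as the standard analyzer would do) and that operator is OR; `match_bool_prefix_as_bool` is a hypothetical helper, not a client API.

```python
import json


def match_bool_prefix_as_bool(field, tokens):
    """Rewrite analyzed tokens as term clauses plus one trailing prefix clause."""
    if not tokens:
        raise ValueError("at least one token is required")
    *head, last = tokens
    should = [{"term": {field: token}} for token in head]
    should.append({"prefix": {field: last}})  # only the last token is a prefix
    return {"query": {"bool": {"should": should}}}


# "O Teu Ch" after analysis (assumed tokens)
rewritten = match_bool_prefix_as_bool("songName", ["o", "teu", "ch"])
print(json.dumps(rewritten, indent=2))
```

Since the clauses sit under should, their order in the input only decides which token is treated as the prefix, which is why "O Teu Ch" and "Teu O Ch" behave the same.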

4. match_phrase query

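Phrase query

The basic unit of match_phrase is the phrase, and a phrase cannot be matched fuzzily
The terms produced by analysis must match exactly and appear in order. When the order is right, gaps between terms are allowed and controlled by slop, which defaults to 0, meaning consecutive terms must be adjacent (similar to the intervals query described earlier, which can do the same)
Put simply, the difference from match is that the terms must be exact and must appear in order

Example

Search for the phrase O Chamar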

# No results
{
	"query": {
		"match_phrase":  {
			"songName": {
				"query": "O Chamar",
				"slop": 0
			}
		}
	}
}

# Returns results
{
	"query": {
		"match_phrase":  {
			"songName": {
				"query": "O Chamar",
				"slop": 1
			}
		}
	}
}

match_phrase cannot be combined with fuzziness; it has no fuzzy matching mode

Searching for O Teu Chamar again, compare the behavior with match

# Terms out of order, and Chamar misspelled as Chemar
{
	"query": {
		"match":  {
			"songName": "O Chemar Teu"
		}
	}
}
# Returns the expected result

{
	"query": {
		"match_phrase":  {
			"songName": "O Chemar Teu"
		}
	}
}
# No results

# Terms out of order, spelling corrected
{
	"query": {
		"match_phrase":  {
			"songName": "O Chamar Teu"
		}
	}
}
# No results

{
	"query": {
		"match_phrase":  {
			"songName": "O Teu Chamar"
		}
	}
}
# Returns results

5. match_phrase_prefix query

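Phrase prefix query

The leading terms are matched as in match_phrase, and the last term is matched as a prefix
The difference between match_bool_prefix and match_phrase_prefix is simply the difference between match and match_phrase
match_phrase_prefix is commonly used for search-as-you-type suggestions
Common parameters

max_expansions: the maximum number of terms the last prefix expands to
slop: the maximum number of positions allowed between terms; defaults to 0, meaning they must be adjacent

Example

Find songs whose name contains a phrase beginning with O Teu Ch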

{
	"query": {
		"match_phrase_prefix": {
			"songName": {
				"query": "O Teu Ch",
				"slop": 0,
				"max_expansions": 50
			}
		}
	}
}

Two songs are found: O Teu Chamar and O Teu Chamar - Playback

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 6,
        "successful": 6,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 351.71133,
        "hits": [
            {
                "_index": "song",
                "_type": "_doc",
                "_id": "25329053628305609",
                "_score": 351.71133,
                "_source": {
                    "id": 25329053628305609,
                    "externalSongId": "spotify-6xyCay9wgDMf5phcovxBKO",
                    "songName": "O Teu Chamar",
                    "artistNames": [
                        "Jotta A"
                    ],
                    "albumName": "Essência",
                    "albumId": 25329053630400886,
                    "genres": [],
                    "coverUrl": "https://i.scdn.co/image/ab67616d00001e0294d215ec962e4c98fceb5e26",
                    "duration": 240,
                    "source": "YOUTUBE",
                    "externalId": "youtube-LP6TebWWPwE",
                    "playUrl": "https://www.youtube.com/watch?v=LP6TebWWPwE",
                    "lyricIds": [],
                    "areas": [
                        "BR_LP"
                    ],
                    "status": "ONLINE",
                    "dead": false
                }
            },
            {
                "_index": "song",
                "_type": "_doc",
                "_id": "25329053631010314",
                "_score": 297.21338,
                "_source": {
                    "id": 25329053631010314,
                    "externalSongId": "spotify-3DJn4rY1g4llBrm1HWXABA",
                    "songName": "O Teu Chamar - Playback",
                    "artistNames": [
                        "Jotta A"
                    ],
                    "albumName": "Essência (Playback)",
                    "albumId": 25329053632022016,
                    "genres": [],
                    "coverUrl": "https://i.scdn.co/image/ab67616d00001e02beb06d1a13ab2cf025c226b1",
                    "duration": 242,
                    "source": "YOUTUBE",
                    "externalId": "youtube-nXHgXVIF-bE",
                    "playUrl": "https://www.youtube.com/watch?v=nXHgXVIF-bE",
                    "lyricIds": [],
                    "areas": [
                        "BR_LP"
                    ],
                    "status": "ONLINE",
                    "dead": false
                }
            }
        ]
    }
}

6. multi_match query

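Multi-field query

Like a multi-field version of the match query, but more powerful
match, match_bool_prefix, match_phrase, and match_phrase_prefix all run a full-text search on a single field, whereas multi_match runs the same query against several fields
A compound query can achieve the same effect, but a single query is more concise

type

best_fields (default)

Multi-field version of the match query
Scores like dis_max: the best-matching field wins

most_fields

Multi-field version of the match query
Suited to the same text indexed with several analyzers; every field contributes to the total score, so the weights are spread more evenly

phrase

Multi-field version of match_phrase

phrase_prefix

Multi-field version of match_phrase_prefix

Example

Field names accept the * wildcard, and ^2 boosts a field, equivalent to boost = 2.0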

{
	"query": {
    	"multi_match": {
        	"query":  "Ao",
        	"fields": [ "*Name", "artistNames^2" ] 
     	}
    }
}

(best_fields) Find songs whose songName, artistNames, or albumName contains Ao

{
  "query": {
    "multi_match" : {
      "query":    "Ao", 
      "fields": [ "songName", "albumName","artistNames" ],
      "type": "best_fields",
      "tie_breaker":          0.3
    }
  }
}

This is effectively a shorthand for the following dis_max compound query

{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "songName": "Ao" }},
        { "match": { "albumName": "Ao" }},
        { "match": { "artistNames": "Ao" }}
      ],
      "tie_breaker": 0.3
    }
  }
}

(most_fields) Search title across several analyzed variants

{
  "query": {
    "multi_match" : {
      "query":      "quick brown fox",
      "type":       "most_fields",
      "fields":     [ "title", "title.original", "title.shingles" ]
    }
  }
}
