Contents
- 1. Basic concepts
- 1.1 Score
- 1.2 Query context
- 1.3 Filter context
- 2. `Basic queries`
- 2.1 Match all (`match_all`)
- 2.2 Match query (`match`)
- 2.3 Multi-field query (`multi_match`)
- 2.4 Term query (`term`)
- 2.5 Exact match on multiple terms (`terms`)
- 3. `Filtering`
- 3.1 `_source filtering`
- 3.1.1 Specifying fields directly
- 3.1.2 Specifying `includes` and `excludes`
- 3.1.3 `Filtering with filter`
- 4. `Advanced queries`
- 4.1 Boolean combination (`bool`)
- 4.2 Range query (`range`)
- 4.3 Fuzzy query (`fuzzy`)
- 4.4 `Boosting Query`
- 5. `Sorting`
- 5.1 `Single-field sorting`
- 5.2 `Multi-field sorting`
- 6. `Highlighting`
- 7. `Pagination`
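1. Basic concepts
1.1 Score
By default, ES sorts search results by relevance score, which it computes during the search. The score expresses how well a document matches the search. It is a floating-point number, returned in the `_score` field of each hit: the higher the score, the better the match and the earlier the document appears.
Searches in ES fall into two kinds: one computes a score, the other does not.
1.2 Query context
A query answers the question "how well does this document match?". Besides deciding whether the document matches at all, it also computes the document's relevance score. This is how everyday search works: we search for a keyword, the keyword is analyzed into terms, and the documents that match those terms form the result. Which of them rank first and which rank last depends on how well they match, that is, on the computed score.
1.3 Filter context
A filter is much simpler: it is a plain yes or no on whether the document satisfies the condition, and no score is computed. Frequently used filters are also cached by ES to improve performance.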
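2. Basic queries
Basic syntax
GET /<index name>/_search
{
"query":{
"<query type>":{
"<query field>":"<query value>"
}
}
}
Here `query` is the query object, which can hold different kinds of queries:
- Query type: for example `match_all`, `match`, `term`, `range`, and so on
- Query condition: how the condition is written depends on the query type; the details are covered below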
2.1 Match all (`match_all`)
Example:
GET wql/_search
{
"query": {
"match_all": {}
}
}
- `query`: the query object
- `match_all`: matches every document
Result:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"userName" : "zhansan",
"userPhone" : "15727538286",
"userAdress" : "江西省宜春市上高县泗溪镇"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"userName" : "lisi",
"userPhone" : "17067888006",
"userAdress" : "江西省高安市上高县泗溪镇"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"userName" : "wangwu",
"userPhone" : "15797721570",
"userAdress" : "江西省南昌市上高县泗溪镇"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"userName" : "zhaoliu",
"userPhone" : "15727538286",
"userAdress" : "江西省九江市上高县泗溪镇"
}
}
]
}
}
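- took: time the query took, in milliseconds
- timed_out: whether the query timed out
- _shards: shard information
- hits: overview object for the search results
  - total: total number of matching documents
  - max_score: the highest score among the matching documents
  - hits: array of matching documents; each element describes one hit
    - _index: the index
    - _type: the document type
    - _id: the document id
    - _score: the document's score
    - _source: the document's source data
2.2 Match query (`match`)
A `match` query analyzes the query text into terms before searching. By default, multiple terms are combined with `or`.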
GET wql/_search
{
"query": {
"match": {
"userAdress": "九江市上高县"
}
}
}
Result:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 4.387768,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "4",
"_score" : 4.387768,
"_source" : {
"userName" : "zhaoliu",
"userPhone" : "15727538286",
"userAdress" : "江西省九江市上高县泗溪镇"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "5",
"_score" : 3.6486926,
"_source" : {
"userName" : "zhaoliu",
"userPhone" : "15727538286",
"userAdress" : "江西省九江市黄塘村"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.1584977,
"_source" : {
"userName" : "zhansan",
"userPhone" : "15727538286",
"userAdress" : "江西省宜春市上高县泗溪镇"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1584977,
"_source" : {
"userName" : "lisi",
"userPhone" : "17067888006",
"userAdress" : "江西省高安市上高县泗溪镇"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.1584977,
"_source" : {
"userName" : "wangwu",
"userPhone" : "15797721570",
"userAdress" : "江西省南昌市上高县泗溪镇"
}
}
]
}
}
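The result shows that the terms are combined with `or`: a document only needs to match one of them.
The `and` operator
When a more precise search is needed, the operator can be switched to `and`, like this: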
GET wql/_search
{
"query": {
"match": {
"userAdress": {
"query": "九江市上高县",
"operator": "and"
}
}
}
}
Result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 4.387768,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "4",
"_score" : 4.387768,
"_source" : {
"userName" : "zhaoliu",
"userPhone" : "15727538286",
"userAdress" : "江西省九江市上高县泗溪镇"
}
}
]
}
}
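The difference between the two operators can be sketched in plain Python. This is only an illustration of the matching rule, not how ES is implemented. It assumes one token per character, which is roughly what the default standard analyzer produces for Chinese text; `analyze` and `match` are hypothetical helper names.

```python
def analyze(text):
    # Assumption: one token per character (roughly what the standard
    # analyzer does with Chinese text). Other analyzers behave differently.
    return [ch for ch in text if not ch.isspace()]


def match(field_value, query, operator="or"):
    """Return True when the field matches the analyzed query."""
    doc_tokens = set(analyze(field_value))
    found = [token in doc_tokens for token in analyze(query)]
    # "and": every query token must be present; "or": one is enough.
    return all(found) if operator == "and" else any(found)


# A few userAdress values taken from the results above.
addresses = {
    "1": "江西省宜春市上高县泗溪镇",
    "4": "江西省九江市上高县泗溪镇",
    "5": "江西省九江市黄塘村",
}
query = "九江市上高县"

or_ids = sorted(i for i, a in addresses.items() if match(a, query))
and_ids = sorted(i for i, a in addresses.items() if match(a, query, "and"))
print(or_ids)   # ['1', '4', '5']
print(and_ids)  # ['4']
```

Under this assumption, documents 1 and 5 each contain only some of the query's characters, which is why they match with `or` but drop out with `and`.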
2.3 Multi-field query (`multi_match`)
`multi_match` is similar to `match`, except that it can search several fields at once.
To test it, I first add one more document:
POST /wql/_doc/6
{
"userName": "九江市",
"userPhone": "15727538286",
"userAdress": "黄塘村"
}
GET wql/_search
{
"query": {
"multi_match": {
"query": "九江市",
"fields": [
"userAdress",
"userName"
]
}
}
}
In this example the search runs against both `userAdress` and `userName`.
Result:
{
"took" : 266,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 5.587492,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "5",
"_score" : 5.587492,
"_source" : {
"userName" : "zhaoliu",
"userPhone" : "15727538286",
"userAdress" : "江西省九江市黄塘村"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "6",
"_score" : 4.236225,
"_source" : {
"userName" : "九江市",
"userPhone" : "15727538286",
"userAdress" : "黄塘村"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "4",
"_score" : 3.651648,
"_source" : {
"userName" : "zhaoliu",
"userPhone" : "15727538286",
"userAdress" : "江西省九江市上高县泗溪镇"
}
}
]
}
}
2.4 Term query (`term`)
A `term` query matches exact values: numbers, dates, booleans, or strings that are not analyzed.
GET wql/_search
{
"query": {
"term": {
"userPhone": {
"value": "17067888006"
}
}
}
}
Result:
{
"took" : 27,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.540445,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.540445,
"_source" : {
"userName" : "lisi",
"userPhone" : "17067888006",
"userAdress" : "江西省高安市上高县泗溪镇"
}
}
]
}
}
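Why `term` suits unanalyzed values can be shown with a toy inverted index. The sketch below is a simplification: it assumes an analyzed text field whose values are split into one token per character, and `build_index` and `term_query` are hypothetical names. A `term` query looks its value up as-is, without analyzing it first.

```python
from collections import defaultdict


def analyze(text):
    # Assumption: the field is analyzed into one token per character.
    return list(text)


def build_index(docs):
    """Map each token to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    # The value is NOT analyzed: it must equal an indexed token exactly.
    return index.get(value, set())


docs = {"4": "江西省九江市上高县泗溪镇", "5": "江西省九江市黄塘村"}
index = build_index(docs)

print(sorted(term_query(index, "九")))       # ['4', '5']
print(sorted(term_query(index, "九江市")))   # []
```

In this model no single indexed token equals "九江市", so the second lookup is empty. A phone number such as the one queried above behaves differently because the whole value is one token.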
2.5 Exact match on multiple terms (`terms`)
A `terms` query works like a `term` query but accepts several values. A document matches if the field contains any one of the given values:
GET wql/_search
{
"query": {
"terms": {
"userPhone": [
"17067888006",
"17067888007"
]
}
}
}
Result:
{
"took" : 147,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"userName" : "lisi",
"userPhone" : "17067888006",
"userAdress" : "江西省高安市上高县泗溪镇"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"userName" : "九江市",
"userPhone" : "17067888007",
"userAdress" : "黄塘村"
}
}
]
}
}
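3. Filtering
3.1 _source filtering
By default, Elasticsearch returns every field stored in `_source` for each hit.
To return only some of the fields, add a `_source` filter to the request.
3.1.1 Specifying fields directly
Example: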
GET wql/_search
{
"_source": ["userPhone"],
"query": {
"terms": {
"userPhone": [
"17067888006",
"17067888007"
]
}
}
}
Result:
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"userPhone" : "17067888006"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"userPhone" : "17067888007"
}
}
]
}
}
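3.1.2 Specifying `includes` and `excludes`
Alternatively, use:
- includes: the fields to return
- excludes: the fields to leave out
Both are optional.
Note: when both are present, `excludes` takes precedence over `includes`.
Example: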
GET wql/_search
{
"_source": {
"includes": [
"userPhone",
"userName"
],
"excludes": [
"userAdress",
"userName"
]
},
"query": {
"terms": {
"userPhone": [
"17067888006",
"17067888007"
]
}
}
}
Result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"userPhone" : "17067888006"
}
},
{
"_index" : "wql",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"userPhone" : "17067888007"
}
}
]
}
}
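The precedence rule can be sketched as a small Python function. This is a simplified model that compares exact field names only; `filter_source` is a hypothetical helper, not part of any client library.

```python
def filter_source(source, includes=None, excludes=None):
    """Keep included fields, then drop excluded ones (excludes wins)."""
    kept = {}
    for field, value in source.items():
        if includes and field not in includes:
            continue  # not in the include list
        if excludes and field in excludes:
            continue  # excluded, even if it was also included
        kept[field] = value
    return kept


doc = {
    "userName": "lisi",
    "userPhone": "17067888006",
    "userAdress": "江西省高安市上高县泗溪镇",
}
filtered = filter_source(
    doc,
    includes=["userPhone", "userName"],
    excludes=["userAdress", "userName"],
)
print(filtered)  # {'userPhone': '17067888006'}
```

`userName` appears in both lists and is dropped, which matches the result above where only `userPhone` comes back.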
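3.1.3 Filtering with filter
Filtering within a query
Every clause in query context affects document scores and ranking. To narrow the results without influencing the score, do not express the condition as a query clause; put it in `filter` instead: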
GET wql/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"userAdress": "江西"
}
}
],
"filter": [
{
"range": {
"userPhone": {
"gte": 17067888006
}
}
}
]
}
}
}
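Filtering without a query
If a search consists only of filters, with no query clause and no need for scoring, `constant_score` can replace a `bool` query that contains only a `filter` clause. Performance is the same, but the request is more concise and clearer.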
GET /wql/_search
{
"query": {
"constant_score": {
"filter": {
"range": {
"userPhone": {
"gte": 17067888006
}
}
}
}
}
}
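4. Advanced queries
4.1 Boolean combination (`bool`)
Keyword | Description |
must | Condition that must match; contributes to the score |
filter | Condition that must match; does not contribute to the score |
should | Condition that may match; contributes to the score |
must_not | Condition that must not match; does not contribute to the score |
`bool` combines other queries through `must` (and), `must_not` (not), and `should` (or).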
GET /wang/_search
{
"query":{
"bool":{
"must": { "match": { "title": "大米" }},
"must_not": { "match": { "title": "电视" }},
"should": { "match": { "title": "手机" }}
}
}
}
Result:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "wang",
"_type": "goods",
"_id": "2",
"_score": 0.5753642,
"_source": {
"title": "大米手机",
"images": "http://image.leyou.com/12479122.jpg",
"price": 2899
}
}
]
}
}
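The roles of the four clause types can be sketched in Python. Everything here is a stand-in for illustration: `contains` is a hypothetical substring matcher, and the score is simply the number of matching `must` and `should` clauses, whereas ES sums the real relevance scores of those clauses.

```python
def contains(word):
    """Hypothetical matcher: true when the title contains the word."""
    return lambda title: word in title


def bool_query(titles, must=(), filter_=(), must_not=(), should=()):
    results = []
    for title in titles:
        # must and filter both have to match ...
        if not all(clause(title) for clause in list(must) + list(filter_)):
            continue
        # ... and no must_not clause may match.
        if any(clause(title) for clause in must_not):
            continue
        # Only must and should contribute to the (toy) score.
        scoring = list(must) + list(should)
        score = sum(1 for clause in scoring if clause(title))
        results.append((title, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


titles = ["大米", "大米手机", "大米电视", "小米手机"]
ranked = bool_query(
    titles,
    must=[contains("大米")],
    must_not=[contains("电视")],
    should=[contains("手机")],
)
print(ranked)  # [('大米手机', 2), ('大米', 1)]
```

Because a `must` clause is present, `should` is optional here: "大米" still matches, it just ranks below "大米手机".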
4.2 Range query (`range`)
A `range` query finds numbers or dates that fall within a given interval.
GET /wang/_search
{
"query":{
"range": {
"price": {
"gte": 1000.0,
"lt": 2800.00
}
}
}
}
The `range` query accepts the following operators:
Operator | Meaning |
gt | greater than |
gte | greater than or equal to |
lt | less than |
lte | less than or equal to |
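As a quick check of how the four operators combine, here is a minimal Python sketch that applies the same bounds to a list of made-up prices. `in_range` is a hypothetical helper, and all bounds must hold at once.

```python
import operator

# Map each range operator to the matching Python comparison.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True when the value satisfies every bound in the dict."""
    return all(OPS[name](value, bound) for name, bound in bounds.items())


prices = [999, 1000, 2699, 2800, 2899]  # illustrative values
bounds = {"gte": 1000.0, "lt": 2800.00}
selected = [p for p in prices if in_range(p, bounds)]
print(selected)  # [1000, 2699]
```

The lower bound is inclusive (`gte`) and the upper bound exclusive (`lt`), so 1000 is kept and 2800 is not.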
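4.3 Fuzzy query (`fuzzy`)
A `fuzzy` query is the fuzzy counterpart of the `term` query. It tolerates spelling differences between the search term and the indexed term, as long as the edit distance is no more than 2: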
GET /wang/_search
{
"query": {
"fuzzy": {
"title": "appla"
}
}
}
The query above still finds the apple phone.
The allowed edit distance can be set with `fuzziness`:
GET /wang/_search
{
"query": {
"fuzzy": {
"title": {
"value":"appla",
"fuzziness":1
}
}
}
}
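To make "edit distance" concrete, the sketch below computes the classic Levenshtein distance (insertions, deletions, substitutions) in Python. It is an illustration only: `edit_distance` and `fuzzy_match` are hypothetical names, and ES's own distance measure may differ in details such as how swapped adjacent characters are counted.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete from a
                curr[j - 1] + 1,     # insert into a
                prev[j - 1] + cost,  # substitute (or keep)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(term, candidate, fuzziness=2):
    return edit_distance(term, candidate) <= fuzziness


print(edit_distance("appla", "apple"))      # 1
print(fuzzy_match("appla", "apple", 1))     # True
print(fuzzy_match("kitten", "sitting", 2))  # False
```

"appla" is one substitution away from "apple", so it matches even with `fuzziness` set to 1.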
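4.4 Boosting Query
This query is an interesting one. It takes three parameters:
- `positive`: every document that matches the `positive` query is returned.
- `negative`: documents that also match the `negative` query are not filtered out; their score is reduced instead.
- `negative_boost`: the score factor, a number between 0 and 1. The score of a document that matches `negative` is multiplied by this factor. For example, with a factor of 0.5, a document that originally scored 100 and matches `negative` ends up with 100 × 0.5 = 50.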
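A minimal Python sketch of this query follows. It builds a request body using the three parameters and reproduces the score arithmetic from the example. The `title` field and the search words are illustrative choices borrowed from the other examples, and `boosting_query` and `boosted_score` are hypothetical helper names.

```python
import json


def boosting_query(positive, negative, negative_boost):
    """Build the body of a boosting search request."""
    return {
        "query": {
            "boosting": {
                "positive": positive,
                "negative": negative,
                "negative_boost": negative_boost,
            }
        }
    }


def boosted_score(score, matches_negative, negative_boost):
    # Documents matching the negative query keep their place in the
    # results, but their score is scaled down.
    return score * negative_boost if matches_negative else score


body = boosting_query(
    positive={"match": {"title": "手机"}},  # illustrative field and words
    negative={"match": {"title": "大米"}},
    negative_boost=0.5,
)
print(json.dumps(body, ensure_ascii=False, indent=2))
print(boosted_score(100.0, True, 0.5))   # 50.0
print(boosted_score(100.0, False, 0.5))  # 100.0
```

The printed JSON can be sent as the body of a `GET /<index name>/_search` request.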
5. Sorting
5.1 Single-field sorting
`sort` orders results by a chosen field, and `order` sets the direction:
GET /wang/_search
{
"query": {
"match": {
"title": "小米手机"
}
},
"sort": [
{
"price": {
"order": "desc"
}
}
]
}
5.2 Multi-field sorting
Suppose we want to sort by both `price` and `_score`: matching results are ordered by price first, then by relevance score.
GET /goods/_search
{
"query":{
"bool":{
"must":{ "match": { "title": "小米手机" }},
"filter":{
"range":{"price":{"gt":200000,"lt":300000}}
}
}
},
"sort": [
{ "price": { "order": "desc" }},
{ "_score": { "order": "desc" }}
]
}
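The effect of listing two sort keys can be illustrated with Python's `sorted`. The hits below are made-up values: the second key only matters when two documents have the same price.

```python
# Illustrative hits; ids, prices and scores are invented for the example.
hits = [
    {"_id": "a", "price": 2699, "_score": 0.2},
    {"_id": "b", "price": 2899, "_score": 0.1},
    {"_id": "c", "price": 2699, "_score": 0.5},
]

# price desc first, then _score desc as the tie-breaker.
ordered = sorted(hits, key=lambda h: (-h["price"], -h["_score"]))
ordered_ids = [h["_id"] for h in ordered]
print(ordered_ids)  # ['b', 'c', 'a']
```

Document "b" comes first on price alone, although it has the lowest score; "c" beats "a" only because their prices tie.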
6. Highlighting
The syntax for highlighting in Elasticsearch is fairly simple:
GET /wang/_search
{
"query": {
"match": {
"title": "手机"
}
},
"highlight": {
"pre_tags": "<em>",
"post_tags": "</em>",
"fields": {
"title": {}
}
}
}
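Alongside the `match` query, add a `highlight` property:
- pre_tags: the opening tag
- post_tags: the closing tag
- fields: the fields to highlight
- title: declares that the `title` field is highlighted; field-specific settings can go in the object, or it can be left empty
Result: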
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.2876821,
"hits": [
{
"_index": "wang",
"_type": "goods",
"_id": "2",
"_score": 0.2876821,
"_source": {
"title": "大米手机",
"images": "http://image.leyou.com/12479122.jpg",
"price": 2899
},
"highlight": {
"title": [
"大米<em>手机</em>"
]
}
},
{
"_index": "wang",
"_type": "goods",
"_id": "JP6xa2kBtq36Pzvxpjaf",
"_score": 0.19856805,
"_source": {
"title": "小米手机",
"images": "http://image.leyou.com/12479122.jpg",
"price": 2699
},
"highlight": {
"title": [
"小米<em>手机</em>"
]
}
},
{
"_index": "wang",
"_type": "goods",
"_id": "3",
"_score": 0.16853254,
"_source": {
"title": "超大米手机",
"images": "http://image.leyou.com/12479122.jpg",
"price": 3299,
"stock": 200,
"saleable": true,
"subTitle": "哈哈"
},
"highlight": {
"title": [
"超大米<em>手机</em>"
]
}
}
]
}
}
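What the tags do to the text can be mimicked with a plain string replacement. This is only a rough sketch: the real highlighter works on analyzed terms rather than raw substrings, and `highlight` here is a hypothetical helper.

```python
def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every occurrence of the term in the given tags."""
    return text.replace(term, f"{pre_tag}{term}{post_tag}")


print(highlight("大米手机", "手机"))  # 大米<em>手机</em>
print(highlight("超大米手机", "手机", "<b>", "</b>"))  # 超大米<b>手机</b>
```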
7. Pagination
`from` and `size` set the starting offset and the page size.
Syntax:
GET /wang/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"price": {
"order": "asc"
}
}
],
"from": 10000,
"size": 2
}
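However, this is logical paging: every request still has to collect and sort all hits up to `from + size`. To avoid deep-paging problems, ES by default only allows paging through the first 10000 results (`from + size` may not exceed 10000), so a request like the one above, with `from` set to 10000, is rejected.
To read beyond the first 10000 results, there are two options:
- scroll query
- search after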
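The two ideas can be sketched in Python: a check of the default result window, and a helper that turns one page's request into the next page's request with `search_after`. It assumes each hit carries a `sort` array (returned when the request specifies `sort`) and that the sort order is unique enough to resume from. `check_window` and `next_page_body` are hypothetical names.

```python
import copy

MAX_RESULT_WINDOW = 10000  # default limit on from + size


def check_window(from_, size, max_window=MAX_RESULT_WINDOW):
    """True when a from/size request stays inside the result window."""
    return from_ + size <= max_window


def next_page_body(body, last_hit):
    """Build the request for the page that follows last_hit."""
    nxt = copy.deepcopy(body)
    nxt.pop("from", None)  # search_after replaces the from offset
    nxt["search_after"] = last_hit["sort"]
    return nxt


body = {
    "query": {"match_all": {}},
    "sort": [{"price": {"order": "asc"}}],
    "from": 0,
    "size": 2,
}
last_hit = {"_id": "3", "sort": [3299]}  # illustrative last hit of a page

print(check_window(10000, 2))  # False
print(check_window(9998, 2))   # True
print(next_page_body(body, last_hit)["search_after"])  # [3299]
```

In practice the sort usually ends with a unique tie-breaker field, so that documents with equal prices are neither skipped nor repeated between pages.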