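I. IK Analyzer
1. What is the IK analyzer
Tokenization means splitting a piece of Chinese (or other) text into individual terms. When we search, the query text is tokenized, the data in the database or index is tokenized as well, and the two sets of terms are then matched against each other. The default Chinese tokenization treats every character as its own term: for example, “我爱狂神” is split into “我”, “爱”, “狂”, “神”. That clearly does not meet our needs, so we install the Chinese analyzer ik to solve the problem.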
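The difference between per-character splitting and dictionary-aware splitting can be sketched in a few lines. This is only an illustration and not IK's actual algorithm: the greedy longest-match strategy and the one-word `TOY_DICTIONARY` are assumptions made for the example.

```python
def per_character_tokens(text: str) -> list[str]:
    """Default-style behaviour described above: one term per character."""
    return [ch for ch in text if not ch.isspace()]


def dictionary_tokens(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a word list (toy sketch)."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical dictionary containing a single multi-character word.
TOY_DICTIONARY = {"狂神"}

print(per_character_tokens("我爱狂神"))               # ['我', '爱', '狂', '神']
print(dictionary_tokens("我爱狂神", TOY_DICTIONARY))  # ['我', '爱', '狂神']
```

With a word list available, “狂神” survives as one term instead of being broken into two unrelated characters.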
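2. Tokenization algorithms
IK provides two tokenization algorithms: ik_smart and ik_max_word.
ik_smart produces the fewest splits (the coarsest segmentation), while ik_max_word splits at the finest granularity.
1. Fewest splits: ik_smart
- Command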
GET _analyze
{
"analyzer": "ik_smart",
"text": "我是社会主义接班人"
}
- Result
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "社会主义",
"start_offset" : 2,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "接班人",
"start_offset" : 6,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
}
]
}
2. Finest-grained splitting: ik_max_word
- Command
GET _analyze
{
"analyzer": "ik_max_word",
"text": "我是社会主义接班人"
}
- Result
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "社会主义",
"start_offset" : 2,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "社会",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "主义",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "接班人",
"start_offset" : 6,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "接班",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "人",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 7
}
]
}
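To make the difference between the two responses concrete, the short sketch below copies the tokens and offsets from both results, checks that every offset pair really slices the token out of the input text, and lists what ik_max_word emits in addition to ik_smart. The helper names are made up for this example.

```python
TEXT = "我是社会主义接班人"

# (token, start_offset, end_offset), copied from the two _analyze results.
IK_SMART = [
    ("我", 0, 1), ("是", 1, 2), ("社会主义", 2, 6), ("接班人", 6, 9),
]
IK_MAX_WORD = [
    ("我", 0, 1), ("是", 1, 2), ("社会主义", 2, 6), ("社会", 2, 4),
    ("主义", 4, 6), ("接班人", 6, 9), ("接班", 6, 8), ("人", 8, 9),
]


def offsets_are_consistent(text, tokens):
    """Each token must equal the slice text[start_offset:end_offset]."""
    return all(text[start:end] == token for token, start, end in tokens)


def extra_tokens(fine, coarse):
    """Tokens present in the fine-grained output but not in the coarse one."""
    seen = set(coarse)
    return [entry[0] for entry in fine if entry not in seen]


print(offsets_are_consistent(TEXT, IK_SMART))     # True
print(offsets_are_consistent(TEXT, IK_MAX_WORD))  # True
print(extra_tokens(IK_MAX_WORD, IK_SMART))        # ['社会', '主义', '接班', '人']
```

ik_max_word keeps the long words and also emits the shorter words nested inside them, which is why its tokens overlap in offset range.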
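II. Using commands
1. REST conventions
Method | URL | Description |
PUT | localhost:9200/index_name/type_name/doc_id | Create a document (with a specified id) |
POST | localhost:9200/index_name/type_name | Create a document (with a random id) |
POST | localhost:9200/index_name/type_name/doc_id/_update | Update a document |
DELETE | localhost:9200/index_name/type_name/doc_id | Delete a document |
GET | localhost:9200/index_name/type_name/doc_id | Get a document by id |
POST | localhost:9200/index_name/type_name/_search | Query all data |
2. Basic tests
- Create an index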
PUT es1
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
},
"birthday": {
"type": "date"
}
}
}
}
# Result
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "es1"
}
- Delete an index
# Command
DELETE es1
# Result
{
"acknowledged" : true
}
- Create a document
localhost:9200/index_name/type_name/doc_id
# Command
PUT /es/test/1
{
"name": "张三",
"age": 22
}
# Result
{
"_index" : "es",
"_type" : "test",
"_id" : "1",
"_version" : 1,
"result" : "created", // the document was created
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
- View a document
# Command
GET es
GET es/test/1
# Result
{
"_index" : "es",
"_type" : "test",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "张三",
"age" : 22
}
}
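- Update a document
# Command 1: PUT replaces the whole document, so the body must also include the fields that are not being changed; otherwise they are lost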
PUT es/test/1
{
"name": "李四",
"age": 17
}
GET es/test/1
# Result 1
{
"_index" : "es",
"_type" : "test",
"_id" : "1",
"_version" : 2, // the version number is incremented
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "李四",
"age" : 17
}
}
# Command 2: POST with _update only needs the fields that are being changed
POST es/test/1/_update
{
"doc": {
"name": "王五"
}
}
GET es/test/1
# Result 2
{
"_index" : "es",
"_type" : "test",
"_id" : "1",
"_version" : 3,
"_seq_no" : 2,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "王五",
"age" : 17
}
}
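The difference between the two update styles can be modelled on plain dictionaries. This is a rough sketch of the behaviour described above, not of how the server implements it: the function names are invented, and the merge here is shallow (nested objects are not merged field by field).

```python
def put_document(stored: dict, body: dict) -> dict:
    """Full replacement: the stored source becomes exactly the request body."""
    return dict(body)


def post_update(stored: dict, doc: dict) -> dict:
    """Partial update: fields in `doc` overwrite or extend the stored source."""
    merged = dict(stored)
    merged.update(doc)
    return merged


stored = {"name": "张三", "age": 22}

# PUT with only the changed field: age is lost.
print(put_document(stored, {"name": "李四"}))             # {'name': '李四'}

# PUT with every field, as in command 1.
stored = put_document(stored, {"name": "李四", "age": 17})

# POST _update with only the changed field, as in command 2: age is kept.
print(post_update(stored, {"name": "王五"}))              # {'name': '王五', 'age': 17}
```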
III. Queries
1. Simple queries
- Query a specific field
# Find documents whose name contains 四 or 五
GET /es/test/_search
{
"query": {
"match": {
"name": "四 五"
}
}
}
- Return only specified fields
# Option 1: return only the name field
GET /es/test/_search
{
"query": {
"match": {
"name": "四 五"
}
},
// include only the listed fields
"_source": ["name"]
}
# Option 2: the same result, using includes
GET /es/test/_search
{
"query": {
"match": {
"name": "四 五"
}
},
"_source": {
// includes: fields to include
"includes": ["name"]
}
}
- Exclude specified fields from the result
# Command
GET /es/test/_search
{
"query": {
"match": {
"name": "四 五"
}
},
"_source": {
// excludes: fields to leave out
"excludes": "age"
}
}
- Sorting
# Sort by age in descending order
GET /es/test/_search
{
"sort": {
"age": {
"order": "desc"
}
}
}
- Pagination
GET /es/test/_search
{
"sort": {
"age": {
"order": "desc"
}
},
// start at offset 2 (zero-based) and return 3 hits
"from": 2,
"size": 3
}
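The combination of sort, from and size behaves like sorting a list and then slicing it. A minimal sketch, assuming a handful of made-up documents (the names and ages below are not from the index used in these notes):

```python
def search_page(docs, from_, size):
    """Sort by age descending, then return `size` hits starting at offset `from_`."""
    ordered = sorted(docs, key=lambda doc: doc["age"], reverse=True)
    return ordered[from_:from_ + size]


# Hypothetical sample data.
DOCS = [
    {"name": "a", "age": 22},
    {"name": "b", "age": 30},
    {"name": "c", "age": 17},
    {"name": "d", "age": 25},
    {"name": "e", "age": 28},
    {"name": "f", "age": 23},
]

page = search_page(DOCS, from_=2, size=3)
print([doc["age"] for doc in page])  # [25, 23, 22]
```

Because from is an offset and not a page number, page n (starting at 0) with a page size of s corresponds to from = n * s.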
2. Multi-condition queries
- must (and): every condition must be satisfied
# Find documents whose name contains "王" and whose age = 23
GET /es/test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "王"
}
},
{
"match": {
"age": "23"
}
}
]
}
}
}
- should (or): satisfying any one condition is enough
# Find documents whose name contains "王" or whose age = 25
GET /es/test/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "王"
}
},
{
"match": {
"age": "25"
}
}
]
}
}
}
- must_not (not): exclude matching documents
# must_not: exclude documents whose name contains "王" or whose age = 25
GET /es/test/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"name": "王"
}
},
{
"match": {
"age": "25"
}
}
]
}
}
}
- Range conditions
gt: greater than; gte: greater than or equal to; lt: less than; lte: less than or equal to
# Range condition: 22 < age < 25
GET /es/test/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gt": 22,
"lt": 25
}
}
}
}
}
}
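The four clause types used so far (must, should, must_not, filter) can be modelled as predicates over in-memory documents. This is a simplified sketch that ignores scoring entirely; the sample documents and helper names are invented, and "name contains" is approximated by a substring test.

```python
def name_contains(ch):
    return lambda doc: ch in doc["name"]


def age_is(value):
    return lambda doc: doc["age"] == value


def age_between(gt, lt):
    return lambda doc: gt < doc["age"] < lt


def bool_query(docs, must=(), should=(), must_not=(), filter_=()):
    """Return the documents selected by a bool-style combination of clauses."""
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filter_):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        # should only decides inclusion when there is no must or filter clause.
        if should and not must and not filter_:
            if not any(clause(doc) for clause in should):
                continue
        hits.append(doc)
    return hits


# Hypothetical sample data.
DOCS = [
    {"name": "王五", "age": 23},
    {"name": "王六", "age": 25},
    {"name": "张三", "age": 25},
    {"name": "李四", "age": 17},
]


def names(hits):
    return [doc["name"] for doc in hits]


print(names(bool_query(DOCS, must=[name_contains("王"), age_is(23)])))      # ['王五']
print(names(bool_query(DOCS, should=[name_contains("王"), age_is(25)])))    # ['王五', '王六', '张三']
print(names(bool_query(DOCS, must_not=[name_contains("王"), age_is(25)])))  # ['李四']
print(names(bool_query(DOCS, filter_=[age_between(22, 25)])))               # ['王五']
```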
- Matching multiple terms
# Several terms separated by spaces: matching any one of them is enough
GET /es/test/_search
{
"query": {
"match": {
"name": "张 王"
}
}
}
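Why "张 王" behaves like an OR can be sketched as follows: the query text is broken into terms, and by default a document matches if it contains any of them. The `analyze` function below is a toy stand-in that emits one term per character, and the `operator` argument mirrors the match query's operator option; the sample names are only illustrative.

```python
def analyze(text):
    """Toy analyzer: one term per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def match(field_value, query_text, operator="or"):
    """True if the field contains any (or, with "and", all) of the query terms."""
    field_terms = set(analyze(field_value))
    query_terms = analyze(query_text)
    if operator == "and":
        return all(term in field_terms for term in query_terms)
    return any(term in field_terms for term in query_terms)


NAMES = ["张三", "王五", "李四"]

print([n for n in NAMES if match(n, "张 王")])                  # ['张三', '王五']
print([n for n in NAMES if match(n, "张 王", operator="and")])  # []
```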
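IV. Tokenization
1. Overview
- term: exact lookup; the query value is not analyzed
- match: the query text is run through the analyzer first
2. Example
- Create an index
# text: the field is tokenized when it is indexed and searched
# keyword: the field is not tokenized; it is matched as one whole term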
PUT /t1
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"content": {
"type": "keyword"
}
}
}
}
- term: only returns exact matches
GET /t1/_doc/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"name": "用户"
}
},
{
"term": {
"content": "呼吸"
}
}
]
}
}
}
# Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "t1",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931471,
"_source" : {
"name" : "用户1",
"content" : "呼吸"
}
}
]
}
}
- match: the query text is run through the analyzer
GET /t1/_doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
// Although this is a match query, content is a keyword field, so the query text is still not tokenized
"content": "呼吸"
}
}
]
}
}
}
# Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "t1",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931471,
"_source" : {
"name" : "用户1",
"content" : "呼吸"
}
}
]
}
}
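The interaction between field type (text or keyword) and query type (term or match) in the t1 example can be sketched with a toy model. It assumes the name field uses the default per-character splitting described in section I; the function names are invented for this illustration and scoring is ignored.

```python
def analyze(text):
    """Toy analyzer: one term per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def index_terms(value, field_type):
    """keyword fields are indexed as one whole term; text fields are tokenized."""
    return [value] if field_type == "keyword" else analyze(value)


def term_query(value, field_type, query):
    """term: the query value is looked up as-is, without analysis."""
    return query in index_terms(value, field_type)


def match_query(value, field_type, query):
    """match: the query is analyzed the same way the field was."""
    query_terms = index_terms(query, field_type)
    indexed = index_terms(value, field_type)
    return any(term in indexed for term in query_terms)


# Document 1 from the results above: name is text, content is keyword.
print(term_query("用户1", "text", "用户"))      # False: indexed terms are 用, 户, 1
print(term_query("呼吸", "keyword", "呼吸"))    # True: the whole value is one term
print(match_query("呼吸", "keyword", "呼吸"))   # True
print(match_query("呼吸", "keyword", "呼"))     # False: keyword is never split
print(match_query("用户1", "text", "用户"))     # True: 用 and 户 are both indexed
```

Under these assumptions, the single hit in the term example comes from the content clause and not from the name clause.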
V. Highlighting
- Example
GET /t1/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "3"
}
}
]
}
},
"highlight": {
// custom highlight tags
"pre_tags": "<strong style='color:red'>",
"post_tags": "</strong>",
"fields": {
"name": {}
}
}
}
# Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2039728,
"hits" : [
{
"_index" : "t1",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.2039728,
"_source" : {
"name" : "用户3",
"content" : "呼吸呼吸"
},
"highlight" : {
"name" : [
"用户<strong style='color:red'>3</strong>"
]
}
}
]
}
}
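What pre_tags and post_tags do to the returned fragment can be sketched as wrapping each matched term in the two tags. This toy version works character by character, which is an assumption made for the example; the real highlighter works on the analyzer's terms and offsets.

```python
def analyze(text):
    """Toy analyzer: one term per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def highlight(value, query_text, pre_tag, post_tag):
    """Wrap every character of `value` that is also a query term in the tags."""
    query_terms = set(analyze(query_text))
    return "".join(
        (pre_tag + ch + post_tag) if ch in query_terms else ch
        for ch in value
    )


print(highlight("用户3", "3", "<strong style='color:red'>", "</strong>"))
# 用户<strong style='color:red'>3</strong>
```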