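I. IK Analyzer

1. What is the IK analyzer

Tokenization splits a piece of Chinese (or other) text into individual keywords. At search time the query text is tokenized, the data in the database or index is tokenized, and the two sets of tokens are then matched. The default analyzer treats every Chinese character as a separate token: for example, “我爱狂神” is split into “我”, “爱”, “狂”, “神”, which is clearly not what we want. Installing the Chinese analyzer IK solves this problem.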

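2. Tokenization algorithms

IK provides two tokenization algorithms: ik_smart and ik_max_word.
ik_smart produces the coarsest segmentation (the fewest tokens), while ik_max_word produces the finest-grained segmentation.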
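
To make the difference between the two granularities concrete, here is a toy dictionary-based sketch in Python. It is not IK's actual algorithm or dictionary: the word list `DICTIONARY` and the functions `coarse_segment` and `fine_segment` are hypothetical and exist only for illustration. The coarse variant always takes the longest dictionary word at the current position, and the fine variant emits every dictionary word it finds at every position.

```python
# Toy segmenters for illustration only; IK's real implementation differs.
DICTIONARY = {"社会主义", "社会", "主义", "接班人", "接班", "人"}  # hypothetical word list
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def words_starting_at(text, i):
    """Return all dictionary words that start at index i, longest first."""
    found = []
    for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
        candidate = text[i:i + length]
        if candidate in DICTIONARY:
            found.append(candidate)
    return found


def coarse_segment(text):
    """Fewest tokens: take the longest word at each position, then jump past it."""
    tokens, i = [], 0
    while i < len(text):
        matches = words_starting_at(text, i)
        token = matches[0] if matches else text[i]
        tokens.append(token)
        i += len(token)
    return tokens


def fine_segment(text):
    """Finest granularity: emit every dictionary word found at every position."""
    tokens, covered_until = [], 0
    for i in range(len(text)):
        matches = words_starting_at(text, i)
        if matches:
            tokens.extend(matches)
            covered_until = max(covered_until, i + len(matches[0]))
        elif i >= covered_until:
            tokens.append(text[i])  # lone character not covered by any word
    return tokens


print(coarse_segment("我是社会主义接班人"))
print(fine_segment("我是社会主义接班人"))
```

With this particular word list the two functions return the same token sequences as the ik_smart and ik_max_word responses in these notes, but that is a property of the chosen example, not a guarantee about IK.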

1. Coarsest segmentation: ik_smart
  1. Command
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我是社会主义接班人"
}
  1. Result
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "社会主义",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "接班人",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
2. Finest-grained segmentation: ik_max_word
  1. Command
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是社会主义接班人"
}
  1. Result
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "社会主义",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "社会",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "主义",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "接班人",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "接班",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "人",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 7
    }
  ]
}
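
The two responses can be compared programmatically. The sketch below assumes the responses have already been parsed into Python dictionaries; the helper name `token_texts` is made up for this example, and the data is an abridged copy of the two results above (only the `token` and `position` fields are kept).

```python
def token_texts(analyze_response):
    """Return the token strings of an _analyze response, ordered by position."""
    tokens = sorted(analyze_response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


# Abridged copies of the ik_smart and ik_max_word results shown above.
smart_response = {"tokens": [
    {"token": "我", "position": 0},
    {"token": "是", "position": 1},
    {"token": "社会主义", "position": 2},
    {"token": "接班人", "position": 3},
]}
max_word_response = {"tokens": [
    {"token": "我", "position": 0},
    {"token": "是", "position": 1},
    {"token": "社会主义", "position": 2},
    {"token": "社会", "position": 3},
    {"token": "主义", "position": 4},
    {"token": "接班人", "position": 5},
    {"token": "接班", "position": 6},
    {"token": "人", "position": 7},
]}

smart_tokens = token_texts(smart_response)
max_word_tokens = token_texts(max_word_response)

# Tokens that only the finest-grained analyzer produced for this text.
extra_tokens = [t for t in max_word_tokens if t not in smart_tokens]
print(extra_tokens)
```

For this sentence every ik_smart token also appears in the ik_max_word output, and the extra tokens are the shorter words nested inside the longer ones.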

II. Using the command mode

1. REST-style conventions

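| Method | URL | Description |
|--------|-----|-------------|
| PUT | localhost:9200/<index name>/<type name>/<document id> | Create a document (with the given document id) |
| POST | localhost:9200/<index name>/<type name> | Create a document (with a random document id) |
| POST | localhost:9200/<index name>/<type name>/<document id>/_update | Update a document |
| DELETE | localhost:9200/<index name>/<type name>/<document id> | Delete a document |
| GET | localhost:9200/<index name>/<type name>/<document id> | Get a document by its id |
| POST | localhost:9200/<index name>/<type name>/_search | Query all documents |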
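
The URL conventions listed above can be captured in a small helper. This is only a sketch: `build_request` and `BASE_URL` are hypothetical names, the `http://` scheme and the local node address are assumptions, and nothing is actually sent over the network.

```python
BASE_URL = "http://localhost:9200"  # assumed local node; adjust for your cluster


def build_request(action, index, doc_type, doc_id=None):
    """Return (HTTP method, URL) for one of the REST operations listed above."""
    base = f"{BASE_URL}/{index}/{doc_type}"
    if action == "create":
        # PUT with an explicit id, POST to let Elasticsearch generate one.
        if doc_id is not None:
            return ("PUT", f"{base}/{doc_id}")
        return ("POST", base)
    if action == "search":
        return ("POST", f"{base}/_search")
    if doc_id is None:
        raise ValueError(f"action {action!r} requires a document id")
    routes = {
        "update": ("POST", f"{base}/{doc_id}/_update"),
        "delete": ("DELETE", f"{base}/{doc_id}"),
        "get": ("GET", f"{base}/{doc_id}"),
    }
    if action not in routes:
        raise ValueError(f"unknown action {action!r}")
    return routes[action]


for action in ("create", "update", "get", "delete"):
    print(build_request(action, "es", "test", 1))
print(build_request("create", "es", "test"))
print(build_request("search", "es", "test"))
```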

2. Basic tests
  1. Create an index
PUT es1
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

# Result
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "es1"
}
  1. Delete an index
# Command
DELETE es1

# Result
{
  "acknowledged" : true
}
  1. Create a document

localhost:9200/<index name>/<type name>/<document id>

# Command
PUT /es/test/1
{
  "name": "张三",
  "age": 22
}

# Result
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",  // the document was created
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
  1. View a document
# Command
GET es
GET es/test/1

# Result
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "张三",
    "age" : 22
  }
}
  1. Update a document
# Command 1: PUT: the body must also contain the fields that are not being changed, otherwise they are lost
PUT es/test/1
{
  "name": "李四",
  "age": 17
}

GET es/test/1

# Result 1
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 2,  // the version number is incremented
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "李四",
    "age" : 17
  }
}

# Command 2: POST: the body only needs to contain the fields being changed
POST es/test/1/_update
{
  "doc": {
    "name": "王五"
  }
}
GET es/test/1

# Result 2
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "王五",
    "age" : 17
  }
}
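
The difference between the two update styles can be sketched with an in-memory stand-in. `FakeIndex` below is a hypothetical class written for this illustration and is not Elasticsearch: `put` replaces the stored document entirely, `update` merges the given top-level fields into it, and both bump the version. The final document with the name "赵六" is made-up data used to show the data loss.

```python
import copy


class FakeIndex:
    """In-memory stand-in for one index; illustrative only."""

    def __init__(self):
        self.docs = {}  # id -> {"_version": int, "_source": dict}

    def put(self, doc_id, source):
        """Full replacement: the stored source becomes exactly `source`."""
        previous = self.docs.get(doc_id, {"_version": 0})
        self.docs[doc_id] = {
            "_version": previous["_version"] + 1,
            "_source": copy.deepcopy(source),
        }

    def update(self, doc_id, partial):
        """Partial update: merge top-level fields of `partial` into the source."""
        entry = self.docs[doc_id]
        entry["_source"].update(copy.deepcopy(partial))
        entry["_version"] += 1


index = FakeIndex()
index.put("1", {"name": "张三", "age": 22})   # create, version 1
index.put("1", {"name": "李四", "age": 17})   # full replacement, version 2
index.update("1", {"name": "王五"})           # partial update, version 3
after_update = copy.deepcopy(index.docs["1"])
print(after_update)

# A PUT that omits "age" silently drops it.
index.put("1", {"name": "赵六"})
after_partial_put = copy.deepcopy(index.docs["1"])
print(after_partial_put)
```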

III. Queries

1. Simple queries
  1. Query a specific field
# Find documents whose name contains "四" or "五"
GET /es/test/_search
{
  "query": {
    "match": {
      "name": "四 五"
    }
  }
}
  1. Return only specific fields
# Method 1: return only the name field
GET /es/test/_search
{
  "query": {
    "match": {
      "name": "四 五"
    }
  },
  // include only the listed fields
   "_source": ["name"]
}

# Method 2: use includes
GET /es/test/_search
{
  "query": {
    "match": {
      "name": "四 五"
    }
  },
  
  "_source": {
    // includes: return only the listed fields
    "includes": ["name"]
  }
}
  1. Exclude specific fields
# Command
GET /es/test/_search
{
  "query": {
    "match": {
      "name": "四 五"
    }
  },
  
  "_source": {
    // excludes: omit the listed fields
    "excludes": "age"
  }
}
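
As a rough illustration of what `_source` filtering does to each hit, the following sketch applies `includes` and `excludes` to a plain dictionary. `filter_source` is a hypothetical helper; it only handles exact top-level field names, whereas Elasticsearch also accepts wildcard patterns and nested paths.

```python
def filter_source(source, includes=None, excludes=None):
    """Keep the fields in `includes` (all if None), then drop those in `excludes`."""
    # Like the request body, accept either a single name or a list of names.
    if isinstance(includes, str):
        includes = [includes]
    if isinstance(excludes, str):
        excludes = [excludes]
    kept = {
        field: value
        for field, value in source.items()
        if includes is None or field in includes
    }
    return {
        field: value
        for field, value in kept.items()
        if field not in (excludes or [])
    }


doc = {"name": "王五", "age": 17}
print(filter_source(doc, includes=["name"]))  # only name
print(filter_source(doc, excludes="age"))     # everything except age
```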
  1. Sorting
# Sort by age in descending order
GET /es/test/_search
{
  "sort": {
    "age": {
      "order": "desc"
    }
  }
}
  1. Pagination
GET /es/test/_search
{
  "sort": {
    "age": {
      "order": "desc"
    }
  },
  
  // start at offset 2 (skip the first 2 hits) and return 3 hits
  "from": 2,
  "size": 3
}
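
The effect of `sort`, `from`, and `size` is easy to picture as "sort, then slice". The sketch below assumes a small, made-up list of documents; `search_page` and `page_to_from` are hypothetical helpers, with defaults that mirror the request defaults of `from` = 0 and `size` = 10.

```python
def search_page(docs, sort_field, from_=0, size=10, descending=True):
    """Sort docs by one field, then return the slice [from_, from_ + size)."""
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=descending)
    return ordered[from_:from_ + size]


def page_to_from(page, size):
    """Translate a 1-based page number into the `from` offset."""
    return (page - 1) * size


# Made-up sample data.
docs = [{"age": age} for age in (22, 17, 25, 30, 19, 28)]

# Same parameters as the request above: sort by age desc, from 2, size 3.
page = search_page(docs, "age", from_=2, size=3)
print([d["age"] for d in page])

# Second page of three hits per page.
second_page = search_page(docs, "age", from_=page_to_from(2, 3), size=3)
print([d["age"] for d in second_page])
```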
2. Multi-condition queries
  1. must (and): all conditions must be satisfied
# Find documents whose name contains "王" and whose age = 23
GET /es/test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "王"
          }
        },
        {
          "match": {
            "age": "23"
          }
        }
      ]
    }
  }
}
  1. should (or): satisfying any one condition is enough
# Find documents whose name contains "王" or whose age = 25
GET /es/test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "王"
          }
        },
        {
          "match": {
            "age": "25"
          }
        }
      ]
    }
  }
}
  1. must_not (not): exclude the specified documents
# must_not: exclude documents whose name contains "王" or whose age = 25
GET /es/test/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "name": "王"
          }
        },
        {
          "match": {
            "age": "25"
          }
        }
      ]
    }
  }
}
  1. Range conditions

gt: greater than; gte: greater than or equal to; lt: less than; lte: less than or equal to

# Range condition: 22 < age < 25
GET /es/test/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gt": 22,
            "lt": 25
          }
        }
      }
    }
  }
}
  1. Matching several terms
# Match several terms; a document matches if it contains any one of them
GET /es/test/_search
{
  "query": {
    "match": { 
      "name": "张 王"
    }
  }
}
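
The bool clauses used in this section can be summarized with a small in-memory evaluator. This is a simplified sketch that ignores scoring: the documents are made-up sample data, all function names are hypothetical, and `name_match` approximates a `match` query on Chinese text by checking whether any query character appears in the name.

```python
def name_match(text):
    """Approximate a match query on name: any query character appears in it."""
    chars = [c for c in text if not c.isspace()]
    return lambda doc: any(c in doc["name"] for c in chars)


def age_is(value):
    return lambda doc: doc["age"] == value


def age_range(gt=None, gte=None, lt=None, lte=None):
    def check(doc):
        age = doc["age"]
        return ((gt is None or age > gt) and (gte is None or age >= gte)
                and (lt is None or age < lt) and (lte is None or age <= lte))
    return check


def bool_search(docs, must=(), should=(), must_not=(), filter_=()):
    """Simplified bool semantics without scoring."""
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in list(must) + list(filter_)):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        # Without must/filter clauses, at least one should clause has to match.
        if should and not must and not filter_:
            if not any(clause(doc) for clause in should):
                continue
        hits.append(doc)
    return hits


DOCS = [  # made-up sample data
    {"name": "王五", "age": 23},
    {"name": "王六", "age": 25},
    {"name": "张三", "age": 22},
    {"name": "李四", "age": 24},
]


def names(hits):
    return [d["name"] for d in hits]


print(names(bool_search(DOCS, must=[name_match("王"), age_is(23)])))
print(names(bool_search(DOCS, should=[name_match("王"), age_is(25)])))
print(names(bool_search(DOCS, must_not=[name_match("王"), age_is(25)])))
print(names(bool_search(DOCS, filter_=[age_range(gt=22, lt=25)])))
print(names([d for d in DOCS if name_match("张 王")(d)]))
```

Note that when a bool query also contains `must` or `filter` clauses, `should` clauses become optional by default and only influence the score, which this sketch models by skipping the `should` check in that case.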

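IV. Tokenization

1. Overview
  1. term: exact lookup; the query value is not analyzed
  2. match: the query text is parsed by the analyzer first
2. Example
  1. Create an index
# text: the field is tokenized for search
# keyword: the field is not tokenized; it is indexed as a single exact value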
PUT /t1
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "content": {
        "type": "keyword"
      }
    }
  }
}
  1. term: only returns documents that match exactly
GET /t1/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name": "用户"
          }
        },
        {
          "term": {
            "content": "呼吸"
          }
        }
      ]
    }
  }
}

# Result
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931471,
        "_source" : {
          "name" : "用户1",
          "content" : "呼吸"
        }
      }
    ]
  }
}
  1. match: the query text is parsed by the analyzer
GET /t1/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {		
            // Although this is a match query, content is a keyword field, so the query text is still not tokenized
            "content": "呼吸"
          }
        }
      ]
    }
  }
}

# Result
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931471,
        "_source" : {
          "name" : "用户1",
          "content" : "呼吸"
        }
      }
    ]
  }
}
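
Why the `term` clause on name finds nothing while the one on content matches can be sketched with a toy model of the index. The per-character tokenization in `analyze` is a rough stand-in for how the default analyzer treats Chinese text in a text field; it is an assumption of this sketch, not the real analyzer, and all function names are hypothetical.

```python
MAPPING = {"name": "text", "content": "keyword"}  # same field types as index t1
DOC = {"name": "用户1", "content": "呼吸"}          # document 1 from the result above


def analyze(value, field_type):
    """keyword: the whole value is one token; text: one token per character."""
    if field_type == "keyword":
        return [value]
    return [ch for ch in value if not ch.isspace()]


def indexed_tokens(doc, field):
    """Tokens stored in the index for one field of one document."""
    return set(analyze(doc[field], MAPPING[field]))


def term_query(doc, field, value):
    """term: the query value is looked up as-is, without analysis."""
    return value in indexed_tokens(doc, field)


def match_query(doc, field, text):
    """match: the query text is analyzed like the field before the lookup."""
    query_tokens = analyze(text, MAPPING[field])
    return any(token in indexed_tokens(doc, field) for token in query_tokens)


print(term_query(DOC, "name", "用户"))      # "用户" is not a stored token
print(term_query(DOC, "content", "呼吸"))   # keyword value stored whole
print(match_query(DOC, "name", "用户"))     # query is split into "用" and "户"
print(match_query(DOC, "content", "呼"))    # keyword fields need the full value
```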

V. Highlighting

  1. Example
GET /t1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "3"
          }
        }
      ]
    }
  },
  "highlight": {
    // custom highlight tags
    "pre_tags": "<strong style='color:red'>", 
    "post_tags": "</strong>", 
    
    "fields": {
      "name": {}
    }
  }
}

# Result
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.2039728,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.2039728,
        "_source" : {
          "name" : "用户3",
          "content" : "呼吸呼吸"
        },
        "highlight" : {
          "name" : [
            "用户<strong style='color:red'>3</strong>"
          ]
        }
      }
    ]
  }
}
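
Conceptually, the highlighter wraps each matched token of the field value in `pre_tags` and `post_tags`. The sketch below imitates that for single-character tokens only; `highlight` is a hypothetical function and not the way Elasticsearch implements highlighting. Its default tags are `<em>` and `</em>`, which are also the defaults Elasticsearch uses when no custom tags are given.

```python
def highlight(text, query_tokens, pre_tag="<em>", post_tag="</em>"):
    """Wrap every character of `text` that is a query token in the given tags."""
    return "".join(
        f"{pre_tag}{ch}{post_tag}" if ch in query_tokens else ch
        for ch in text
    )


# Same field value, query token, and tags as in the example above.
fragment = highlight("用户3", {"3"}, "<strong style='color:red'>", "</strong>")
print(fragment)
```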