Installing the IK Analyzer on Windows ---- Elasticsearch 7.x

  • 1. Obtain the plugin zip package
  • 1.1 Method 1: build it from source
  • 1.1.1 Download and extract the source
  • 1.1.2 Build the package
  • 1.2 Method 2: download a release directly
  • 2. Move the files
  • 3. Restart Elasticsearch
  • 4. Results
  • 4.1 Default analysis
  • 4.2 ik_smart analysis
  • 4.3 ik_max_word analysis
  • 5. Custom dictionary


1. Obtain the plugin zip package

1.1 Method 1: build it from source

1.1.1 Download and extract the source

Download link:
https://github.com/medcl/elasticsearch-analysis-ik


Extract to: D:\install\ElasticSearch

1.1.2 Build the package

Press Win + R and enter cmd to open a command prompt.
Change to the extracted directory and build the package (Maven must be installed):

D:

cd D:\install\ElasticSearch\elasticsearch-analysis-ik-7.6.2

mvn package


This step surfaced a problem: the source checkout says 7.6.2, but the built package came out as 7.4.0, so the plugin cannot be used and Elasticsearch crashes on startup. Checking the pom.xml, and then the download page again, shows the declared version simply does not match. What a trap!
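
If you build from source anyway, align the versions in pom.xml before running mvn package. A minimal sketch, assuming the pom declares the target Elasticsearch version as a property (check your own copy, the property name may differ):

<properties>
    <!-- assumption: set this to match your installed Elasticsearch version -->
    <elasticsearch.version>7.6.2</elasticsearch.version>
</properties>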


1.2 Method 2: download a release directly

Downloading a release directly is slower, but it skips the build step. Pick the release whose version exactly matches your Elasticsearch version.

Download link:
https://github.com/medcl/elasticsearch-analysis-ik/releases


2. Move the files

First, create an analysis-ik folder under the plugins directory of your Elasticsearch installation.

Method 1: go into target/releases and copy the zip there into the newly created analysis-ik folder, then unzip it in place.

Method 2: copy the directly downloaded zip into the newly created analysis-ik folder, then unzip it in place.
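
Taking method 2 as an example, the steps look roughly like this (the paths are assumptions based on the install directory used above; Expand-Archive requires PowerShell 5 or later):

mkdir "D:\install\ElasticSearch\elasticsearch-7.6.2\plugins\analysis-ik"

copy elasticsearch-analysis-ik-7.6.2.zip "D:\install\ElasticSearch\elasticsearch-7.6.2\plugins\analysis-ik"

powershell -Command "Expand-Archive 'D:\install\ElasticSearch\elasticsearch-7.6.2\plugins\analysis-ik\elasticsearch-analysis-ik-7.6.2.zip' 'D:\install\ElasticSearch\elasticsearch-7.6.2\plugins\analysis-ik'"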


3. Restart Elasticsearch
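
Stop the running instance and start it again from the bin directory of your install (the path is an assumption); once it is up, you can confirm the plugin loaded:

D:\install\ElasticSearch\elasticsearch-7.6.2\bin\elasticsearch.bat

curl http://localhost:9200/_cat/plugins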



4. Results

The IK plugin provides two analyzers: ik_smart and ik_max_word.
ik_smart: coarse-grained segmentation.
ik_max_word: fine-grained segmentation that emits every word it can find.

Method: GET http://localhost:9200/_analyze with a JSON body.

4.1 Default analysis

With no "analyzer" field in the request, Elasticsearch falls back to the standard analyzer, which splits Chinese text into individual characters:

{
	"text":"人生就是一场戏, 因为有缘才相聚."
}
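
Sent from the command line it looks like this (a sketch; assumes curl is available and a Unix-style shell such as Git Bash for the quoting):

curl -X GET 'http://localhost:9200/_analyze' -H 'Content-Type: application/json' -d '{"text":"人生就是一场戏, 因为有缘才相聚."}'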

Result:

{
    "tokens": [
        {
            "token": "人",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "生",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "就",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "是",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        },
        {
            "token": "一",
            "start_offset": 4,
            "end_offset": 5,
            "type": "<IDEOGRAPHIC>",
            "position": 4
        },
        {
            "token": "场",
            "start_offset": 5,
            "end_offset": 6,
            "type": "<IDEOGRAPHIC>",
            "position": 5
        },
        {
            "token": "戏",
            "start_offset": 6,
            "end_offset": 7,
            "type": "<IDEOGRAPHIC>",
            "position": 6
        },
        {
            "token": "因",
            "start_offset": 9,
            "end_offset": 10,
            "type": "<IDEOGRAPHIC>",
            "position": 7
        },
        {
            "token": "为",
            "start_offset": 10,
            "end_offset": 11,
            "type": "<IDEOGRAPHIC>",
            "position": 8
        },
        {
            "token": "有",
            "start_offset": 11,
            "end_offset": 12,
            "type": "<IDEOGRAPHIC>",
            "position": 9
        },
        {
            "token": "缘",
            "start_offset": 12,
            "end_offset": 13,
            "type": "<IDEOGRAPHIC>",
            "position": 10
        },
        {
            "token": "才",
            "start_offset": 13,
            "end_offset": 14,
            "type": "<IDEOGRAPHIC>",
            "position": 11
        },
        {
            "token": "相",
            "start_offset": 14,
            "end_offset": 15,
            "type": "<IDEOGRAPHIC>",
            "position": 12
        },
        {
            "token": "聚",
            "start_offset": 15,
            "end_offset": 16,
            "type": "<IDEOGRAPHIC>",
            "position": 13
        }
    ]
}

4.2 ik_smart analysis

{
	"analyzer":"ik_smart",
	"text":"人生就是一场戏, 因为有缘才相聚."
}

Result:

{
    "tokens": [
        {
            "token": "人生",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "就是",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "一场",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "戏",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 3
        },
        {
            "token": "因为",
            "start_offset": 9,
            "end_offset": 11,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "有缘",
            "start_offset": 11,
            "end_offset": 13,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "才",
            "start_offset": 13,
            "end_offset": 14,
            "type": "CN_CHAR",
            "position": 6
        },
        {
            "token": "相聚",
            "start_offset": 14,
            "end_offset": 16,
            "type": "CN_WORD",
            "position": 7
        }
    ]
}

4.3 ik_max_word analysis

{
	"analyzer":"ik_max_word",
	"text":"人生就是一场戏, 因为有缘才相聚."
}

Result:

{
    "tokens": [
        {
            "token": "人生",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "生就",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "就是",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "一场",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "一",
            "start_offset": 4,
            "end_offset": 5,
            "type": "TYPE_CNUM",
            "position": 4
        },
        {
            "token": "场",
            "start_offset": 5,
            "end_offset": 6,
            "type": "COUNT",
            "position": 5
        },
        {
            "token": "戏",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 6
        },
        {
            "token": "因为",
            "start_offset": 9,
            "end_offset": 11,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "有缘",
            "start_offset": 11,
            "end_offset": 13,
            "type": "CN_WORD",
            "position": 8
        },
        {
            "token": "才",
            "start_offset": 13,
            "end_offset": 14,
            "type": "CN_CHAR",
            "position": 9
        },
        {
            "token": "相聚",
            "start_offset": 14,
            "end_offset": 16,
            "type": "CN_WORD",
            "position": 10
        }
    ]
}
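
Note how ik_max_word also emits overlapping candidates (both 人生 and 生就) and tags 一 as a Chinese numeral (TYPE_CNUM) and 场 as a measure word (COUNT). A common pattern is to index with ik_max_word and search with ik_smart; a minimal mapping sketch (the index name my_index and the field content are assumptions):

PUT http://localhost:9200/my_index

{
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
    }
}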

5. Custom dictionary

Suppose 景移乡 is an internet slang term that we do not want split into individual characters.


Find the config folder inside analysis-ik.


Add a custom dictionary file, ykenan.dic, whose content is simply the term 景移乡.
Then register it in IKAnalyzer.cfg.xml, as sketched below.
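
Following the config template that ships with the plugin, only the ext_dict entry needs to point at the new file:

ykenan.dic:

景移乡

IKAnalyzer.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- semicolon-separated list of custom dictionary files -->
    <entry key="ext_dict">ykenan.dic</entry>
    <!-- custom stop-word dictionaries, left empty here -->
    <entry key="ext_stopwords"></entry>
</properties>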


Restart the service and send the analysis request again; 景移乡 now comes back as a single token.
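
For example, this request to the same _analyze endpoint should return one whole token (a sketch):

{
    "analyzer": "ik_smart",
    "text": "景移乡"
}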
