Installing the IK Analyzer on Windows ---- ElasticSearch 7.X
- 1. Obtain the plugin zip archive
- 1.1 Method 1: build it yourself
- 1.1.1 Download and extract the source
- 1.1.2 Package it
- 1.2 Method 2: download a prebuilt release
- 3. Move the files
- 4. Restart ElasticSearch
- 5. Results
- 5.1 Default tokenization
- 5.2 ik_smart tokenization
- 5.3 ik_max_word tokenization
- 6. Custom dictionary
1. Obtain the plugin zip archive
1.1 Method 1: build it yourself
1.1.1 Download and extract the source
Extraction path: D:\install\ElasticSearch
1.1.2 Package it
Press Win + R and type cmd.
Change into the extracted directory and build the package:
D:
cd D:\install\ElasticSearch\elasticsearch-analysis-ik-7.6.2
mvn package
The build above revealed a problem: the source claims version 7.6.2, but the packaged plugin targets 7.4.0, so Elasticsearch crashes on startup when it tries to load the plugin. Checking pom.xml, and then the download page again, confirmed the mismatch. What a trap!
1.2 Method 2: download a prebuilt release
Downloading directly is slower, but no build step is needed.
Download link:
https://github.com/medcl/elasticsearch-analysis-ik/releases
3. Move the files
First, create an analysis-ik folder under the plugins directory of your Elasticsearch installation.
Method 1: go into target/releases, copy the zip into the newly created analysis-ik folder, then extract it there.
Method 2: copy the downloaded zip into the newly created analysis-ik folder, then extract it there.
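The two methods above differ only in where the zip comes from; the folder layout is the same. A minimal POSIX-shell sketch of the step (on Windows you can do the same in Explorer or cmd; `ES_HOME` and `/tmp/es-demo` are placeholder paths, adjust to your install):

```shell
# Placeholder install dir, e.g. D:\install\ElasticSearch\elasticsearch-7.6.2 on Windows
ES_HOME="${ES_HOME:-/tmp/es-demo}"

# Create the plugin folder under plugins/
mkdir -p "$ES_HOME/plugins/analysis-ik"

# Copy the zip in and extract it, for example:
# unzip elasticsearch-analysis-ik-7.6.2.zip -d "$ES_HOME/plugins/analysis-ik"

ls "$ES_HOME/plugins"
```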
4. Restart ElasticSearch
5. Results
The IK analyzer supports two modes: ik_smart and ik_max_word.
ik_smart: coarse-grained segmentation.
ik_max_word: fine-grained segmentation.
Method: send a GET request with a JSON body to http://localhost:9200/_analyze
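The requests below can also be scripted. A minimal Python sketch, assuming Elasticsearch with the IK plugin is running on localhost:9200 (it sends POST, which the _analyze endpoint accepts alongside GET):

```python
import json
import urllib.request

# Build the same request body used in the examples below
body = json.dumps({
    "analyzer": "ik_smart",
    "text": "人生就是一场戏, 因为有缘才相聚."
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:9200/_analyze",
    data=body,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
        # Print just the token strings from the response
        print([t["token"] for t in result["tokens"]])
except OSError as exc:
    print("Elasticsearch not reachable:", exc)
```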
5.1 Default tokenization
{
"text":"人生就是一场戏, 因为有缘才相聚."
}
Result:
{
"tokens": [
{
"token": "人",
"start_offset": 0,
"end_offset": 1,
"type": "<IDEOGRAPHIC>",
"position": 0
},
{
"token": "生",
"start_offset": 1,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position": 1
},
{
"token": "就",
"start_offset": 2,
"end_offset": 3,
"type": "<IDEOGRAPHIC>",
"position": 2
},
{
"token": "是",
"start_offset": 3,
"end_offset": 4,
"type": "<IDEOGRAPHIC>",
"position": 3
},
{
"token": "一",
"start_offset": 4,
"end_offset": 5,
"type": "<IDEOGRAPHIC>",
"position": 4
},
{
"token": "场",
"start_offset": 5,
"end_offset": 6,
"type": "<IDEOGRAPHIC>",
"position": 5
},
{
"token": "戏",
"start_offset": 6,
"end_offset": 7,
"type": "<IDEOGRAPHIC>",
"position": 6
},
{
"token": "因",
"start_offset": 9,
"end_offset": 10,
"type": "<IDEOGRAPHIC>",
"position": 7
},
{
"token": "为",
"start_offset": 10,
"end_offset": 11,
"type": "<IDEOGRAPHIC>",
"position": 8
},
{
"token": "有",
"start_offset": 11,
"end_offset": 12,
"type": "<IDEOGRAPHIC>",
"position": 9
},
{
"token": "缘",
"start_offset": 12,
"end_offset": 13,
"type": "<IDEOGRAPHIC>",
"position": 10
},
{
"token": "才",
"start_offset": 13,
"end_offset": 14,
"type": "<IDEOGRAPHIC>",
"position": 11
},
{
"token": "相",
"start_offset": 14,
"end_offset": 15,
"type": "<IDEOGRAPHIC>",
"position": 12
},
{
"token": "聚",
"start_offset": 15,
"end_offset": 16,
"type": "<IDEOGRAPHIC>",
"position": 13
}
]
}
5.2 ik_smart tokenization
{
"analyzer":"ik_smart",
"text":"人生就是一场戏, 因为有缘才相聚."
}
Result:
{
"tokens": [
{
"token": "人生",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "就是",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
},
{
"token": "一场",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 2
},
{
"token": "戏",
"start_offset": 6,
"end_offset": 7,
"type": "CN_CHAR",
"position": 3
},
{
"token": "因为",
"start_offset": 9,
"end_offset": 11,
"type": "CN_WORD",
"position": 4
},
{
"token": "有缘",
"start_offset": 11,
"end_offset": 13,
"type": "CN_WORD",
"position": 5
},
{
"token": "才",
"start_offset": 13,
"end_offset": 14,
"type": "CN_CHAR",
"position": 6
},
{
"token": "相聚",
"start_offset": 14,
"end_offset": 16,
"type": "CN_WORD",
"position": 7
}
]
}
5.3 ik_max_word tokenization
{
"analyzer":"ik_max_word",
"text":"人生就是一场戏, 因为有缘才相聚."
}
Result:
{
"tokens": [
{
"token": "人生",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "生就",
"start_offset": 1,
"end_offset": 3,
"type": "CN_WORD",
"position": 1
},
{
"token": "就是",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 2
},
{
"token": "一场",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 3
},
{
"token": "一",
"start_offset": 4,
"end_offset": 5,
"type": "TYPE_CNUM",
"position": 4
},
{
"token": "场",
"start_offset": 5,
"end_offset": 6,
"type": "COUNT",
"position": 5
},
{
"token": "戏",
"start_offset": 6,
"end_offset": 7,
"type": "CN_CHAR",
"position": 6
},
{
"token": "因为",
"start_offset": 9,
"end_offset": 11,
"type": "CN_WORD",
"position": 7
},
{
"token": "有缘",
"start_offset": 11,
"end_offset": 13,
"type": "CN_WORD",
"position": 8
},
{
"token": "才",
"start_offset": 13,
"end_offset": 14,
"type": "CN_CHAR",
"position": 9
},
{
"token": "相聚",
"start_offset": 14,
"end_offset": 16,
"type": "CN_WORD",
"position": 10
}
]
}
6. Custom dictionary
Suppose 景移乡 is an internet neologism that we do not want split apart during tokenization.
Find the config folder inside analysis-ik.
Add a custom dictionary file, ykenan.dic, whose content is the single entry 景移乡.
Modify IKAnalyzer.cfg.xml.
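The change registers the new dictionary file through the ext_dict entry. A minimal sketch of IKAnalyzer.cfg.xml, following the structure of the stock config shipped with the plugin (the file name ykenan.dic comes from the step above; paths are relative to the config folder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom dictionary files, multiple entries separated by semicolons -->
    <entry key="ext_dict">ykenan.dic</entry>
    <!-- custom stop-word dictionary files -->
    <entry key="ext_stopwords"></entry>
</properties>
```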
Restart the service and run the query again.