Installing the IK Analyzer on Windows ---- ElasticSearch 7.X
- 1. Obtain the plugin zip archive
- 1.1 Method 1: build it yourself
- 1.1.1 Download and extract the source
- 1.1.2 Package it
- 1.2 Method 2: download a prebuilt release
- 3. Move the files
- 4. Restart ElasticSearch
- 5. Results
- 5.1 Default tokenization
- 5.2 ik_smart tokenization
- 5.3 ik_max_word tokenization
- 6. Custom dictionary
1. Obtain the plugin zip archive
1.1 Method 1: build it yourself
1.1.1 Download and extract the source
Extraction path: D:\install\ElasticSearch
1.1.2 Package it
Press Win + R and type cmd.
Change into the extracted directory and build the package:
D:
cd D:\install\ElasticSearch\elasticsearch-analysis-ik-7.6.2
mvn package
The build above revealed a problem: the source claims version 7.6.2, but the packaged plugin targets 7.4.0, so Elasticsearch crashes on startup when it tries to load the plugin. Checking pom.xml, and then the download page again, confirmed the mismatch. What a trap!
1.2 Method 2: download a prebuilt release
Downloading directly is slower, but no build step is needed.
Download link:
https://github.com/medcl/elasticsearch-analysis-ik/releases
3. Move the files
First, create an analysis-ik folder under the plugins directory of your Elasticsearch installation.
Method 1: go into target/releases, copy the zip into the newly created analysis-ik folder, then extract it there.
Method 2: copy the downloaded zip into the newly created analysis-ik folder, then extract it there.
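The two methods above differ only in where the zip comes from; the folder layout is the same. A minimal POSIX-shell sketch of the step (on Windows you can do the same in Explorer or cmd; `ES_HOME` and `/tmp/es-demo` are placeholder paths, adjust to your install):

```shell
# Placeholder install dir, e.g. D:\install\ElasticSearch\elasticsearch-7.6.2 on Windows
ES_HOME="${ES_HOME:-/tmp/es-demo}"

# Create the plugin folder under plugins/
mkdir -p "$ES_HOME/plugins/analysis-ik"

# Copy the zip in and extract it, for example:
# unzip elasticsearch-analysis-ik-7.6.2.zip -d "$ES_HOME/plugins/analysis-ik"

ls "$ES_HOME/plugins"
```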
4. Restart ElasticSearch
5. Results
The IK analyzer supports two modes: ik_smart and ik_max_word.
ik_smart: coarse-grained segmentation.
ik_max_word: fine-grained segmentation.
Method: send a GET request with a JSON body to http://localhost:9200/_analyze
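The requests below can also be scripted. A minimal Python sketch, assuming Elasticsearch with the IK plugin is running on localhost:9200 (it sends POST, which the _analyze endpoint accepts alongside GET):

```python
import json
import urllib.request

# Build the same request body used in the examples below
body = json.dumps({
    "analyzer": "ik_smart",
    "text": "人生就是一场戏, 因为有缘才相聚."
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:9200/_analyze",
    data=body,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
        # Print just the token strings from the response
        print([t["token"] for t in result["tokens"]])
except OSError as exc:
    print("Elasticsearch not reachable:", exc)
```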
5.1 Default tokenization
{
"text":"人生就是一场戏, 因为有缘才相聚."
}
Result:
{
"tokens": [
{
"token": "人",
"start_offset": 0,
"end_offset": 1,
"type": "<IDEOGRAPHIC>",
"position": 0
},
{
"token": "生",
"start_offset": 1,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position": 1
},
{
"token": "就",
"start_offset": 2,
"end_offset": 3,
"type": "<IDEOGRAPHIC>",
"position": 2
},
{
"token": "是",
"start_offset": 3,
"end_offset": 4,
"type": "<IDEOGRAPHIC>",
"position": 3
},
{
"token": "一",
"start_offset": 4,
"end_offset": 5,
"type": "<IDEOGRAPHIC>",
"position": 4
},
{
"token": "场",
"start_offset": 5,
"end_offset": 6,
"type": "<IDEOGRAPHIC>",
"position": 5
},
{
"token": "戏",
"start_offset": 6,
"end_offset": 7,
"type": "<IDEOGRAPHIC>",
"position": 6
},
{
"token": "因",
"start_offset": 9,
"end_offset": 10,
"type": "<IDEOGRAPHIC>",
"position": 7
},
{
"token": "为",
"start_offset": 10,
"end_offset": 11,
"type": "<IDEOGRAPHIC>",
"position": 8
},
{
"token": "有",
"start_offset": 11,
"end_offset": 12,
"type": "<IDEOGRAPHIC>",
"position": 9
},
{
"token": "缘",
"start_offset": 12,
"end_offset": 13,
"type": "<IDEOGRAPHIC>",
"position": 10
},
{
"token": "才",
"start_offset": 13,
"end_offset": 14,
"type": "<IDEOGRAPHIC>",
"position": 11
},
{
"token": "相",
"start_offset": 14,
"end_offset": 15,
"type": "<IDEOGRAPHIC>",
"position": 12
},
{
"token": "聚",
"start_offset": 15,
"end_offset": 16,
"type": "<IDEOGRAPHIC>",
"position": 13
}
]
}
5.2 ik_smart tokenization
{
"analyzer":"ik_smart",
"text":"人生就是一场戏, 因为有缘才相聚."
}
Result:
{
"tokens": [
{
"token": "人生",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "就是",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
},
{
"token": "一场",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 2
},
{
"token": "戏",
"start_offset": 6,
"end_offset": 7,
"type": "CN_CHAR",
"position": 3
},
{
"token": "因为",
"start_offset": 9,
"end_offset": 11,
"type": "CN_WORD",
"position": 4
},
{
"token": "有缘",
"start_offset": 11,
"end_offset": 13,
"type": "CN_WORD",
"position": 5
},
{
"token": "才",
"start_offset": 13,
"end_offset": 14,
"type": "CN_CHAR",
"position": 6
},
{
"token": "相聚",
"start_offset": 14,
"end_offset": 16,
"type": "CN_WORD",
"position": 7
}
]
}
5.3 ik_max_word tokenization
{
"analyzer":"ik_max_word",
"text":"人生就是一场戏, 因为有缘才相聚."
}
Result:
{
"tokens": [
{
"token": "人生",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "生就",
"start_offset": 1,
"end_offset": 3,
"type": "CN_WORD",
"position": 1
},
{
"token": "就是",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 2
},
{
"token": "一场",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 3
},
{
"token": "一",
"start_offset": 4,
"end_offset": 5,
"type": "TYPE_CNUM",
"position": 4
},
{
"token": "场",
"start_offset": 5,
"end_offset": 6,
"type": "COUNT",
"position": 5
},
{
"token": "戏",
"start_offset": 6,
"end_offset": 7,
"type": "CN_CHAR",
"position": 6
},
{
"token": "因为",
"start_offset": 9,
"end_offset": 11,
"type": "CN_WORD",
"position": 7
},
{
"token": "有缘",
"start_offset": 11,
"end_offset": 13,
"type": "CN_WORD",
"position": 8
},
{
"token": "才",
"start_offset": 13,
"end_offset": 14,
"type": "CN_CHAR",
"position": 9
},
{
"token": "相聚",
"start_offset": 14,
"end_offset": 16,
"type": "CN_WORD",
"position": 10
}
]
}
6. Custom dictionary
Suppose 景移乡 is an internet neologism that we do not want split apart during tokenization.
Find the config folder inside analysis-ik.
Add a custom dictionary file, ykenan.dic, whose content is the single entry 景移乡.
Modify IKAnalyzer.cfg.xml.
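The change registers the new dictionary file through the ext_dict entry. A minimal sketch of IKAnalyzer.cfg.xml, following the structure of the stock config shipped with the plugin (the file name ykenan.dic comes from the step above; paths are relative to the config folder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom dictionary files, multiple entries separated by semicolons -->
    <entry key="ext_dict">ykenan.dic</entry>
    <!-- custom stop-word dictionary files -->
    <entry key="ext_stopwords"></entry>
</properties>
```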
Restart the service and run the query again.