
一介闲人
analysis-icu features:
analysis-icu use cases:
List installed plugins
cd /home/es/elasticsearch-8.17.0
bin/elasticsearch-plugin list
Install the plugin online
cd /home/es/elasticsearch-8.17.0
bin/elasticsearch-plugin install analysis-icu
Remove the plugin
cd /home/es/elasticsearch-8.17.0
bin/elasticsearch-plugin remove analysis-icu
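After installing or removing a plugin it can be handy to confirm the result from a script. Below is a minimal Python sketch, assuming `bin/elasticsearch-plugin list` prints one plugin name per line. `is_plugin_installed` is a hypothetical helper, and `sample_output` is illustrative text rather than output captured from a real node.

```python
def is_plugin_installed(list_output, plugin):
    """Return True if `plugin` appears on its own line in the list output."""
    installed = {line.strip() for line in list_output.splitlines() if line.strip()}
    return plugin in installed


# Illustrative output only (assumed format: one plugin name per line).
sample_output = "analysis-icu\nanalysis-ik\n"

print(is_plugin_installed(sample_output, "analysis-icu"))
print(is_plugin_installed(sample_output, "analysis-smartcn"))
```

Keep in mind that a node has to be restarted before an installed or removed plugin takes effect.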
Test the analyzer
POST _analyze
{
"analyzer":"icu_analyzer",
"text":"中华人民共和国"
}
Result
{
"tokens": [
{
"token":"中华",
"start_offset": 0,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position":0
}
,
{
"token":"人民",
"start_offset": 2,
"end_offset": 4,
"type":"<IDEOGRAPHIC>",
"position":1
}
,
{
"token":"共和国",
"start_offset": 4,
"end_offset": 7,
"type":"<IDEOGRAPHIC>",
"position":2
}
]
}
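When comparing analyzers, it is often easier to look at the token strings alone. The following is a minimal Python sketch, assuming the `_analyze` response shape shown above. `token_texts` is a hypothetical helper name, and the `response` dictionary simply repeats the sample result.

```python
def token_texts(analyze_response):
    """Return the token strings from an _analyze response, in order."""
    return [tok["token"] for tok in analyze_response.get("tokens", [])]


# The icu_analyzer result from above, written as a Python dict.
response = {
    "tokens": [
        {"token": "中华", "start_offset": 0, "end_offset": 2,
         "type": "<IDEOGRAPHIC>", "position": 0},
        {"token": "人民", "start_offset": 2, "end_offset": 4,
         "type": "<IDEOGRAPHIC>", "position": 1},
        {"token": "共和国", "start_offset": 4, "end_offset": 7,
         "type": "<IDEOGRAPHIC>", "position": 2},
    ]
}

print(token_texts(response))  # ['中华', '人民', '共和国']
```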
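Manual installation: download the package, unpack it into its own subdirectory under the plugins directory of the Elasticsearch installation, then restart the Elasticsearch instance.
Source code for the IK Chinese analyzer plugin: https://github.com/infinilabs/analysis-ik
The IK plugin version must match the Elasticsearch version exactly; otherwise compatibility problems will prevent Elasticsearch from starting.
This walkthrough uses version 8.17.0. If the releases at the source repository do not include a matching version, download it from: https://release.infinilabs.com/analysis-ik/stable/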
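Because a version mismatch stops Elasticsearch from starting, a quick check before copying the package can save a restart. This is a minimal Python sketch, assuming the package file is named `elasticsearch-analysis-ik-<version>.zip`. That naming pattern and the helper names `plugin_version` and `versions_match` are assumptions for illustration, not something the plugin documents here.

```python
import re

# Assumed file naming: elasticsearch-analysis-ik-<major>.<minor>.<patch>.zip
_ZIP_PATTERN = re.compile(r"^elasticsearch-analysis-ik-(\d+\.\d+\.\d+)\.zip$")


def plugin_version(zip_name):
    """Extract the version from the package file name, or None if it does not fit."""
    match = _ZIP_PATTERN.match(zip_name)
    return match.group(1) if match else None


def versions_match(es_version, zip_name):
    """True only when the package version equals the Elasticsearch version exactly."""
    return plugin_version(zip_name) == es_version


print(versions_match("8.17.0", "elasticsearch-analysis-ik-8.17.0.zip"))
print(versions_match("8.17.0", "elasticsearch-analysis-ik-8.16.1.zip"))
```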
standard mode
# Default analyzer: standard, which splits Chinese text into single characters
POST _analyze
{
"analyzer":"standard",
"text":"中华人民共和国"
}
# Result
{
"tokens": [
{
"token":"中",
"start_offset": 0,
"end_offset": 1,
"type": "<IDEOGRAPHIC>",
"position":0
}
,
{
"token":"华",
"start_offset": 1,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position":1
}
,{
"token":"人",
"start_offset": 2,
"end_offset": 3,
"type": "<IDEOGRAPHIC>",
"position":2
}
,{
"token":"民",
"start_offset": 3,
"end_offset": 4,
"type": "<IDEOGRAPHIC>",
"position":3
}
,{
"token":"共",
"start_offset": 4,
"end_offset": 5,
"type": "<IDEOGRAPHIC>",
"position":4
}
,{
"token":"和",
"start_offset": 5,
"end_offset": 6,
"type": "<IDEOGRAPHIC>",
"position":5
}
,{
"token":"国",
"start_offset": 6,
"end_offset": 7,
"type": "<IDEOGRAPHIC>",
"position":6
}
]
}
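The same `_analyze` request can also be prepared outside the console. Here is a minimal Python sketch using only the standard library. The base URL `http://localhost:9200` and the helper name `build_analyze_request` are assumptions. The request is only built, not sent, since sending it requires a running node.

```python
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text):
    """Build (but do not send) a POST request for the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address of a local node; adjust for your own cluster.
req = build_analyze_request("http://localhost:9200", "standard", "中华人民共和国")
print(req.get_method(), req.full_url)
```

Elasticsearch 8.x enables security by default, so a real call will usually also need HTTPS and credentials, which this sketch leaves out.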
ik_smart mode
# Analyzer mode: ik_smart, the coarsest-grained segmentation, suited to tagging scenarios
POST _analyze
{
"analyzer":"ik_smart",
"text":"中华人民共和国"
}
# Result
{
"tokens": [
{
"token":"中华人民共和国",
"start_offset": 0,
"end_offset": 7,
"type": "CN_WORD",
"position":0
}
]
}
#############################################
POST _analyze
{
"analyzer":"ik_smart",
"text":"中华渔船"
}
# Result
{
"tokens": [
{
"token":"中华",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position":0
}
,
{
"token":"渔船",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position":1
}
]
}
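In both ik_smart samples above, the tokens line up end to end: each one starts where the previous one stops. A minimal Python sketch of that check, using the offsets from the results, is shown here. `tiles_text` is a hypothetical helper, and it only describes these samples, since inputs with punctuation or dropped words could leave gaps.

```python
def tiles_text(tokens, text_length):
    """True if the tokens cover [0, text_length) with no gaps and no overlaps."""
    expected_start = 0
    ordered = sorted(tokens, key=lambda t: (t["start_offset"], t["end_offset"]))
    for tok in ordered:
        if tok["start_offset"] != expected_start:
            return False
        expected_start = tok["end_offset"]
    return expected_start == text_length


# Offsets taken from the ik_smart result for "中华渔船".
ik_smart_tokens = [
    {"token": "中华", "start_offset": 0, "end_offset": 2},
    {"token": "渔船", "start_offset": 2, "end_offset": 4},
]

print(tiles_text(ik_smart_tokens, 4))  # True
```

Overlapping tokens, such as one spanning offsets 0 to 7 together with another spanning 0 to 4, would make the function return `False`.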
ik_max_word mode
# Analyzer mode: ik_max_word, the finest-grained segmentation, suited to fuzzy-match query scenarios
POST _analyze
{
"analyzer":"ik_max_word",
"text":"中华人民共和国"
}
# Result
{
"tokens": [
{
"token":"中华人民共和国",
"start_offset": 0,
"end_offset": 7,
"type": "CN_WORD",
"position":0
}
,
{
"token":"中华人民",
"start_offset": 0,
"end_offset": 4,
"type": "CN_WORD",
"position":1
}
,{
"token":"中华",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position":2
}
,{
"token":"华人",
"start_offset": 1,
"end_offset": 3,
"type": "CN_WORD",
"position":3
}
,{
"token":"人民共和国",
"start_offset": 2,
"end_offset": 7,
"type": "CN_WORD",
"position":4
}
,{
"token":"人民",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position":5
}
,
{
"token":"共和国",
"start_offset": 4,
"end_offset": 7,
"type": "CN_WORD",
"position":6
}
,{
"token":"共和",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position":7
}
,{
"token":"国",
"start_offset": 6,
"end_offset": 7,
"type": "CN_WORD",
"position":8
}
]
}
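One common way to use the two IK modes together, which this walkthrough does not itself prescribe, is to index with ik_max_word and search with ik_smart. The minimal Python sketch below builds such an index body. The field name `content` and the helper `build_mapping` are hypothetical, and the body would still have to be sent to Elasticsearch when creating an index.

```python
import json


def build_mapping(field_name):
    """Index body: fine-grained analysis at index time, coarse-grained at search time."""
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart",
                }
            }
        }
    }


# "content" is a placeholder field name.
print(json.dumps(build_mapping("content"), ensure_ascii=False, indent=2))
```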