 
									一介闲人
 
analysis-icu features:
analysis-icu use cases:
List the installed plugins
cd /home/es/elasticsearch-8.17.0
bin/elasticsearch-plugin list
Install the plugin online
cd /home/es/elasticsearch-8.17.0
bin/elasticsearch-plugin install analysis-icu
Remove the plugin
cd /home/es/elasticsearch-8.17.0
bin/elasticsearch-plugin remove analysis-icu
Test the analyzer
POST _analyze
{
    "analyzer":"icu_analyzer",
    "text":"中华人民共和国"
}
Test result
{
    "tokens": [
        { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 0 },
        { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 1 },
        { "token": "共和国", "start_offset": 4, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 2 }
    ]
}
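The same request can also be sent from a script. The following is a minimal Python sketch, assuming a node reachable at http://localhost:9200 over plain HTTP with no authentication (a default 8.x installation enables TLS and authentication, so the URL and headers would need adjusting). The helper names build_analyze_request, token_texts, and analyze are illustrative and not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer, base_url="http://localhost:9200"):
    """Build a POST request for the _analyze API (base_url is an assumption)."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response):
    """Return only the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(text, analyzer, base_url="http://localhost:9200"):
    """Send the request and return the parsed JSON response."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a running node and the plugin installed, token_texts(analyze("中华人民共和国", "icu_analyzer")) would return the token strings from the response shown above.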
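Manual installation: download the plugin package, upload it to the plugins directory under the Elasticsearch installation directory, and then restart the Elasticsearch instance.
Source code for the IK Chinese analyzer plugin: https://github.com/infinilabs/analysis-ik
The IK plugin version must match the Elasticsearch version exactly; otherwise compatibility problems will prevent Elasticsearch from starting.
This walkthrough uses version 8.17.0. If the releases at the source repository do not include a matching version, download it from: https://release.infinilabs.com/analysis-ik/stable/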
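One way to catch a version mismatch before restarting is to read the elasticsearch.version entry in the plugin's plugin-descriptor.properties file. The following is a minimal Python sketch; the helper names are illustrative assumptions, and only the elasticsearch.version key is relied on.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def plugin_matches(descriptor_text, es_version):
    """True when the descriptor targets exactly the given Elasticsearch version."""
    return read_properties(descriptor_text).get("elasticsearch.version") == es_version
```

The descriptor sits inside the directory the plugin was extracted into under plugins; pass its contents together with "8.17.0" to plugin_matches to check the pairing used here.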
standard mode
# Default analyzer: standard, which splits Chinese text into single characters
POST _analyze
{
    "analyzer":"standard",
    "text":"中华人民共和国"
}
# Test result
{
    "tokens": [
        { "token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
        { "token": "华", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
        { "token": "人", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
        { "token": "民", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
        { "token": "共", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 },
        { "token": "和", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 },
        { "token": "国", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 }
    ]
}
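The shape of this result is easy to reproduce by hand: one token per character, with offsets and positions that follow the character index. The following is a simplified Python sketch that mimics the output only for input made up entirely of Han characters; it is not the segmentation algorithm the standard analyzer actually runs, and the function name is illustrative.

```python
def single_char_tokens(text):
    """Emit one token per character, mirroring the response shape above.

    Simplification: assumes every character in `text` is a Han ideograph.
    """
    return [
        {
            "token": char,
            "start_offset": index,
            "end_offset": index + 1,
            "type": "<IDEOGRAPHIC>",
            "position": index,
        }
        for index, char in enumerate(text)
    ]
```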
ik_smart mode
# Analyzer: ik_smart, which performs the coarsest-grained segmentation; suited to tagging scenarios
POST _analyze
{
    "analyzer":"ik_smart",
    "text":"中华人民共和国"
}
# Test result
{
    "tokens": [
        { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 }
    ]
}
#############################################
POST _analyze
{
    "analyzer":"ik_smart",
    "text":"中华渔船"
}
# Test result
{
    "tokens": [
        { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
        { "token": "渔船", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 }
    ]
}
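In both ik_smart results above, the tokens tile the input: each token starts where the previous one ends, and together they span the whole text. The following is a small Python sketch that checks this property on a parsed _analyze response. The function name is illustrative, and the property is only observed for these two examples, not claimed as a guarantee for every input.

```python
def covers_without_overlap(tokens, text_length):
    """True when token spans are contiguous, non-overlapping, and cover the text."""
    cursor = 0
    for token in sorted(tokens, key=lambda item: item["start_offset"]):
        if token["start_offset"] != cursor:
            # A gap or an overlap breaks the tiling.
            return False
        cursor = token["end_offset"]
    return cursor == text_length
```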
ik_max_word mode
# Analyzer: ik_max_word, which performs the finest-grained segmentation; suited to fuzzy-match query scenarios and similar
POST _analyze
{
    "analyzer":"ik_max_word",
    "text":"中华人民共和国"
}
# Test result
{
    "tokens": [
        { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
        { "token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
        { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
        { "token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
        { "token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
        { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
        { "token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
        { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
        { "token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_WORD", "position": 8 }
    ]
}
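A common way to combine the two IK modes, shown in the IK project's README, is to index a text field with ik_max_word and search it with ik_smart. The following is a minimal Python sketch that builds such an index body. The field name content and the helper name build_ik_mapping are illustrative assumptions.

```python
import json


def build_ik_mapping(field="content"):
    """Build an index body: fine-grained analysis at index time, coarse at search time."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart",
                }
            }
        }
    }


# Print the JSON body that would be sent when creating the index.
print(json.dumps(build_ik_mapping(), indent=2))
```

The printed body can be pasted after a PUT request for a new index in the same console used for the _analyze calls.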
 
									