# analysis_ik-8.12.2
**Repository Path**: wu_wanran/analysis_ik-8.12.2
## Basic Information
- **Project Name**: analysis_ik-8.12.2
- **Description**: IK analyzer for ES, based on version 8.12.2, with hot-reloading of dictionaries added
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-07-26
- **Last Updated**: 2024-07-31
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
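# Hot-reloading the dictionary
## Adjust elasticsearch.version
> The source I downloaded was labeled 8.14.2, but the project it contained was actually version 8.12.2. Because I run Elasticsearch 8.14.2, the version has to be changed here.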

## Add the MySQL driver dependency to the pom file
```
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.29</version>
</dependency>
```
## Create the database configuration file
> Name the file jdbc-reload.properties and place it in the config folder of the IK project.
```
jdbc.url=jdbc:mysql://10.xxx.xxx.XX:3306/toptool?serverTimezone=UTC
jdbc.user=username
jdbc.password=password
jdbc.reload.extend.sql=select word from es_extra_main
jdbc.reload.stop.sql=select word from es_extra_stopword
# Reload interval, in milliseconds
jdbc.reload.interval=10000
```
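As an illustration of how these keys could be read and checked up front, here is a minimal sketch. `JdbcReloadConfig` and its field names are hypothetical and not part of IK; only the property keys come from the file above. It fails fast when a key is missing or the interval is not a positive number.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the values in jdbc-reload.properties (not part of IK). */
final class JdbcReloadConfig {
    final String url;
    final String user;
    final String password;
    final String extendSql;
    final String stopSql;
    final long intervalMillis;

    private JdbcReloadConfig(Properties p) {
        this.url = require(p, "jdbc.url");
        this.user = require(p, "jdbc.user");
        this.password = require(p, "jdbc.password");
        this.extendSql = require(p, "jdbc.reload.extend.sql");
        this.stopSql = require(p, "jdbc.reload.stop.sql");
        this.intervalMillis = Long.parseLong(require(p, "jdbc.reload.interval"));
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("jdbc.reload.interval must be positive");
        }
    }

    /** Reads the properties from any Reader, for example a FileReader on the config file. */
    static JdbcReloadConfig load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new JdbcReloadConfig(p);
    }

    /** Convenience overload for in-memory text. */
    static JdbcReloadConfig parse(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the trimmed value of a key, or throws if it is missing or blank. */
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }
}
```

Parsing once into typed fields keeps the `Long.parseLong` conversion and the missing-key checks in one place instead of repeating them in every loader.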
## Add the HotDictReloadThread class
> In the org.wltea.analyzer.dic package, add a HotDictReloadThread class that calls Dictionary.getSingleton().reLoadMainDict() in an endless loop to reload the dictionaries. Its structure is as follows:
```
package org.wltea.analyzer.dic;

import org.apache.logging.log4j.Logger;
import org.wltea.analyzer.help.ESPluginLoggerFactory;

/** Reloads the IK dictionaries from the database in an endless loop. */
public class HotDictReloadThread implements Runnable {

    private static final Logger logger = ESPluginLoggerFactory.getLogger(HotDictReloadThread.class.getName());

    @Override
    public void run() {
        // No sleep here: the pause between reloads happens inside the MySQL loader methods.
        while (true) {
            logger.info("[======HotDictReloadThread======] begin to reload hot dict from dataBase......");
            Dictionary.getSingleton().reLoadMainDict();
        }
    }
}
```
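The class above depends on the loader methods to pause between reloads. A variant that sleeps in the loop itself, survives a failed reload, and stops when the thread is interrupted could look like the sketch below. `ReloadLoopSketch` and its constructor parameters are hypothetical; `reloadTask` stands in for the call to `Dictionary.getSingleton().reLoadMainDict()`.

```java
/** Hypothetical reload loop: run the task, sleep, repeat until interrupted. */
final class ReloadLoopSketch implements Runnable {
    private final Runnable reloadTask;
    private final long intervalMillis;

    ReloadLoopSketch(Runnable reloadTask, long intervalMillis) {
        this.reloadTask = reloadTask;
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                reloadTask.run();
            } catch (RuntimeException e) {
                // A failed reload should not end the loop; a real plugin would log the error here.
            }
            try {
                // Sleeping here means the pause also happens after a failed reload.
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the while condition ends the loop.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Keeping the sleep in the loop means a database outage cannot turn the loop into a busy spin, because the pause no longer depends on the query succeeding.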
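## Modify the files
> First, look at how the reLoadMainDict method executes.

> The core logic of reLoadMainDict has two parts:
>> tmpDict.loadMainDict(): loads the main dictionary
>> tmpDict.loadStopWordDict(): loads the stop-word dictionary
>
> So it is enough to put the database-reading logic into these two methods. The two methods are modified in turn below.
> Modify the org.wltea.analyzer.dic.Dictionary#loadMainDict method
Changing loadMainDict so that it reads the main dictionary from MySQL is what enables hot reloading. The change is:
> Add a call to this.loadMySQLExtDict(), which loads the rows of the MySQL table into the dictionary.

> The body of loadMySQLExtDict() queries MySQL through JDBC: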
```
private static Properties prop = new Properties();

static {
    try {
        // Register the MySQL JDBC driver once, when the Dictionary class is loaded
        Class.forName("com.mysql.cj.jdbc.Driver");
    } catch (ClassNotFoundException e) {
        logger.error("error", e);
    }
}

/**
 * Load the hot-update dictionary from MySQL.
 */
private void loadMySQLExtDict() {
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    try {
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        // Close the stream after reading so each reload does not leak a file handle
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            prop.load(in);
        }
        logger.info("[==========]jdbc-reload.properties");
        // Note: this logs every property, including jdbc.password
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("[==========]query hot dict from mysql, " + prop.getProperty("jdbc.reload.extend.sql") + "......");
        conn = DriverManager.getConnection(
                prop.getProperty("jdbc.url"),
                prop.getProperty("jdbc.user"),
                prop.getProperty("jdbc.password"));
        stmt = conn.createStatement();
        rs = stmt.executeQuery(prop.getProperty("jdbc.reload.extend.sql"));
        while (rs.next()) {
            String theWord = rs.getString("word");
            logger.info("[==========]hot word from mysql: " + theWord);
            // Add the word to the main dictionary
            _MainDict.fillSegment(theWord.trim().toCharArray());
        }
        // Pause before returning; this is what throttles the reload loop
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    } finally {
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
    }
}
```
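Because reLoadMainDict loads into `tmpDict` rather than into the live dictionary, the reload presumably follows a build-then-swap pattern: fill a fresh structure, then publish it in one assignment. The sketch below illustrates that pattern only. `SwappableWordSet` is a hypothetical class that uses a plain `HashSet` instead of IK's dictionary structure, and the assumption that IK swaps the reloaded dictionary in after loading is inferred from the `tmpDict` naming, not confirmed here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for a dictionary that is rebuilt and swapped atomically. */
final class SwappableWordSet {
    // volatile so that readers on other threads see the new set after a swap
    private volatile Set<String> words = Collections.emptySet();

    /** Builds a new set from the source and publishes it only if loading succeeded. */
    void reload(Supplier<List<String>> source) {
        Set<String> tmp = new HashSet<>();
        for (String word : source.get()) { // if this throws, the live set is untouched
            tmp.add(word.trim());
        }
        words = Collections.unmodifiableSet(tmp); // single assignment publishes the new set
    }

    boolean contains(String word) {
        return words.contains(word);
    }

    int size() {
        return words.size();
    }
}
```

Under that assumption, a word removed from the table disappears from the dictionary on the next reload, since every reload starts from an empty structure.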
> Modify the org.wltea.analyzer.dic.Dictionary#loadStopWordDict method
In loadStopWordDict, add a call to loadMySQLStopWordDict, which loads stop words from MySQL into the dictionary.

> loadMySQLStopWordDict is implemented as follows:
```
/**
 * Load stop words from MySQL.
 */
private void loadMySQLStopWordDict() {
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    try {
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        // Close the stream after reading so each reload does not leak a file handle
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            prop.load(in);
        }
        logger.info("[====loadMySQLStopWordDict======] jdbc-reload.properties");
        // Note: this logs every property, including jdbc.password
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("[==========]query hot stop word dict from mysql, " + prop.getProperty("jdbc.reload.stop.sql") + "......");
        conn = DriverManager.getConnection(
                prop.getProperty("jdbc.url"),
                prop.getProperty("jdbc.user"),
                prop.getProperty("jdbc.password"));
        stmt = conn.createStatement();
        rs = stmt.executeQuery(prop.getProperty("jdbc.reload.stop.sql"));
        while (rs.next()) {
            String theWord = rs.getString("word");
            logger.info("[==========]hot stop word from mysql: " + theWord);
            // Add the word to the stop-word dictionary
            _StopWords.fillSegment(theWord.trim().toCharArray());
        }
        // Pause before returning; this is what throttles the reload loop
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    } finally {
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
    }
}
```
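Both loaders call `theWord.trim()` directly on the column value, so a null value would throw and abort the rest of that load. One way to guard against that is to clean the rows before they reach `fillSegment`, as in the sketch below. `HotWordCleaner` is a hypothetical helper, not part of IK; it drops null and blank values, trims the rest, and removes duplicates while keeping the original order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper that prepares raw database values for a fillSegment-style call. */
final class HotWordCleaner {

    /** Returns each distinct, non-blank word as a char[] in first-seen order. */
    static List<char[]> clean(Iterable<String> rawWords) {
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String raw : rawWords) {
            if (raw == null) {
                continue; // skip NULL column values instead of throwing
            }
            String word = raw.trim();
            if (!word.isEmpty()) {
                unique.add(word);
            }
        }
        List<char[]> result = new ArrayList<>();
        for (String word : unique) {
            result.add(word.toCharArray());
        }
        return result;
    }
}
```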
> Start HotDictReloadThread from org.wltea.analyzer.dic.Dictionary#initial.
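
How the thread is started is not shown here. A minimal sketch of one option follows: start the reload loop once, on a daemon thread, so that it never blocks node shutdown. `ReloadThreadStarter`, `startOnce`, and the thread name are hypothetical; in the plugin the argument would be `new HotDictReloadThread()`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper that starts the reload loop at most once per JVM. */
final class ReloadThreadStarter {
    private static final AtomicBoolean STARTED = new AtomicBoolean(false);

    /** Returns true if this call started the thread, false if it was already started. */
    static boolean startOnce(Runnable reloadLoop) {
        if (!STARTED.compareAndSet(false, true)) {
            return false; // another caller already started the loop
        }
        Thread thread = new Thread(reloadLoop, "ik-hot-dict-reload");
        thread.setDaemon(true); // a daemon thread does not keep the JVM alive
        thread.start();
        return true;
    }
}
```

The guard is defensive: it only matters if the initialization code can run more than once.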

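## Modify permissions
> Without the permission changes, the plugin cannot connect to the database.

> If it still does not work, modify the permissions of the bundled JDK.
> File location: /usr/share/elasticsearch/jdk/conf/security
> Add the following lines: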
```
permission java.lang.RuntimePermission "setContextClassLoader";
permission java.net.SocketPermission "*", "connect,resolve";
```

## Table creation script
```
CREATE TABLE `es_extra_main`
(
    `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
    `word` varchar(255) CHARACTER SET utf8mb4 NOT NULL COMMENT 'word',
    `is_deleted` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'whether the row is deleted',
    `update_time` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6) COMMENT 'update time',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `es_extra_stopword`
(
    `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
    `word` varchar(255) CHARACTER SET utf8mb4 NOT NULL COMMENT 'word',
    `is_deleted` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'whether the row is deleted',
    `update_time` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6) COMMENT 'update time',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
## Packaging
> Add the MySQL jar dependency to the package as well; otherwise the jar is missing from the package and an error is reported.
>
```
mysql:mysql-connector-java
```
> Run package to build the zip file, then extract it into a folder.
## Installation
Copy the elasticsearch-analysis-ik-8.14.2.zip file into the /usr/share/elasticsearch/plugins/elasticsearch-analysis-ik-8.14.2 directory.
## Startup
> On Windows, double-click elasticsearch.bat in the bin directory.
# Custom analyzer rules
## Set the analyzer when creating the index
```
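{
  "mappings": { // create the mapping
    "properties": { // mapping properties
      "threeToOne": { // field name
        "type": "text", // field type
        "analyzer": "my_ik_smart_analyzer" // analyzer
      }
    }
  },
  "settings": { // settings
    "analysis": { // analysis settings
      "analyzer": { // analyzers
        "my_ik_smart_analyzer": { // custom analyzer name
          "type": "ik_smart", // analyzer type
          "enable_lowercase": false // option: whether to convert uppercase to lowercase
        }
      }
    }
  }
}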
```
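When the index is created from Java code, the same body can be assembled as a comment-free JSON string. The sketch below is one way to do that with no JSON library. `IkIndexBody` and `build` are hypothetical names, and the code assumes the field and analyzer names contain no characters that need JSON escaping.

```java
/** Hypothetical builder for the index-creation body shown above, without the comments. */
final class IkIndexBody {

    /**
     * Builds a body that maps one text field to a custom ik_smart analyzer.
     * Assumes the names passed in need no JSON escaping.
     */
    static String build(String field, String analyzerName, boolean enableLowercase) {
        return "{"
            + "\"mappings\":{\"properties\":{\"" + field + "\":{"
            + "\"type\":\"text\",\"analyzer\":\"" + analyzerName + "\"}}},"
            + "\"settings\":{\"analysis\":{\"analyzer\":{\"" + analyzerName + "\":{"
            + "\"type\":\"ik_smart\",\"enable_lowercase\":" + enableLowercase + "}}}}"
            + "}";
    }
}
```

Calling `IkIndexBody.build("threeToOne", "my_ik_smart_analyzer", false)` yields the request body for a `PUT` request to the new index's name.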