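# analysis_ik-8.12.2

**Repository Path**: wu_wanran/analysis_ik-8.12.2

## Basic Information

- **Project Name**: analysis_ik-8.12.2
- **Description**: IK analyzer for Elasticsearch based on version 8.12.2, with added support for hot-reloading the dictionary
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-07-26
- **Last Updated**: 2024-07-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Hot-Reloading the Dictionary

## Adjust elasticsearch.version

> The release I downloaded was labeled 8.14.2, but the downloaded source is actually version 8.12.2. Because I run Elasticsearch 8.14.2, the version has to be changed here.

![img.png](img.png)

## Add the MySQL driver dependency to the pom file

```
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.29</version>
</dependency>
```

## Create the database configuration file

> Create `jdbc-reload.properties` and place it in the `config` folder of the IK project.

```
jdbc.url=jdbc:mysql://10.xxx.xxx.XX:3306/toptool?serverTimezone=UTC
jdbc.user=username
jdbc.password=password
jdbc.reload.extend.sql=select word from es_extra_main
jdbc.reload.stop.sql=select word from es_extra_stopword
# reload interval, in milliseconds
jdbc.reload.interval=10000
```

## Add the HotDictReloadThread class

> In the `org.wltea.analyzer.dic` package, add a `HotDictReloadThread` class that calls `Dictionary.getSingleton().reLoadMainDict()` in an infinite loop to reload the dictionary. `HotDictReloadThread` looks like this:

```
package org.wltea.analyzer.dic;

import org.apache.logging.log4j.Logger;
import org.wltea.analyzer.help.ESPluginLoggerFactory;

public class HotDictReloadThread implements Runnable {

    private static final Logger logger = ESPluginLoggerFactory.getLogger(HotDictReloadThread.class.getName());

    @Override
    public void run() {
        // Reload forever; the pause between reloads comes from the Thread.sleep
        // calls inside loadMySQLExtDict() and loadMySQLStopWordDict()
        while (true) {
            logger.info("[======HotDictReloadThread======] begin to reload hot dict from dataBase......");
            Dictionary.getSingleton().reLoadMainDict();
        }
    }
}
```

## Modify the source files

> First, look at how the `reLoadMainDict` method works:

![img_1.png](img_1.png)

> The core logic of `reLoadMainDict` consists of two calls:
>> - `tmpDict.loadMainDict()`: loads the main dictionary
>> - `tmpDict.loadStopWordDict()`: loads the stop-word dictionary
>
> So all we need to do is put the database-reading logic into these two methods. The following steps modify each of them in turn.

> Modify `org.wltea.analyzer.dic.Dictionary#loadMainDict`

Change `loadMainDict` so that it also reads the main dictionary from MySQL, which is what enables hot reloading. The change is:

> Add a call to `this.loadMySQLExtDict()`, which loads the rows of the MySQL table into the dictionary.

![img_2.png](img_2.png)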
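`HotDictReloadThread` spins in `while (true)` and relies on the loader methods to pause for `jdbc.reload.interval` milliseconds. As written in this README, both `loadMySQLExtDict()` and `loadMySQLStopWordDict()` sleep, so one full cycle takes roughly twice the configured interval, and a loader that throws before reaching its `Thread.sleep` does not pause at all. A scheduled executor is one common alternative. The sketch below is an assumption, not the plugin's code: `ScheduledDictReloader`, `reloadIntervalMillis` and `start` are hypothetical names, and the counting task stands in for `Dictionary.getSingleton().reLoadMainDict()` (with the `Thread.sleep` calls removed from the loaders).

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical scheduler-based alternative to the while (true) loop in HotDictReloadThread. */
public class ScheduledDictReloader {

    /** Reads jdbc.reload.interval (milliseconds); falls back to defaultMillis if missing, invalid or not positive. */
    static long reloadIntervalMillis(Properties prop, long defaultMillis) {
        String raw = prop.getProperty("jdbc.reload.interval");
        if (raw == null || raw.isBlank()) {
            return defaultMillis;
        }
        try {
            long value = Long.parseLong(raw.trim());
            // scheduleWithFixedDelay rejects a delay <= 0, so guard against it here
            return value > 0 ? value : defaultMillis;
        } catch (NumberFormatException e) {
            return defaultMillis;
        }
    }

    /** Runs reloadTask every intervalMillis on one daemon thread; the caller shuts the executor down. */
    static ScheduledExecutorService start(Runnable reloadTask, long intervalMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "ik-hot-dict-reload");
            t.setDaemon(true); // do not keep the JVM alive just for reloading
            return t;
        });
        // Fixed delay: the next run starts intervalMillis after the previous one finishes
        executor.scheduleWithFixedDelay(() -> {
            try {
                reloadTask.run();
            } catch (RuntimeException e) {
                // An escaping exception would cancel all later runs, so report it on stderr and continue
                System.err.println("dict reload failed: " + e);
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        Properties prop = new Properties();
        prop.setProperty("jdbc.reload.interval", "100");
        AtomicInteger runs = new AtomicInteger();
        // Placeholder task; in the plugin this would call Dictionary.getSingleton().reLoadMainDict()
        ScheduledExecutorService executor = start(runs::incrementAndGet, reloadIntervalMillis(prop, 10_000L));
        Thread.sleep(350);
        executor.shutdown();
        System.out.println("reload runs: " + runs.get());
    }
}
```

With `scheduleWithFixedDelay` the interval is measured from the end of the previous reload, so a slow database query shifts the schedule instead of causing back-to-back reloads, and a failed reload still waits a full interval before retrying.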
> The main logic of `loadMySQLExtDict()` is to query MySQL over JDBC:

```
// Imports needed in Dictionary.java if not already present:
// java.io.FileInputStream, java.io.InputStream, java.nio.file.Path,
// java.sql.Connection, java.sql.DriverManager, java.sql.ResultSet,
// java.sql.Statement, java.util.Properties

private static Properties prop = new Properties();

static {
    try {
        // Register the MySQL JDBC driver bundled with the plugin
        Class.forName("com.mysql.cj.jdbc.Driver");
    } catch (ClassNotFoundException e) {
        logger.error("error", e);
    }
}

/**
 * Load the hot-update main dictionary from MySQL
 */
private void loadMySQLExtDict() {
    Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
    // try-with-resources closes the properties file, result set, statement and connection
    try (InputStream in = new FileInputStream(file.toFile())) {
        prop.load(in);
        logger.info("[==========]jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("[==========]query hot dict from mysql, " + prop.getProperty("jdbc.reload.extend.sql") + "......");
        try (Connection conn = DriverManager.getConnection(
                     prop.getProperty("jdbc.url"),
                     prop.getProperty("jdbc.user"),
                     prop.getProperty("jdbc.password"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(prop.getProperty("jdbc.reload.extend.sql"))) {
            while (rs.next()) {
                String theWord = rs.getString("word");
                logger.info("[==========]hot word from mysql: " + theWord);
                _MainDict.fillSegment(theWord.trim().toCharArray());
            }
        }
        // Pause for jdbc.reload.interval milliseconds before the next reload
        Thread.sleep(Long.parseLong(prop.getProperty("jdbc.reload.interval").trim()));
    } catch (Exception e) {
        logger.error("error", e);
    }
}
```

> Modify `org.wltea.analyzer.dic.Dictionary#loadStopWordDict`

Add a call to `loadMySQLStopWordDict()` in `loadStopWordDict`; this method loads stop words from MySQL into the dictionary.

![img_3.png](img_3.png)

> `loadMySQLStopWordDict` is implemented as follows:

```
/**
 * Load stop words from MySQL
 */
private void loadMySQLStopWordDict() {
    Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
    // try-with-resources closes the properties file, result set, statement and connection
    try (InputStream in = new FileInputStream(file.toFile())) {
        prop.load(in);
        logger.info("[====loadMySQLStopWordDict======] jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }
        logger.info("[==========]query hot stop word dict from mysql, " + prop.getProperty("jdbc.reload.stop.sql") + "......");
        try (Connection conn = DriverManager.getConnection(
                     prop.getProperty("jdbc.url"),
                     prop.getProperty("jdbc.user"),
                     prop.getProperty("jdbc.password"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(prop.getProperty("jdbc.reload.stop.sql"))) {
            while (rs.next()) {
                String theWord = rs.getString("word");
                logger.info("[==========]hot stop word from mysql: " + theWord);
                _StopWords.fillSegment(theWord.trim().toCharArray());
            }
        }
        // Pause for jdbc.reload.interval milliseconds before the next reload
        Thread.sleep(Long.parseLong(prop.getProperty("jdbc.reload.interval").trim()));
    } catch (Exception e) {
        logger.error("error", e);
    }
}
```

> Start `HotDictReloadThread` from `org.wltea.analyzer.dic.Dictionary#initial`:

![img_4.png](img_4.png)

## Modify permissions

> Without these permission changes, the plugin cannot connect to the database.

![img_5.png](img_5.png)

> If that is still not enough, modify the permissions of the bundled JDK: in `/usr/share/elasticsearch/jdk/conf/security`, add the following permissions to a `grant` block:

```
permission java.lang.RuntimePermission "setContextClassLoader";
permission java.net.SocketPermission "*", "connect,resolve";
```

![img_6.png](img_6.png)

## Table creation script

```
CREATE TABLE `es_extra_main` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `word` varchar(255) CHARACTER SET utf8mb4 NOT NULL COMMENT 'word',
  `is_deleted` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'deleted flag',
  `update_time` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6) COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `es_extra_stopword` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `word` varchar(255) CHARACTER SET utf8mb4 NOT NULL COMMENT 'word',
  `is_deleted` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'deleted flag',
  `update_time` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6) COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```

## Packaging

> Add the MySQL jar dependency to the packaging configuration; otherwise the jar is left out of the package and the plugin fails with an error.
>
> ![img_7.png](img_7.png)

```
<include>mysql:mysql-connector-java</include>
```

> Run Maven `package` to build a zip file, then unzip it into a folder.
>
> ![img_8.png](img_8.png)

## Installation

Copy the extracted contents of `elasticsearch-analysis-ik-8.14.2.zip` into the `/usr/share/elasticsearch/plugins/elasticsearch-analysis-ik-8.14.2` directory.

## Startup

> On Windows, double-click `elasticsearch.bat` in the `bin` directory.

# Custom Analyzer Rules

## Set the analyzer when creating an index

```
{
  "mappings": {
    "properties": {
      "threeToOne": {
        "type": "text",
        "analyzer": "my_ik_smart_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ik_smart_analyzer": {
          "type": "ik_smart",
          "enable_lowercase": false
        }
      }
    }
  }
}
```

- `mappings.properties`: the field mappings; `threeToOne` is the field name, `type` its field type, and `analyzer` the analyzer it uses
- `settings.analysis.analyzer`: the analyzer definitions; `my_ik_smart_analyzer` is the custom analyzer name
- `type`: the analyzer type, here `ik_smart`
- `enable_lowercase`: whether to convert uppercase letters to lowercase
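To send these index settings from Java, one option is the JDK's built-in `java.net.http.HttpClient` (Java 11+; the text block needs Java 15+). This is a minimal sketch under assumptions: the cluster is reachable at `http://localhost:9200` without TLS or authentication (Elasticsearch 8 enables security by default, so adjust the URL and add credentials as needed), and `CreateIkIndex` and the index name `my_index` are hypothetical. The body is the same settings and mapping, written as plain JSON without inline comments.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper that creates an index whose threeToOne field uses my_ik_smart_analyzer. */
public class CreateIkIndex {

    /** Index settings and mappings for the custom ik_smart analyzer. */
    static String indexBody() {
        return """
            {
              "mappings": {
                "properties": {
                  "threeToOne": {
                    "type": "text",
                    "analyzer": "my_ik_smart_analyzer"
                  }
                }
              },
              "settings": {
                "analysis": {
                  "analyzer": {
                    "my_ik_smart_analyzer": {
                      "type": "ik_smart",
                      "enable_lowercase": false
                    }
                  }
                }
              }
            }
            """;
    }

    /** Builds a PUT /{indexName} request; baseUrl and indexName are placeholders. */
    static HttpRequest createIndexRequest(String baseUrl, String indexName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(indexBody()))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = createIndexRequest("http://localhost:9200", "my_index");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Print the HTTP status and the JSON response from Elasticsearch
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Building the request in a separate method keeps the JSON body and URL testable without a running cluster; only `main` actually contacts Elasticsearch.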