# nlp

**Repository Path**: wang525976683/attention

## Basic Information

- **Project Name**: nlp
- **Description**: A simple English-to-Chinese translation model trained with an attention-based architecture.
The code is based on the following articles:
https://wmathor.com/index.php/archives/1438/
https://wmathor.com/index.php/archives/1455/
- **Primary Language**: Python
- **License**: MulanPSL-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 3
- **Created**: 2022-11-29
- **Last Updated**: 2023-11-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# nlp

#### Introduction
A simple English-to-Chinese translation model trained with an attention-based architecture, intended as a small NLP demo.

 **Versions** 

- python: 3.7
- pytorch: 1.6.0
- sentencepiece: 4.0.2


The `data` directory provides 26,000 English entries together with their Chinese translations.
![Image description](https://foruda.gitee.com/images/1669707764911320555/a2f1eb08_8854060.png "Screenshot")

### Step 1: Data preprocessing
Source: [step_01_ETL.py](https://gitee.com/wang525976683/attention/blob/master/nlp/step_01_ETL.py)

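 **First, tokenize the Chinese and English text** 
The `sentencepiece` package is used to train the tokenization models. The tokenized output is shown in the figure below. Method: `split_word()`

![Image description](https://foruda.gitee.com/images/1669708177906936975/192de73c_8854060.png "Screenshot")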
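The repository's `split_word()` is not reproduced here. As a rough illustration of the tokenization step, the following sketch shows one way a SentencePiece model could be trained and loaded, assuming a plain-text corpus with one sentence per line. The helper names `trainer_args` and `train_and_load`, the file paths, and the vocabulary size are hypothetical and are not taken from the project.

```python
def trainer_args(corpus_path, model_prefix, vocab_size, is_cjk):
    """Collect keyword arguments for SentencePieceTrainer.train().

    All values here are illustrative assumptions, not the project's settings.
    """
    return {
        "input": corpus_path,          # plain text, one sentence per line
        "model_prefix": model_prefix,  # writes <prefix>.model and <prefix>.vocab
        "vocab_size": vocab_size,
        # The SentencePiece documentation suggests 0.9995 for languages with
        # large character sets (such as Chinese) and 1.0 for small ones.
        "character_coverage": 0.9995 if is_cjk else 1.0,
        "model_type": "unigram",       # the library's default algorithm
    }


def train_and_load(args):
    """Train a tokenizer model and return a processor loaded from it."""
    # Imported lazily so the argument helper works without the package installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**args)
    return spm.SentencePieceProcessor(model_file=args["model_prefix"] + ".model")
```

With a corpus at a hypothetical path such as `data/en.txt`, calling `sp = train_and_load(trainer_args("data/en.txt", "en", 8000, False))` followed by `sp.encode("I love you.", out_type=str)` would return the sentence as a list of subword pieces. Training one model per language keeps the English and Chinese vocabularies separate.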

 **Second, convert the tokenized Chinese and English text into numeric IDs** 

Methods: `mk_dict()` and `text2number()`

![Image description](https://foruda.gitee.com/images/1669709598943745572/8be4b91f_8854060.png "Screenshot")
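The token-to-ID conversion can be sketched as follows. This is not the repository's `mk_dict()` or `text2number()`; the functions `build_vocab` and `encode_tokens`, the special tokens, and the fixed-length padding scheme are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical special tokens; the project may use different ones.
PAD, UNK, BOS, EOS = "<pad>", "<unk>", "<bos>", "<eos>"


def build_vocab(tokenized_sentences, min_freq=1):
    """Map each token to an integer ID, reserving IDs 0-3 for special tokens."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {PAD: 0, UNK: 1, BOS: 2, EOS: 3}
    # Sort by descending frequency, then alphabetically, so IDs are reproducible.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode_tokens(tokens, vocab, max_len):
    """Convert tokens to IDs, add BOS/EOS, then truncate or pad to max_len."""
    ids = [vocab[BOS]] + [vocab.get(t, vocab[UNK]) for t in tokens] + [vocab[EOS]]
    ids = ids[:max_len]  # note: truncation may cut off the EOS marker
    return ids + [vocab[PAD]] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab([["i", "love", "you"], ["i", "see"]])
    # "cats" is not in the vocabulary, so it maps to the <unk> ID.
    print(encode_tokens(["i", "love", "cats"], vocab, max_len=8))
```

Padding every sequence to the same length lets a batch be stored as a single rectangular tensor, and the padding ID can later be masked out in attention and in the loss.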

### Step 2: Model training

 **This example uses the Transformer architecture** 

![Image description](nlp/img/%E5%9B%BE%E7%89%87.png)
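The core operation of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The project itself is built on PyTorch; the dependency-free sketch below only illustrates the computation on nested lists and is not the repository's code. The function names are hypothetical, and masking and multiple heads are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    queries: list of query vectors, each of length d_k
    keys:    list of key vectors, each of length d_k
    values:  list of value vectors, one per key
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs
```

When a query matches all keys equally, the output is the plain average of the values; when it aligns strongly with one key, the output moves toward that key's value. In a translation model this is how each target position draws on the most relevant source tokens.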


 **Training loss** 

![Image description](nlp/img/loss%E5%9B%BE%E7%89%87.png)

 **Translation results** 

![Image description](nlp/img/%E7%BF%BB%E8%AF%91%E7%BB%93%E6%9E%9C%E5%B1%95%E7%A4%BA%E5%9B%BE%E7%89%87.png)





### References

https://wmathor.com/index.php/archives/1438/

https://wmathor.com/index.php/archives/1455/