# 3m-asr **Repository Path**: RapidAI/3m-asr ## Basic Information - **Project Name**: 3m-asr - **Description**: No description available - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2023-03-22 - **Last Updated**: 2023-03-22 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README ## 3M-ASR for End-to-End Speech Recognition This project is used to build an End-to-End Speech Recognition system based on Mixture-of-Experts(MoE) model. MoE is an efficient way to train a large scale model and we have proved its efficiency on public dataset. More details about the algorithm can be found in "[3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition](https://arxiv.org/abs/2204.03178)". ## Installation - Clone this repo ```shell git clone https://github.com/tencent-ailab/3m-asr.git ``` - Install Conda: please see [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html) - Create Conda env: ```shell conda create -n moe python=3.8 conda activate moe pip install -r requirements.txt conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge ``` - Follow the instruction under directory `fastmoe` to install `fastmoe` ## Performance Benchmark We evaluate our system on the public [WenetSpeech](https://github.com/wenet-e2e/WenetSpeech) dataset and the recipe of `Conformer-MoE` is provided. CER results are listed below and the first three lines are provided by [WenetSpeech](https://github.com/wenet-e2e/WenetSpeech) | Toolkit | Dev | Test_net | Test_Meeting | AIShell-1 | | :----------------: | :------: | :------: | :----------: | :-------: | | Kaldi | 9.07 | 12.83 | 24.72 | 5.41 | | Espnet | 9.70 | 8.90 | 15.90 | **3.90** | | WeNet | 8.88 | 9.70 | 15.59 | 4.61 | | Conformer-MoE(32e) | **7.49** | **7.99** | **13.69** | 4.03 | ## Acknowledge - We used [FastMoE](https://github.com/laekov/fastmoe) to support Mixture-of-Experts model training in Pytorch - We borrowed a lot of code from [WeNet](https://github.com/wenet-e2e/wenet) for the implementation of Conformer and data processing ## Reference [1] [SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts](https://arxiv.org/abs/2105.03036)(InterSpeech 2021) [2] [3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition](https://arxiv.org/abs/2204.03178)(Submitted to InterSpeech 2022) ## Citation ```tex @inproceedings{you21_interspeech, author={Zhao You and Shulin Feng and Dan Su and Dong Yu}, title={{SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts}}, year=2021, booktitle={Proc. Interspeech 2021}, pages={2077--2081}, doi={10.21437/Interspeech.2021-478} } @article{you20223m, title={3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition}, author={You, Zhao and Feng, Shulin and Su, Dan and Yu, Dong}, journal={arXiv preprint arXiv:2204.03178}, year={2022} } ``` ## Contact If you have any questions about this project, please feel free to contact shulinfeng@tencent.com or dennisyou@tencent.com ## Disclaimer This is not an officially supported Tencent product