DeepSpark / DeepSparkHub

majorli6 committed on 2022-12-09 03:43: fix rnnt model running issue dependancy

RNN-T

Model description

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction, and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching, and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation, since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.

Step 1: Installing packages

Install required libraries:

bash install.sh

Step 2: Preparing datasets
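
The commands in this step and in Step 3 read the dataset location from `DATA_ROOT_DIR`, which this README does not define. A minimal sketch of setting it first, assuming a hypothetical `data/` directory under the current working directory (any path with enough free space should do):

```shell
# Hypothetical default location; replace it with wherever you want LibriSpeech stored.
export DATA_ROOT_DIR="${DATA_ROOT_DIR:-${PWD}/data}"

# Create the directory up front so later steps have a place to write to.
mkdir -p "${DATA_ROOT_DIR}"
echo "Dataset root: ${DATA_ROOT_DIR}"
```

If `DATA_ROOT_DIR` is already exported in your shell, the `:-` default leaves it unchanged.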

Download LibriSpeech (http://www.openslr.org/12):

bash scripts/download_librispeech.sh ${DATA_ROOT_DIR}

Preprocess LibriSpeech:

bash scripts/preprocess_librispeech.sh ${DATA_ROOT_DIR}
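
Before moving on to training, it may help to confirm that the dataset directory is where the later commands expect it. The sketch below assumes the data ends up in `${DATA_ROOT_DIR}/LibriSpeech`, which is inferred from the training commands in Step 3 rather than stated by the scripts; `check_dataset_dir` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: succeeds only if <root>/LibriSpeech exists.
check_dataset_dir() {
    if [ -d "$1/LibriSpeech" ]; then
        echo "found: $1/LibriSpeech"
    else
        echo "missing: $1/LibriSpeech" >&2
        return 1
    fi
}

# Falls back to a hypothetical ./data directory when DATA_ROOT_DIR is unset.
check_dataset_dir "${DATA_ROOT_DIR:-${PWD}/data}" \
    || echo "Run the download and preprocess scripts first."
```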

Step 3: Training

Set up the config YAML:

sed -i "s#MODIFY_DATASET_DIR#${DATA_ROOT_DIR}/LibriSpeech#g" configs/baseline_v3-1023sp.yaml
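
To see what this substitution does without touching the real config, here is a small sketch that applies the same `sed` expression to a throwaway file. The `dataset_dir:` key and the `/data/speech` path are made up for illustration; the actual keys in `configs/baseline_v3-1023sp.yaml` may differ.

```shell
DATA_ROOT_DIR="/data/speech"   # hypothetical dataset root
demo_cfg="$(mktemp)"

# Hypothetical config line containing the placeholder.
printf 'dataset_dir: MODIFY_DATASET_DIR\n' > "${demo_cfg}"

# Same expression as the command above, but without -i, so the file is only read.
result="$(sed "s#MODIFY_DATASET_DIR#${DATA_ROOT_DIR}/LibriSpeech#g" "${demo_cfg}")"
echo "${result}"   # prints: dataset_dir: /data/speech/LibriSpeech

rm -f "${demo_cfg}"
```

Using `#` as the delimiter avoids having to escape the slashes in the path. Because `-i` edits the file in place, the placeholder is gone after the first run; to point the config at a different directory later, edit the path by hand or restore the original file.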

Multiple GPUs on one machine

mkdir -p output/
bash scripts/train_rnnt_1x8.sh output/ ${DATA_ROOT_DIR}/LibriSpeech

The following configurations were tested; you can run any of them:

| | FP32 |
| --- | --- |
| single card | train_rnnt_1x1.sh |
| 4 cards | train_rnnt_1x4.sh |
| 8 cards | train_rnnt_1x8.sh |
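
One way to switch between these configurations is to derive the script name from the card count. This is only a sketch: `select_train_script` and `NUM_CARDS` are hypothetical names, and it assumes the 1x1 and 1x4 scripts live in `scripts/` and take the same two arguments as the 1x8 example, which this README does not state. The final command is printed rather than executed.

```shell
# Hypothetical helper: map a card count to one of the tested launch scripts.
select_train_script() {
    case "$1" in
        1|4|8) echo "scripts/train_rnnt_1x$1.sh" ;;
        *) echo "unsupported card count: $1 (tested: 1, 4, 8)" >&2; return 1 ;;
    esac
}

NUM_CARDS=4   # hypothetical choice
script="$(select_train_script "${NUM_CARDS}")"

# Show the command that would be run; drop the echo to actually launch training.
echo "bash ${script} output/ \${DATA_ROOT_DIR}/LibriSpeech"
```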

Results on BI-V100

| GPUs | FP16 | FPS | WER |
| --- | --- | --- | --- |
| 1x8 | False | 20 | 0.058 |
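
The README does not define WER. Assuming it is the usual word error rate computed from the edit distance between the reference and the recognized transcript, it can be written as:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

Here $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted words, and $N$ is the number of words in the reference transcript. Under that assumption, a WER of 0.058 corresponds to roughly 5.8 word errors per 100 reference words.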

Reference

https://github.com/mlcommons/training/tree/master/rnn_speech_recognition/pytorch
