ISCA Archive - Interspeech 2022

External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge

Guolong Zhong, Hongyu Song, Ruoyu Wang, Lei Sun, Diyuan Liu, Jia Pan, Xin Fang, Jun Du, Jie Zhang, Lirong Dai

This paper describes our USTC_NELSLIP system submitted to the Open Automatic Speech Recognition (OpenASR21) Challenge under the Constrained condition, where only a 10-hour speech dataset is allowed for training while additional text data is unlimited. To improve low-resource speech recognition performance, we collect external text data for language modeling and train a text-to-speech (TTS) model to generate speech-text paired data. Our system is then built on a conventional hybrid structure, in which various subsystems are developed using different acoustic neural network architectures and different data augmentation methods. Finally, system fusion is employed to obtain the final result. Experiments on the OpenASR21 Challenge show that the proposed system achieves the best performance across all test languages.