Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptation

You, Zhao; Xu, Bo

doi:10.21437/Interspeech.2014-493

In the past few years, deep neural networks (DNNs) have achieved great success in speech recognition. A deep network model can be viewed as a series of feature transforms followed by a log-linear classifier. For speech input of different bandwidths, the hidden-layer transforms and the log-linear classifier can be shared, but the input-layer transform must be designed separately for each bandwidth. As a result, training DNNs directly on speech of different bandwidths is intractable. In this paper, we treat the problem of training DNNs on mixed-bandwidth data as a domain-adaptation problem. With our adaptation approach, DNNs trained on the abundant narrowband speech can be adapted effectively to the target wideband domain and perform well on wideband speech. We evaluate this approach on the wideband clean7k and noise360 speech. Experimental results show that the DNN adaptation approach reduces character error rate (CER) by 5% to 15% relative to baseline DNNs trained only on the limited wideband data.
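
The split described in the abstract, bandwidth-specific input transforms feeding shared hidden transforms and a shared log-linear (softmax) classifier, can be pictured with a minimal sketch. This is not the authors' implementation: the layer sizes, the random initialization, the sigmoid nonlinearity, and the names `input_layers`, `shared_hidden`, `classifier`, and `forward` are all assumptions chosen for illustration, and no training or adaptation step is shown.

```python
import math
import random

random.seed(0)

def make_layer(n_in, n_out):
    """Return (weights, biases) with small random weights; sizes are illustrative."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def affine(layer, x):
    """Compute W x + b for one input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    """Log-linear classifier output: normalized exponentials of the scores."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical dimensions: wideband features are assumed to be longer than
# narrowband ones, so each bandwidth needs its own input layer.
NARROW_DIM, WIDE_DIM, HIDDEN, CLASSES = 4, 6, 5, 3

input_layers = {
    "narrowband": make_layer(NARROW_DIM, HIDDEN),  # bandwidth-specific
    "wideband": make_layer(WIDE_DIM, HIDDEN),      # bandwidth-specific
}
shared_hidden = make_layer(HIDDEN, HIDDEN)         # shared across bandwidths
classifier = make_layer(HIDDEN, CLASSES)           # shared log-linear classifier

def forward(x, bandwidth):
    """Route x through its bandwidth's input layer, then the shared layers."""
    h = sigmoid(affine(input_layers[bandwidth], x))
    h = sigmoid(affine(shared_hidden, h))
    return softmax(affine(classifier, h))

posteriors_nb = forward([0.1] * NARROW_DIM, "narrowband")
posteriors_wb = forward([0.1] * WIDE_DIM, "wideband")
```

In this picture, only the entry in `input_layers` differs between the two bandwidths; everything after it is reused, which is the part of the network that narrowband data could in principle help estimate.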
