ISCA Archive Interspeech 2019

Semi-Supervised Audio Classification with Consistency-Based Regularization

Kangkang Lu, Chuan-Sheng Foo, Kah Kuan Teh, Huy Dat Tran, Vijay Ramaseshan Chandrasekhar

Consistency-based semi-supervised learning methods such as the Mean Teacher method are state-of-the-art on image datasets, but have yet to be applied to audio data. Such methods encourage model predictions to be consistent on perturbed input data. In this paper, we incorporate audio-specific perturbations into the Mean Teacher algorithm and demonstrate the effectiveness of the resulting method on audio classification tasks. Specifically, we perturb audio inputs by mixing in other environmental audio clips, and leverage other training examples as sources of noise. Experiments on the Google Speech Command Dataset and UrbanSound8K Dataset show that the method can achieve comparable performance to a purely supervised approach while using only a fraction of the labels.
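The two ingredients the abstract describes — a Mean Teacher whose weights track an exponential moving average (EMA) of the student, and perturbation by mixing in other audio clips — can be sketched as below. This is a minimal illustrative sketch, not the paper's exact recipe: the function names, the SNR-based scaling rule, and the mean-squared-error consistency term are assumptions chosen to match the standard Mean Teacher formulation.

```python
import math

def mix_audio(clip, noise, snr_db):
    """Perturb a waveform by adding another clip scaled to a target
    signal-to-noise ratio in dB. (The paper mixes in environmental
    clips and other training examples; this scaling rule is an
    illustrative assumption.)"""
    clip_power = sum(x * x for x in clip) / len(clip)
    noise_power = sum(x * x for x in noise) / len(noise) + 1e-12
    scale = math.sqrt(clip_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clip, noise)]

def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher update: after each training step, the teacher's
    weights move toward the student's as an exponential moving average."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher, student)]

def consistency_loss(student_probs, teacher_probs):
    """Consistency term: mean squared disagreement between student and
    teacher predictions on (differently) perturbed copies of an input.
    This needs no labels, so it can be applied to unlabeled audio."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n
```

In training, the total loss is the supervised classification loss on labeled clips plus a weighted consistency loss on all clips; only the student is updated by gradient descent, while the teacher follows via `ema_update`.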