ISCA Archive Interspeech 2013

Relevance-weighted-reconstruction of articulatory features in deep-neural-network-based acoustic-to-articulatory mapping

Claudia Canevari, Leonardo Badino, Luciano Fadiga, Giorgio Metta

We present a strategy for learning Deep-Neural-Network (DNN)-based Acoustic-to-Articulatory Mapping (AAM) functions in which the contribution of each articulatory feature (AF) to the global reconstruction error is weighted by its relevance. We first show empirically that an articulator that is more crucial for the production of a given phone is less variable, confirming previous findings. We then compute the relevance of an articulatory feature as a function of its frame-wise variance, conditioned on the acoustic evidence and estimated with a Mixture Density Network (MDN). Finally, we combine acoustic and recovered articulatory features in a hybrid DNN-HMM phone recognizer. On the MOCHA-TIMIT corpus, articulatory features reconstructed by a conventionally trained DNN yield an 8.4% relative phone error reduction (w.r.t. a recognizer that uses only MFCCs), whereas articulatory features reconstructed with their relevance taken into account raise the relative phone error reduction to 10.9%.
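The abstract does not specify the MDN configuration or the exact function that maps variance to relevance. As a minimal sketch, assuming the MDN outputs a diagonal-covariance Gaussian mixture per frame and assuming relevance is the inverse conditional variance normalized to a per-frame mean of 1 (a hypothetical choice, not necessarily the authors'), the relevance-weighted reconstruction error could be computed as below. All function names are illustrative.

```python
import numpy as np


def mdn_conditional_variance(pi, mu, var):
    """Per-frame, per-feature variance of a diagonal-Gaussian mixture.

    pi:  (T, K) mixture weights per frame (each row sums to 1)
    mu:  (T, K, D) component means for D articulatory features
    var: (T, K, D) component variances (diagonal covariances)
    Returns a (T, D) array via the law of total variance:
    Var = sum_k pi_k * (var_k + mu_k**2) - (sum_k pi_k * mu_k)**2
    """
    w = pi[:, :, None]                                   # (T, K, 1) for broadcasting
    mean = np.sum(w * mu, axis=1)                        # (T, D) mixture mean
    second_moment = np.sum(w * (var + mu ** 2), axis=1)  # (T, D) E[y^2]
    return second_moment - mean ** 2


def relevance_weights(cond_var, eps=1e-6):
    """HYPOTHETICAL relevance function: inverse conditional variance,
    normalized so the weights average to 1 within each frame.
    Less variable (more crucial) features get larger weights."""
    inv = 1.0 / (cond_var + eps)
    return inv / inv.mean(axis=1, keepdims=True)


def weighted_reconstruction_error(pred, target, weights):
    """Relevance-weighted squared error, averaged over frames and features."""
    return float(np.mean(weights * (pred - target) ** 2))


if __name__ == "__main__":
    # One frame, two mixture components, two articulatory features.
    pi = np.array([[0.5, 0.5]])
    mu = np.array([[[0.0, 0.0], [0.0, 2.0]]])  # feature 1 is bimodal
    var = np.full((1, 2, 2), 0.1)
    cv = mdn_conditional_variance(pi, mu, var)
    w = relevance_weights(cv)
    pred = np.array([[0.5, 0.0]])
    target = np.zeros((1, 2))
    print(cv, w, weighted_reconstruction_error(pred, target, w))
```

In the toy frame, feature 0 has conditional variance 0.1 and feature 1 has 1.1, so feature 0 receives roughly 11 times the weight of feature 1, and an error on feature 0 costs more than under a plain mean squared error. Normalizing to a per-frame mean of 1 keeps the loss on the same scale as the unweighted error, so only the relative emphasis across features changes.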