Phone recognition from ultrasound and optical video sequences for a silent speech interface

Hueber, Thomas; Chollet, Gérard; Denby, Bruce; Dreyfus, Gérard; Stone, Maureen

doi:10.21437/Interspeech.2008-528

Latest results on continuous speech phone recognition from video observations of the tongue and lips are described in the context of an ultrasound-based silent speech interface. The study is based on a new 61-minute audiovisual database containing ultrasound sequences of the tongue as well as both frontal and lateral views of the speaker's lips. Phonetically balanced and exhibiting good diphone coverage, this database is designed for both recognition and corpus-based synthesis purposes. Acoustic waveforms are phonetically labeled, and visual sequences are coded using PCA-based robust feature extraction techniques. Visual and acoustic observations of each phonetic class are modeled by continuous HMMs, allowing the performance of the visual phone recognizer to be compared to a traditional acoustic-based phone recognition experiment. The phone recognition confusion matrix is also discussed in detail.
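As an illustration of what PCA-based coding of image sequences can look like, the following is a minimal generic sketch and not the authors' implementation. The abstract does not specify the details of the feature extraction, so the frame size, the number of components, the synthetic random frames, and the function names `fit_pca` and `encode` are all assumptions made for this example.

```python
import numpy as np

def fit_pca(frames, n_components):
    """Learn a PCA basis from frames of shape (n_frames, height, width).

    Returns the mean image (flattened) and the top principal directions.
    Hypothetical helper for illustration only.
    """
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(frames, mean, basis):
    """Project each frame onto the PCA basis to get a compact feature vector."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    return (X - mean) @ basis.T

# Synthetic stand-in for a sequence of small grayscale images (assumed sizes).
rng = np.random.default_rng(0)
frames = rng.random((200, 16, 16))

mean, basis = fit_pca(frames, n_components=10)
features = encode(frames, mean, basis)
print(features.shape)  # (200, 10)
```

Each frame is reduced to a short coefficient vector, and a sequence of such vectors is the kind of observation stream that a continuous HMM can model per phonetic class.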
