ISCA Archive - Shallow-Fusion End-to-End Contextual Biasing
ISCA Archive Interspeech 2019
ISCA Archive Interspeech 2019

Shallow-Fusion End-to-End Contextual Biasing

Ding Zhao, Tara N. Sainath, David Rybach, Pat Rondon, Deepti Bhatia, Bo Li, Ruoming Pang

Contextual biasing to a specific domain, including a user’s song names, app names and contact names, is an important component of any production-level automatic speech recognition (ASR) system. Contextual biasing is particularly challenging in end-to-end models because these models keep a small list of candidates during beam search, and also do poorly on proper nouns, which is the main source of biasing phrases. In this paper, we present various algorithmic and training improvements to shallow-fusion-based biasing for end-to-end models. We will show that the proposed approach obtains better performance than a state-of-the-art conventional model across a variety of tasks, the first time this has been demonstrated.