ISCA Archive ICSLP 2000

Placing structuring elements in a word sequence for generating new statistical language models

Karl Weilhammer, Günther Ruske

Class-based n-gram language models have been applied successfully in speech technology. We present an automatic method that improves n-gram language models by redistributing structural elements within word sequences. The algorithm operates on textual data containing two kinds of text elements: words and structural elements. The order of the words is never changed during the iterations. The algorithm may only insert or delete structural elements, at any position between two adjacent items in the data. As a result, unseen n-grams are interpolated by n-grams that contain structural elements. We give a detailed description of the algorithm and present initial results for a system trained on a small corpus.
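The abstract does not spell out the algorithm, so the following is only a minimal Python sketch, under stated assumptions, of the data model it describes. A sequence mixes words and structural elements. The permitted edits insert a structural element between two adjacent items or delete an existing one, and never touch the words. Bigram counting then picks up the new n-grams that contain structural elements. The token `<S>` and all function names (`insert_struct`, `delete_struct`, `candidate_edits`, `apply_edit`, `words_only`, `bigram_counts`) are hypothetical. The criterion the authors use to accept or reject an edit is not stated in the abstract and is deliberately left out.

```python
from collections import Counter
from typing import Iterator, List, Tuple

# Hypothetical token standing in for a structural element; the abstract
# does not specify the actual inventory of structural elements.
STRUCT = "<S>"

Seq = List[str]
Edit = Tuple[str, int]


def insert_struct(seq: Seq, pos: int) -> Seq:
    """Return a copy of seq with a structural element inserted before index pos."""
    return seq[:pos] + [STRUCT] + seq[pos:]


def delete_struct(seq: Seq, pos: int) -> Seq:
    """Return a copy of seq with the structural element at index pos removed."""
    if seq[pos] != STRUCT:
        raise ValueError("only structural elements may be deleted, never words")
    return seq[:pos] + seq[pos + 1:]


def words_only(seq: Seq) -> Seq:
    """Project a sequence onto its words; this projection never changes under edits."""
    return [tok for tok in seq if tok != STRUCT]


def candidate_edits(seq: Seq) -> Iterator[Edit]:
    """Enumerate every legal single edit on seq.

    Insertions are allowed in every gap between two adjacent items;
    deletions are allowed for every existing structural element.
    """
    for pos in range(1, len(seq)):
        yield ("insert", pos)
    for pos, tok in enumerate(seq):
        if tok == STRUCT:
            yield ("delete", pos)


def apply_edit(seq: Seq, edit: Edit) -> Seq:
    """Apply one edit produced by candidate_edits."""
    kind, pos = edit
    if kind == "insert":
        return insert_struct(seq, pos)
    return delete_struct(seq, pos)


def bigram_counts(seq: Seq) -> Counter:
    """Count adjacent pairs; pairs involving STRUCT are the new structural n-grams."""
    return Counter(zip(seq, seq[1:]))
```

For example, applying `("insert", 2)` to `["we", "present", "a", "method"]` yields `["we", "present", "<S>", "a", "method"]`. The word projection is unchanged, the word bigram `("present", "a")` disappears, and the structural bigrams `("present", "<S>")` and `("<S>", "a")` appear in its place. A full system would score each candidate edit with some language-model criterion and iterate, but that part is not reconstructed here.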