Abstract
Many discriminative classification algorithms are designed for tasks where samples can be represented by fixed-length vectors. However, many examples in the fields of text processing, computational biology and speech recognition are best represented as variable-length sequences of vectors. Although several dynamic kernels have been proposed for mapping sequences of discrete observations into fixed-dimensional feature-spaces, few kernels exist for sequences of continuous observations. This paper introduces continuous rational kernels, an extension of standard rational kernels, as a general framework for classifying sequences of continuous observations. In addition to allowing new task-dependent kernels to be defined, continuous rational kernels allow existing continuous dynamic kernels, such as Fisher and generative kernels, to be calculated using standard weighted finite-state transducer algorithms. Preliminary results on both a large vocabulary continuous speech recognition (LVCSR) task and the TIMIT database are presented.
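To make the idea of a continuous dynamic kernel concrete, the sketch below illustrates the Fisher-kernel construction mentioned in the abstract: each variable-length sequence of continuous observations is mapped to a fixed-length score vector (here, the gradient of a diagonal-covariance Gaussian mixture log-likelihood with respect to the component means), and the kernel between two sequences is the inner product of their score vectors. This is a minimal illustrative sketch, not the paper's WFST-based computation; the GMM generative model, the mean-only score space, and all function names are assumptions made for illustration.

```python
import numpy as np

def fisher_score(X, weights, means, variances):
    """Fisher score for a variable-length sequence X (T x d):
    gradient of the per-frame-averaged log-likelihood of a
    diagonal-covariance GMM with respect to the component means
    (one common, illustrative choice of score space)."""
    T, d = X.shape
    K = len(weights)
    # component log-densities plus log-priors, shape (T, K)
    log_dens = np.stack([
        -0.5 * np.sum((X - means[k]) ** 2 / variances[k]
                      + np.log(2 * np.pi * variances[k]), axis=1)
        for k in range(K)
    ], axis=1) + np.log(weights)
    # component posteriors gamma[t, k] = p(k | x_t)
    log_norm = np.logaddexp.reduce(log_dens, axis=1, keepdims=True)
    gamma = np.exp(log_dens - log_norm)
    # d/d(mu_k) log p(X) = sum_t gamma[t, k] (x_t - mu_k) / sigma_k^2, normalised by T
    return np.concatenate([
        (gamma[:, k:k + 1] * (X - means[k]) / variances[k]).sum(axis=0) / T
        for k in range(K)
    ])

def fisher_kernel(X1, X2, params):
    """Kernel between two sequences: inner product of their Fisher scores."""
    return float(fisher_score(X1, *params) @ fisher_score(X2, *params))

# Example: two sequences of different lengths yield a single kernel value.
rng = np.random.default_rng(0)
params = (np.array([0.5, 0.5]),        # mixture weights (K = 2)
          rng.normal(size=(2, 3)),     # component means (K x d)
          np.ones((2, 3)))             # diagonal variances (K x d)
X1, X2 = rng.normal(size=(40, 3)), rng.normal(size=(25, 3))
print(fisher_kernel(X1, X2, params))
```

In the paper's framework, score vectors of this kind are obtained instead by composing weighted finite-state transducers with the generative model's lattice of state sequences, which is what allows standard WFST shortest-distance algorithms to be reused.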