INTERSPEECH 2002: Denver, Colorado, USA
- John H. L. Hansen, Bryan L. Pellom:
7th International Conference on Spoken Language Processing, ICSLP2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002. ISCA 2002
Keynotes
- W. Tecumseh Fitch:
The evolution of spoken language: a comparative approach. 1-8
- Steve J. Young:
Talking to machines (statistically speaking). 9-16
Speech Recognition in Noise - I
- Duncan Macho, Laurent Mauuary, Bernhard Noé, Yan Ming Cheng, Douglas Ealey, Denis Jouvet, Holly Kelleher, David Pearce, Fabien Saadoun:
Evaluation of a noise-robust DSR front-end on Aurora databases. 17-20
- André Gustavo Adami, Lukás Burget, Stéphane Dupont, Harinath Garudadri, Frantisek Grézl, Hynek Hermansky, Pratibha Jain, Sachin S. Kajarekar, Nelson Morgan, Sunil Sivadas:
Qualcomm-ICSI-OGI features for ASR. 21-24
- Michael Kleinschmidt, David Gelbart:
Improving word accuracy with Gabor feature extraction. 25-28
- Jasha Droppo, Li Deng, Alex Acero:
Evaluation of SPLICE on the Aurora 2 and 3 tasks. 29-32
- Brian Kan-Wing Mak, Yik-Cheung Tam:
Performance of discriminatively trained auditory features on Aurora2 and Aurora3. 33-36
- José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio:
Feature extraction combining spectral noise reduction and cepstral histogram equalization for robust ASR. 225-228
- Jingdong Chen, Dimitris Dimitriadis, Hui Jiang, Qi Li, Tor André Myrvoll, Olivier Siohan, Frank K. Soong:
Bell Labs approach to Aurora evaluation on connected digit recognition. 229-232
- Hong Kook Kim, Richard C. Rose:
Algorithms for distributed speech recognition in a noisy automobile environment. 233-236
- Florian Hilger, Sirko Molau, Hermann Ney:
Quantile based histogram equalization for online applications. 237-240
- Chia-Ping Chen, Karim Filali, Jeff A. Bilmes:
Frontend post-processing and backend model enhancement on the Aurora 2.0/3.0 databases. 241-244
- Masaki Ida, Satoshi Nakamura:
HMM composition-based rapid model adaptation using a priori noise GMM adaptation evaluation on Aurora2 corpus. 437-440
- Jeih-Weih Hung, Lin-Shan Lee:
Data-driven temporal filters obtained via different optimization criteria evaluated on Aurora2 database. 441-444
- Bojan Kotnik, Damjan Vlaj, Zdravko Kacic, Bogomir Horvat:
Efficient additive and convolutional noise reduction procedures. 445-448
- Markus Lieb, Alexander Fischer:
Progress with the Philips continuous ASR system on the Aurora 2 noisy digits database. 449-452
- Jian Wu, Qiang Huo:
An environment compensated minimum classification error training approach and its evaluation on Aurora2 database. 453-456
- Kaisheng Yao, Donglai Zhu, Satoshi Nakamura:
Evaluation of a noise adaptive speech recognition system on the Aurora 3 database. 457-460
- Laura Docío Fernández, Carmen García-Mateo:
Distributed speech recognition over IP networks on the Aurora 3 database. 461-464
- Masakiyo Fujimoto, Yasuo Ariki:
Evaluation of noisy speech recognition based on noise reduction and acoustic model adaptation on the Aurora2 tasks. 465-468
- George Saon, Juan M. Huerta:
Improvements to the IBM Aurora 2 multi-condition system. 469-472
- Pratibha Jain, Hynek Hermansky, Brian Kingsbury:
Distributed speech recognition using noise-robust MFCC and traps-estimated manner features. 473-476
- Norihide Kitaoka, Seiichi Nakagawa:
Evaluation of spectral subtraction with smoothing of time direction on the Aurora 2 task. 477-480
- Xiaodong Cui, Markus Iseli, Qifeng Zhu, Abeer Alwan:
Evaluation of noise robust features on the Aurora databases. 481-484
- Nicholas W. D. Evans, John S. D. Mason:
Computationally efficient noise compensation for robust automatic speech recognition assessed under the Aurora 2/3 framework. 485-488
- Omar Farooq, Sekharjit Datta:
Mel-scaled wavelet filter based features for noisy unvoiced phoneme recognition. 1017-1020
- Kazuo Onoe, Hiroyuki Segi, Takeshi Kobayakawa, Shoei Sato, Toru Imai, Akio Ando:
Filter bank subtraction for robust speech recognition. 1021-1024
- Andrew C. Morris, Simon Payne, Hervé Bourlard:
Low cost duration modelling for noise robust speech recognition. 1025-1028
- Yifan Gong:
A comparative study of approximations for parallel model combination of static and dynamic parameters. 1029-1032
- Petr Motlícek, Lukás Burget:
Noise estimation for efficient speech enhancement and robust speech recognition. 1033-1036
- Özgür Çetin, Harriet J. Nock, Katrin Kirchhoff, Jeff A. Bilmes, Mari Ostendorf:
The 2001 GMTK-based SPINE ASR system. 1037-1040
- Wei-Wen Hung:
Using adaptive signal limiter together with weighting techniques for noisy speech recognition. 1041-1044
- Shingo Yamade, Kanako Matsunami, Akira Baba, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Spectral subtraction in noisy environments applied to speaker adaptation based on HMM sufficient statistics. 1045-1048
- Man-Hung Siu, Yu-Chung Chan:
Robust speech recognition against short-time noise. 1049-1052
- Mario Toma, Andrea Lodi, Roberto Guerrieri:
Word endpoints detection in the presence of non-stationary noise. 1053-1056
- Pere Pujol Marsal, Susagna Pol, Astrid Hagen, Hervé Bourlard, Climent Nadeu:
Comparison and combination of RASTA-PLP and FF features in a hybrid HMM/MLP speech recognition system. 1057-1060
- Tao Xu, Zhigang Cao:
Robust MMSE-FW-LAASR scheme at low SNRs. 1061-1064
- András Zolnay, Ralf Schlüter, Hermann Ney:
Robust speech recognition using a voiced-unvoiced feature. 1065-1068
- Febe de Wet, Johan de Veth, Bert Cranen, Lou Boves:
Accumulated Kullback divergence for analysis of ASR performance in the presence of noise. 1069-1072
- Brian Kingsbury, Pratibha Jain, André Gustavo Adami:
A hybrid HMM/traps model for robust voice activity detection. 1073-1076
- Chengyi Zheng, Yonghong Yan:
Run time information fusion in speech recognition. 1077-1080
- Jon A. Arrowood, Mark A. Clements:
Using observation uncertainty in HMM decoding. 1561-1564
- Matthew N. Stuttle, Mark J. F. Gales:
Combining a Gaussian mixture model front end with MFCC parameters. 1565-1568
- Jasha Droppo, Alex Acero, Li Deng:
Noise from corrupted speech log mel-spectral energies. 1569-1572
- Carlos S. Lima, Luís B. Almeida, João L. Monteiro:
Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition. 1573-1576
- Venkata Ramana Rao Gadde, Andreas Stolcke, Dimitra Vergyri, Jing Zheng, M. Kemal Sönmez, Anand Venkataraman:
Building an ASR system for noisy environments: SRI's 2001 SPINE evaluation system. 1577-1580
Experimental Phonetics
- R. J. J. H. van Son, Louis C. W. Pols:
Evidence for efficiency in vowel production. 37-40
- Matthew P. Aylett:
Stochastic suprasegmentals: relationship between the spectral characteristics of vowels, redundancy and prosodic structure. 41-44
- Jihène Serkhane, Jean-Luc Schwartz, Louis-Jean Boë, Barbara L. Davis, Christine L. Matyear:
Motor specifications of a baby robot via the analysis of infants' vocalizations. 45-48
- Laura L. Koenig, Jorge C. Lucero:
Oral-laryngeal control patterns for fricatives in 5-year-olds and adults. 49-52
- Véronique Delvaux, Thierry Metens, Alain Soquet:
French nasal vowels: acoustic and articulatory properties. 53-56
Speech Recognition: Adaptation
- Patrick Kenny, Gilles Boulianne, Pierre Dumouchel:
Maximum likelihood estimation of eigenvoices and residual variances for large vocabulary speech recognition tasks. 57-60
- Ernest Pusateri, Timothy J. Hazen:
Rapid speaker adaptation using speaker clustering. 61-64
- Chao Huang, Tao Chen, Eric Chang:
Adaptive model combination for dynamic speaker selection training. 65-68
- Ka-Yan Kwan, Tan Lee, Chen Yang:
Unsupervised n-best based model adaptation using model-level confidence measures. 69-72
- Patrick Nguyen, Luca Rigazio, Christian Wellekens, Jean-Claude Junqua:
LU factorization for feature transformation. 73-76
- Guo-Hong Ding, Yi-Fei Zhu, Chengrong Li, Bo Xu:
Implementing vocal tract length normalization in the MLLR framework. 1389-1392
- Dong Kook Kim, Nam Soo Kim:
Markov models based on speaker space model evolution. 1393-1396
- Baojie Li, Keikichi Hirose, Nobuaki Minematsu:
Robust speech recognition using inter-speaker and intra-speaker adaptation. 1397-1400
- Carlos S. Lima, Luís B. Almeida, João L. Monteiro:
Continuous environmental adaptation of a speech recogniser in telephone line conditions. 1401-1404
- Irina Illina:
Tree-structured maximum a posteriori adaptation for a segment-based speech recognition system. 1405-1408
- Thomas Plötz, Gernot A. Fink:
Robust time-synchronous environmental adaptation for continuous speech recognition systems. 1409-1412
- Thomas Niesler, Daniel Willett:
Unsupervised language model adaptation for lecture speech transcription. 1413-1416
- Yongxin Li, Hakan Erdogan, Yuqing Gao, Etienne Marcheret:
Incremental on-line feature space MLLR adaptation for telephony speech recognition. 1417-1420
- Sirko Molau, Florian Hilger, Daniel Keysers, Hermann Ney:
Enhanced histogram normalization in the acoustic feature space. 1421-1424
- David N. Levin:
Blind normalization of speech from different channels and speakers. 1425-1428
- Jun Ogata, Yasuo Ariki:
Unsupervised acoustic model adaptation based on phoneme error minimization. 1429-1432
- Bowen Zhou, John H. L. Hansen:
Improved structural maximum likelihood eigenspace mapping for rapid speaker adaptation. 1433-1436
- Ángel de la Torre, Dominique Fohr, Jean Paul Haton:
Statistical adaptation of acoustic models to noise conditions for robust speech recognition. 1437-1440
- Fabio Brugnara, Mauro Cettolo, Marcello Federico, Diego Giuliani:
Issues in automatic transcription of historical audio data. 1441-1444
Language Identification
- Verna Stockmal, Zinny S. Bond:
Same talker, different language: a replication. 77-80
- A. K. V. Sai Jayram, V. Ramasubramanian, T. V. Sreenivas:
Automatic language identification using acoustic sub-word units. 81-84
- Ian Maddieson, Ioana Vasilescu:
Factors in human language identification. 85-88
- Pedro A. Torres-Carrasquillo, Elliot Singer, Mary A. Kohler, Richard J. Greene, Douglas A. Reynolds, John R. Deller Jr.:
Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. 89-92
- Eddie Wong, Sridha Sridharan:
Methods to improve Gaussian mixture model based language identification system. 93-96
Speech Synthesis
- Hongyan Jing, Evelyne Tzoukermann:
Part-of-speech tagging in French text-to-speech synthesis: experiments in tagset selection. 97-100
- Ulla Uebler:
Grapheme-to-phoneme conversion using pseudo-morphological units. 101-104
- Maximilian Bisani, Hermann Ney:
Investigations on joint-multigram models for grapheme-to-phoneme conversion. 105-108
- Lucian Galescu, James F. Allen:
Pronunciation of proper names with a joint n-gram model for bi-directional grapheme-to-phoneme conversion. 109-112
- Matthias Jilka, Ann K. Syrdal:
The AT&T German text-to-speech system: realistic linguistic description. 113-116
- Haiping Li, Fangxin Chen, Liqin Shen:
Generating script using statistical information of the context variation unit vector. 117-120
- Chih-Chung Kuo, Jing-Yi Huang:
Efficient and scalable methods for text script generation in corpus-based TTS design. 121-124
- Peter Rutten, Matthew P. Aylett, Justin Fackrell, Paul Taylor:
A statistically motivated database pruning technique for unit selection synthesis. 125-128
- Yi-Jian Wu, Yu Hu, Xiaoru Wu, Ren-Hua Wang:
A new method of building decision tree based on target information. 129-132
- Junichi Yamagishi, Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi:
A context clustering technique for average voice model in HMM-based speech synthesis. 133-136
- Minoru Tsuzaki, Hisashi Kawai:
Feature extraction for unit selection in concatenative speech synthesis: comparison between AIM, LPC, and MFCC. 137-140
- Francisco Campillo Díaz, Eduardo Rodríguez Banga:
Combined prosody and candidate unit selections for corpus-based text-to-speech systems. 141-144
- Yeon-Jun Kim, Alistair Conkie:
Automatic segmentation combining an HMM-based approach and spectral boundary correction. 145-148
- Abhinav Sethy, Shrikanth S. Narayanan:
Refined speech segmentation for concatenative speech synthesis. 149-152
- Andrew P. Breen, Barry Eggleton, Peter Dion, Steve Minnis:
Refocussing on the text normalisation process in text-to-speech systems. 153-156
- Jithendra Vepa, Jahnavi Ayachitam, K. V. K. Kalpana Reddy:
A text-to-speech synthesis system for Telugu. 157-160
- Diamantino Freitas, Daniela Braga:
Towards an intonation module for a Portuguese TTS system. 161-164
- Takashi Saito, Masaharu Sakamoto:
Applying a hybrid intonation model to a seamless speech synthesizer. 165-168
- Toshio Hirai, Seiichi Tenpaku, Kiyohiro Shikano:
Using start/end timings of spectral transitions between phonemes in concatenative speech synthesis. 2357-2360
- Jinfu Ni, Hisashi Kawai:
Design of a Mandarin sentence set for corpus-based speech synthesis by use of a multi-tier algorithm taking account of the varied prosodic and spectral characteristics. 2361-2364
- Hiroki Mori, Takahiro Ohtsuka, Hideki Kasuya:
A data-driven approach to source-formant type text-to-speech system. 2365-2368
- Yu Shi, Eric Chang, Hu Peng, Min Chu:
Power spectral density based channel equalization of large speech database for concatenative TTS system. 2369-2372
- Helen M. Meng, Chi-Kin Keung, Kai-Chung Siu, Tien Ying Fung, P. C. Ching:
CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects. 2373-2376
- Jinlin Lu, Hisashi Kawai:
Perceptual evaluation of naturalness due to substitution of Chinese syllable for concatenative speech synthesis. 2377-2380
- Dan Chazan, Ron Hoory, Zvi Kons, Dorel Silberstein, Alexander Sorin:
Reducing the footprint of the IBM trainable speech synthesis system. 2381-2384
- Sung-Joo Lee, Hyung Soon Kim:
Computationally efficient time-scale modification of speech using 3 level clipping. 2385-2388
- Zhiwei Shuang, Yu Hu, Zhen-Hua Ling, Ren-Hua Wang:
A miniature Chinese TTS system based on tailored corpus. 2389-2392
- Hoeun Song, Jaein Kim, Kyongrok Lee, Jinyoung Kim:
Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system. 2393-2396
- Hideki Kawahara, Parham Zolfaghari, Alain de Cheveigné:
On F0 trajectory optimization for very high-quality speech manipulation. 2397-2400
- Tan Lee, Greg Kochanski, Chilin Shih, Yujia Li:
Modeling tones in continuous Cantonese speech. 2401-2404
- Minghui Dong, Kim-Teng Lua:
Pitch contour model for Chinese text-to-speech using CART and statistical model. 2405-2408
- Phuay Hui Low, Saeed Vaseghi:
Application of microprosody models in text to speech synthesis. 2413-2416
- Sheng Zhao, Jianhua Tao, Lianhong Cai:
Prosodic phrasing with inductive learning. 2417-2420
- Ben Milner, Xu Shao:
Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model. 2421-2424
- Hiromichi Kawanami, Tsuyoshi Masuda, Tomoki Toda, Kiyohiro Shikano:
Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer. 2425-2428
Multimodal Spoken Language Processing
- Dirk Bühler, Wolfgang Minker, Jochen Häußler, Sven Krüger:
Flexible multimodal human-machine interaction in mobile environments. 169-172
- Edward C. Kaiser, Philip R. Cohen:
Implementation testing of a hybrid symbolic/statistical multimodal architecture. 173-176
- Yoko Yamakata, Tatsuya Kawahara, Hiroshi G. Okuno:
Belief network based disambiguation of object reference in spoken dialogue system for robot. 177-180
- Jonas Beskow, Jens Edlund, Magnus Nordstrand:
Specification and realisation of multimodal output in dialogue systems. 181-184
- Francis K. H. Quek, Yingen Xiong, David McNeill:
Gestural trajectory symmetries and discourse segmentation. 185-188
- Francis K. H. Quek, David McNeill, Robert K. Bryll, Mary P. Harper:
Gestural spatialization in natural discourse segmentation. 189-192
- Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano:
Real-time sound source localization and separation for robot audition. 193-196
- Jiyong Ma, Jie Yan, Ronald A. Cole:
CU animate tools for enabling conversations with animated characters. 197-200
- Philip R. Cohen, Rachel Coulston, Kelly Krout:
Multiparty multimodal interaction: a preliminary analysis. 201-204
- Peter Poller, Jochen Müller:
Distributed audio-visual speech synchronization. 205-208
- Philippe Daubias, Paul Deléglise:
Lip-reading based on a fully automatic statistical model. 209-212
- Xiaoxing Liu, Yibao Zhao, Xiaobo Pi, Luhong Liang, Ara V. Nefian:
Audio-visual continuous speech recognition using a coupled hidden Markov model. 213-216
- Laila Dybkjær, Niels Ole Bernsen:
Data, annotation schemes and coding tools for natural interactivity. 217-220
- Francis K. H. Quek, Yang Shi, Cemil Kirbas, Shunguang Wu:
VisSTA: a tool for analyzing multimodal discourse data. 221-224
Perception: Non-Native
- Stephen G. Lambacher, William L. Martens, Kazuhiko Kakehi:
The influence of identification training on identification and production of the American English mid and low vowels by native speakers of Japanese. 245-248
- Keiichi Tajima, Reiko Akahane-Yamada, Tsuneo Yamada:
Perceptual learning of second-language syllable rhythm by elderly listeners. 249-252
- Constance M. Clarke:
Perceptual adjustment to foreign-accented English with short term exposure. 253-256
- Denis K. Burnham, Ron Brooker:
Absolute pitch and lexical tones: tone perception by non-musician, musician, and absolute pitch non-tonal language speakers. 257-260
- Mirjam Broersma:
Comprehension of non-native speech: inaccurate phoneme processing and activation of lexical competitors. 261-264
Dialog Systems I: Evaluation
- Wolfgang Minker:
Overview on recent activities in speech understanding and dialogue systems evaluation. 265-268
- Marilyn A. Walker, Alexander I. Rudnicky, Rashmi Prasad, John S. Aberdeen, Elizabeth Owen Bratt, John S. Garofolo, Helen Wright Hastie, Audrey N. Le, Bryan L. Pellom, Alexandros Potamianos, Rebecca J. Passonneau, Salim Roukos, Gregory A. Sanders, Stephanie Seneff, David Stallard:
DARPA Communicator: cross-system results for the 2001 evaluation. 269-272
- Marilyn A. Walker, Alexander I. Rudnicky, John S. Aberdeen, Elizabeth Owen Bratt, John S. Garofolo, Helen Wright Hastie, Audrey N. Le, Bryan L. Pellom, Alexandros Potamianos, Rebecca J. Passonneau, Rashmi Prasad, Salim Roukos, Gregory A. Sanders, Stephanie Seneff, David Stallard:
DARPA Communicator evaluation: progress from 2000 to 2001. 273-276
- Gregory A. Sanders, Audrey N. Le, John S. Garofolo:
Effects of word error rate in the DARPA Communicator data during 2000 and 2001. 277-280
- Candace L. Sidner, Clifton Forlines:
Subset languages for conversing with collaborative interface agents. 281-284
Voice Conversion
- Tomomi Watanabe, Takahiro Murakami, Munehiro Namba, Tetsuya Hoya, Yoshihisa Ishida:
Transformation of spectral envelope for voice conversion based on radial basis function networks. 285-288
- Oytun Türk, Levent M. Arslan:
Subband based voice conversion. 289-292
- Mikiko Mashimo, Tomoki Toda, Hiromichi Kawanami, Hideki Kashioka, Kiyohiro Shikano, Nick Campbell:
Evaluation of cross-language voice conversion using bilingual and non-bilingual databases. 293-296
- Joakim Gustafson, Kåre Sjölander:
Voice transformations for improving children's speech recognition in a publicly available dialogue system. 297-300
Spoken Language Resources
- Susanne Burger, Victoria MacLaren, Hua Yu:
The ISL meeting corpus: the impact of meeting type on speech style. 301-304
- Ramón López-Cózar, Ángel de la Torre, José C. Segura, Antonio J. Rubio, Juan M. López-Soler:
A new method for testing dialogue systems based on simulations of real-world conditions. 305-308
- Thorsten Ludwig:
Comfort noise detection and GSM-FR-codec detection for speech-quality evaluations in telephone networks. 309-312
- Catia Cucchiarini, Diana Binnenpoorte:
Validation and improvement of automatic phonetic transcriptions. 313-316
- Shigeaki Amano, Kazumi Kato, Tadahisa Kondo:
Development of Japanese infant speech database and speaking rate analysis. 317-320
- Minghui Dong, Kim-Teng Lua:
Automatic prosodic break labeling for Mandarin Chinese speech data. 321-324
- Imed Zitouni, Joseph P. Olive, Dorota J. Iskra, Khalid Choukri, Ossama Emam, Oren Gedge, Emmanuel Maragoudakis, Herbert S. Tropf, Asunción Moreno, Albino Nogueiras Rodríguez, Barbara Heuft, Rainer Siemund:
Orientel: speech-based interactive communication applications for the Mediterranean and the Middle East. 325-328
- Yolanda Vazquez-Alvarez, Mark A. Huckvale:
The reliability of the ITU-T P.85 standard for the evaluation of text-to-speech systems. 329-332
- Kris Demuynck, Tom Laureys, Steven Gillis:
Automatic generation of phonetic transcriptions for large speech corpora. 333-336
- Wolfgang Minker:
Overview on recent activities in speech understanding and dialogue systems evaluation. 337-340
- Christina L. Bennett, Alexander I. Rudnicky:
The Carnegie Mellon Communicator corpus. 341-344
- Tanja Schultz:
Globalphone: a multilingual speech and text database developed at Karlsruhe University. 345-348
- Özgül Salor, Bryan L. Pellom, Tolga Çiloglu, Kadri Hacioglu, Mübeccel Demirekler:
On developing new text and audio corpora and speech recognition tools for the Turkish language. 349-352
- Craig Martell:
FORM: an extensible, kinematically-based gesture annotation scheme. 353-356
- John-Paul Hosom:
Automatic phoneme alignment based on acoustic-phonetic modeling. 357-360
- Narendra K. Gupta, Srinivas Bangalore, Mazin G. Rahim:
Extracting clauses for spoken language understanding in conversational systems. 361-364
- Fabrice Lefèvre, Hélène Bonneau-Maynard:
Issues in the development of a stochastic speech understanding system. 365-368
- Hartmut R. Pfitzinger:
10 years of PhonDat-II: a reassessment. 369-372
Speech Recognition: Search
- Shankar Kumar, William Byrne:
Risk based lattice cutting for segmental minimum Bayes-risk decoding. 373-376
- Sascha Wendt, Gernot A. Fink, Franz Kummert:
Dynamic search-space pruning for time-constrained speech recognition. 377-380
- Raymond H. Lee, Eric H. C. Choi:
A Gaussian selection method for multi-mixture HMM based continuous speech recognition. 381-384
- Rong Dong, Jie Zhu:
On use of duration modeling for continuous digits speech recognition. 385-388
- Geoffrey Zweig, George Saon, François Yvon:
Arc minimization in finite state decoding graphs with cross-word acoustic context. 389-392
- Jing Zheng, Horacio Franco:
Fast hierarchical grammar optimization algorithm toward time and space efficiency. 393-396
- Sherif M. Abdou, Michael S. Scordilis:
Dynamic tuning of language model score in speech recognition using a confidence measure. 397-400
- Xiao Zhang, Yunxin Zhao:
Minimum perfect hashing for fast n-gram language model lookup. 401-404
- Xiang Li, Rita Singh, Richard M. Stern:
Combining search spaces of heterogeneous recognizers for improved speech recognition. 405-408
Auditory Models and Hearing Aids
- Karel Pellant, Jan Mejzlík, Karel Prikryl, Zdenek Skvor:
Transmission characteristics of outer ear canal. 409-412
- James M. Kates:
Hearing-aid benefits and limitations: predictions from a cochlear model. 413-416
- Peggy B. Nelson, Jeffrey J. DiGiovanni, Robert S. Schlauch:
A psychoacoustic basis for spectral sharpening. 417-420
- Lisa G. Huettel, Leslie M. Collins:
Model-based predictions of intensity discrimination for normal- and impaired-hearing listeners. 421-424
- Peter F. Assmann, Terrance M. Nearey, Jack M. Scott:
Modeling the perception of frequency-shifted vowels. 425-428
- Carol L. Mackersie:
The relationship between pure-tone sequential stream segregation and perceptual separation of male and female talkers by listeners with hearing loss. 429-432
- Mathias Johansson, Mats Blomberg, Kjell Elenius, Lars-Erik Hoffsten, Anders Torberger:
A phoneme recognizer for the hearing impaired. 433-436
Multi-Lingual and Non-Native Spoken Language Processing
- Volker Fischer, Eric Janke, Siegfried Kunzmann:
Likelihood combination and recognition output voting for the decoding of non-native speech with multilingual HMMs. 489-492
- Pongtep Angkititrakul, John H. L. Hansen:
Stochastic trajectory model analysis for accent classification. 493-496
- Jilei Tian, Juha Häkkinen, Olli Viikki:
Multilingual pronunciation modeling for improving multilingual speech recognition. 497-500
- Jilei Tian, Juha Häkkinen, Søren Riis, Kåre Jean Jensen:
On text-based language identification for multilingual speech recognition systems. 501-504
- Bin Ma, Cuntai Guan, Haizhou Li, Chin-Hui Lee:
Multilingual speech recognition with language identification. 505-508
- Rathi Chengalvarayan:
Robust HMM training for unified Dutch and German speech recognition. 509-512
- Sanjeev Khudanpur, Woosung Kim:
Using cross-language cues for story-specific language modeling. 513-516
- Bing Zhao, Stephan Vogel:
Full-text story alignment models for Chinese-English bilingual news corpora. 517-520
- Jayren J. Sooful, Elizabeth C. Botha:
Comparison of acoustic distance measures for automatic cross-language phoneme mapping. 521-524
- Xiaodong He, Yunxin Zhao:
Maximum expected likelihood based model selection and adaptation for nonnative English speakers. 525-528
- Nobuaki Minematsu, Gakuto Kurata, Keikichi Hirose:
Integration of MLLR adaptation with pronunciation proficiency adaptation for non-native speech recognition. 529-532
- Thu Nguyen, John Ingram:
Native and Vietnamese production of compound and phrasal stress patterns. 533-536
Prosody in Spoken Dialogue Systems
- Johanneke Caspers:
On the function of the late rise and the early fall in Dutch dialogue: a perception experiment. 537-540
- Anna Esposito, Susan Duncan, Francis K. H. Quek:
Holds as gestural correlates to empty and filled speech pauses. 541-544
- Toshihiko Itoh, Atsuhiko Kai, Tatsuhiro Konishi, Yukihiro Itoh:
Linguistic and acoustic changes of user's utterances caused by different dialogue situations. 545-548
- Nigel Ward, Satoshi Nakagawa:
Automatic user-adaptive speaking rate selection for information delivery. 549-552
- Gabriel Skantze:
Coordination of referring expressions in multimodal human-computer dialogue. 553-556
- Loredana Cerrato:
A comparison between feedback strategies in human-to-human and human-machine communication. 557-560
- Courtney Darves, Sharon L. Oviatt:
Adaptation of users' spoken dialogue patterns in a conversational interface. 561-564
Speaker Segmentation and Adaptation
- Aaron E. Rosenberg, Allen L. Gorin, Zhu Liu, Sarangarajan Parthasarathy:
Unsupervised speaker segmentation of telephone conversations. 565-568
- P. Sivakumaran, Aladdin M. Ariyaeeinia, J. Fortuna:
An effective unsupervised scheme for multiple-speaker-change detection. 569-572
- Jitendra Ajmera, Hervé Bourlard, I. Lapidot, Iain McCowan:
Unknown-multiple speaker clustering using HMM. 573-576
- Sylvain Meignier, Jean-François Bonastre, Ivan Magrin-Chagnolleau:
Speaker utterances tying among speaker segmented audio documents using hierarchical classification: towards speaker indexing of audio databases. 577-580
- Johnny Mariéthoz, Samy Bengio:
A comparative study of adaptation methods for speaker verification. 581-584
- Kevin R. Farrell:
Speaker verification with data fusion and model adaptation. 585-588
- Nikki Mirghafori, Larry P. Heck:
An adaptive speaker verification system with speaker dependent a priori decision thresholds. 589-592
Spoken Language Understanding
- Deb Roy, Peter Gorniak, Niloy Mukherjee, Joshua Juster:
A trainable spoken language understanding system for visual object selection. 593-596
- Frédéric Béchet, Allen L. Gorin, Jerry H. Wright, Dilek Hakkani-Tür:
Named entity extraction from spontaneous speech in How May I Help You? 597-600
- Caroline Bousquet-Vernhettes, Nadine Vigouroux:
Recognition error processing for speech understanding. 601-604
- Andrew N. Pargellis, Eric Fosler-Lussier, Augustine Tsai:
Using part-of-speech tags, context thresholding, and trigram contexts to improve the auto-induction of semantic classes. 605-608
- Ye-Yi Wang, Alex Acero, Ciprian Chelba, Brendan J. Frey, Leon Wong:
Combination of statistical and rule-based approaches for spoken language understanding. 609-612
- Guodong Xie, Chengqing Zong, Bo Xu:
Chinese spoken language analyzing based on combination of statistical and rule methods. 613-616
- Norbert Pfannerer:
A maximum entropy semantic parser using word classes. 617-620
INTERSPEECH
- Aparna Gurijala, John R. Deller Jr., Michael S. Seadle, John H. L. Hansen:
Speech watermarking through parametric modeling. 621-624
- Hong Kai Sze, Sh-Hussain Salleh:
An education software in teaching automatic speech recognition (ASR). 625-628
- Benfang Xiao, Cynthia Girand, Sharon L. Oviatt:
Multimodal integration patterns in children. 629-632
- Odette Scharenborg, Lou Boves, Johan de Veth:
ASR in a human word recognition model: generating phonemic input for shortlist. 633-636
- Chung-Hsien Wu, Yu-Hsien Chiu, Kung-Wei Cheng:
Sign language translation using an error tolerant retrieval algorithm. 637-640
- Oytun Türk, Ömer Sayli, Helin Dutagaci, Levent M. Arslan:
A sound source classification system based on subband processing. 641-644
- Ying Zhang, Bing Zhao, Jie Yang, Alex Waibel:
Automatic sign translation. 645-648
- Stanley J. Wenndt, Edward J. Cupples, Richard M. Floyd:
A study on the classification of whispered and normally phonated speech. 649-652
- Kiyoshi Tatara, Taisuke Ito, Parham Zolfaghari, Kazuya Takeda, Fumitada Itakura:
Experiments on recognition of lavalier microphone speech and whispered speech in real world environments. 653-656
- Mamoru Iwaki, Hiromi Seki:
An effect of amplitude modulation on perceptual segregation of tone sequences. 657-660
- Eric Sanders, Marina B. Ruiter, Lilian Beijer, Helmer Strik:
Automatic recognition of Dutch dysarthric speech: a pilot study. 661-664
- Olov Engwall:
Evaluation of a system for concatenative articulatory visual speech synthesis. 665-668
- Marc Sato, Jean-Luc Schwartz, Marie-Agnès Cathiard, Christian Abry, Hélène Loevenbruck:
Intrasyllabic articulatory control constraints in verbal working memory. 669-672
- Nick Campbell:
Towards a grammar of spoken language: incorporating paralinguistic information. 673-676
- Qun Li, Martin J. Russell:
An analysis of the causes of increased error rates in children's speech recognition. 2337-2340
- Anne-Marie Öster:
A new computer-based analytical speech perception test for prelingually deaf children and children with speech disorders. 2341-2344
- Harriet J. Fell, Joel MacAuslan, Linda J. Ferrier, Susan G. Worst, Karen Chenausky:
Vocalization age as a clinical tool. 2345-2348
- Piero Cosi, Michael M. Cohen, Dominic W. Massaro:
Baldini: Baldi speaks Italian! 2349-2352
- Christian Cavé, Isabelle Guaïtella, Serge Santi:
Eyebrow movements and voice variations in dialogue situations: an experimental investigation. 2353-2356
Large Vocabulary Speech Recognition
- Ricardo de Córdoba, Javier Macías Guarasa, Javier Ferreiros, Juan Manuel Montero, José Manuel Pardo:
State clustering improvements for continuous HMMs in a Spanish large vocabulary recognition system. 677-680 - Tomaz Rotovnik, Mirjam Sepesy Maucec, Bogomir Horvat, Zdravko Kacic:
A comparison of HTK, ISIP and julius in slovenian large vocabulary continuous speech recognition. 681-684 - Lei Jia, Bo Xu:
Parametric trajectory segment model for LVCSR. 685-688 - Javier Dieguez-Tirado, Antonio Cardenal López:
Efficient precalculation of LM contexts for large vocabulary continuous speech recognition. 689-692 - Rathi Chengalvarayan:
Integrating multiple pronunciations during MCE-based acoustic model training for large vocabulary speech recognition. 693-696 - Tom Laureys, Vincent Vandeghinste, Jacques Duchateau:
A hybrid approach to compounds in LVCSR. 697-700 - Takehito Utsuro, Tetsuji Harada, Hiromitsu Nishizaki, Seiichi Nakagawa:
A confidence measure based on agreement among multiple LVCSR models - correlation between pair of acoustic models and confidence. 701-704 - Jan Nouza, Jindra Drabkova:
Combining lexical and morphological knowledge in language model for inflectional (Czech) language. 705-708 - Long Nguyen, Xuefeng Guo, John Makhoul:
Modeling frequent allophones in Japanese speech recognition. 709-712 - Feili Chen, Jie Zhu, Wentao Song:
The structure and its implementation of hidden dynamic HMM for Mandarin speech recognition. 713-716 - Takahiro Shinozaki, Sadaoki Furui:
A new lexicon optimization method for LVCSR based on linguistic and acoustic characteristics of words. 717-720 - David Langlois, Kamel Smaïli, Jean Paul Haton:
Retrieving phrases by selecting the history: application to automatic speech recognition. 721-724 - Dong-Hoon Ahn, Minhwa Chung:
Compact subnetwork-based large vocabulary continuous speech recognition. 725-728 - Helin Dutagaci, Levent M. Arslan:
A comparison of four language models for large vocabulary Turkish speech recognition. 729-732
Integration of Speech Technology in Language Learning
- Rebecca Hincks:
Speech recognition for language teaching and evaluating: a study of existing commercial products. 733-736 - Antoine Raux, Tatsuya Kawahara:
Automatic intelligibility assessment and diagnosis of critical pronunciation errors for computer-assisted pronunciation learning. 737-740 - Yukari Hirata:
Effects of production training with visual feedback on the acquisition of Japanese pitch and durational contrasts. 741-744 - Nobuaki Minematsu, Satoshi Kobashikawa, Keikichi Hirose, Donna Erickson:
Acoustic modeling of sentence stress using differential features between syllables for English rhythm learning system development. 745-748 - Kazunori Imoto, Yasushi Tsubota, Antoine Raux, Tatsuya Kawahara, Masatake Dantsuji:
Modeling and automatic detection of English sentence stress for computer-assisted English prosody learning system. 749-752 - Yasushi Tsubota, Tatsuya Kawahara, Masatake Dantsuji:
Recognition and verification of English by Japanese students for computer-assisted language learning system. 1205-1208 - Ambra Neri, Catia Cucchiarini, Helmer Strik:
Feedback in computer assisted pronunciation training: technology push or demand pull? 1209-1212 - Nobuaki Minematsu, Gakuto Kurata, Keikichi Hirose:
Corpus-based analysis of English spoken by Japanese students in view of the entire phonemic system of English. 1213-1216 - Debra M. Hardison:
Computer-assisted second-language speech learning: generalization of prosody-focused training. 1217-1220 - Jack Mostow, Joseph Beck, S. Vanessa Winter, Shaojun Wang, Brian Tobin:
Predicting oral reading miscues. 1221-1224 - Chanwoo Kim, Wonyong Sung:
Implementation of an intonational quality assessment system. 1225-1228 - Yasuo Ariki, Jun Ogata:
English CALL system with functions of speech segmentation and pronunciation evaluation using speech recognition technology. 1229-1232
Perception of Prosody
- Hansjörg Mixdorff, Sudaporn Luksaneeyanawin, Hiroya Fujisaki, Patavee Charnvivit:
Perception of tone and vowel quantity in Thai. 753-756 - Keisuke Kinoshita, Dawn M. Behne, Takayuki Arai:
Duration and F0 as perceptual cues to Japanese vowel quantity. 757-760 - Makiko Muto, Hiroaki Kato, Minoru Tsuzaki, Yoshinori Sagisaka:
Effects of intra-phrase position on acceptability of changes in segmental duration in sentence speech. 761-764 - Dragana Barac-Cikoja, Sally Revoile:
Perception of prosodic phrasing by hearing-impaired listeners. 765-768 - Wendi A. Aasland, Shari R. Baum:
Processing of temporal cues marking phrasal boundaries in individuals with brain damage. 769-772
Speech Enhancement I
- Wolfgang Herbordt, J. Ying, Herbert Buchner, Walter Kellermann:
A real-time acoustic human-machine front-end for multimedia applications integrating robust adaptive beamforming and stereophonic acoustic echo cancellation. 773-776 - Ching-Ta Lu, Hsiao-Chuan Wang:
Enhancement of single channel speech using perception-based wavelet transform. 777-780 - Lee Lin, W. Harvey Holmes, Eliathamby Ambikairajah:
Speech enhancement based on a perceptual modification of Wiener filtering. 781-784 - Hagai Attias, Li Deng:
A new approach to speech enhancement by a microphone array using EM and mixture models. 785-788 - Sang-Gyun Kim, Chang D. Yoo:
Acoustic echo cancellation based on m-channel IIR cosine-modulated filter bank. 789-792 - Hiroshi Saruwatari, Katsuyuki Sawai, Akinobu Lee, Kiyohiro Shikano, Atsunobu Kaminuma, Masao Sakata:
Speech enhancement in car environment using blind source separation. 1781-1784 - Ilyas Potamitis, Nikos Fakotakis, George Kokkinakis:
Speech enhancement based on combining perceptual enhancement and short-time spectral attenuation. 1785-1788 - Takanobu Nishiura, Satoshi Nakamura, Yuka Okada, Takeshi Yamada, Kiyohiro Shikano:
Suitable design of adaptive beamformer based on average speech spectrum for noisy speech recognition. 1789-1792 - King Tam, Hamid Sheikhzadeh, Todd Schneider:
Highly oversampled subband adaptive filters for noise cancellation on a low-resource DSP system. 1793-1796 - Yi Hu, Philipos C. Loizou:
A perceptually motivated subspace approach for speech enhancement. 1797-1800 - Gwo-hwa Ju, Lin-Shan Lee:
Speech enhancement based on generalized singular value decomposition approach. 1801-1804 - Jong Uk Kim, Chang D. Yoo:
Subspace speech enhancement using subband whitening filter. 1805-1808 - Sungwook Chang, Sung-il Jung, Younghun Kwon, Sung-il Yang:
Speech enhancement using wavelet packet transform. 1809-1812 - Li Deng, Jasha Droppo, Alex Acero:
Sequential MAP noise estimation and a phase-sensitive model of the acoustic environment. 1813-1816 - Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano:
Auditory fovea based speech enhancement and its application to human-robot dialog system. 1817-1820 - Erik M. Visser, Manabu Otsuka, Te-Won Lee:
A spatio-temporal speech enhancement scheme for robust speech recognition. 1821-1824 - Frédéric Berthommier, Seungjin Choi:
Comparative evaluation of CASA and BSS models for subband cocktail-party speech separation. 1825-1828 - Hyoung-Gook Kim, Dietmar Ruwisch:
Speech enhancement in non-stationary noise environments. 1829-1832 - Mitsunori Mizumachi, Satoshi Nakamura:
The 2ch hybrid subtractive beamformer applied to line sound sources. 1833-1836
Speech Recognition: In-Vehicle
- Umit H. Yapanel, Xianxian Zhang, John H. L. Hansen:
High performance digit recognition in real car environments. 793-796 - Tetsuya Shinde, Kazuya Takeda, Fumitada Itakura:
Multiple regression of log-spectra for in-car speech recognition. 797-800 - Yifan Gong, Lorin Netsch:
Experiments on speaker-independent voice command recognition using in-vehicle hands free speech. 801-804 - Shubha Kadambe:
Application of over-complete blind source separation for robust automatic speech recognition. 805-808 - Françoise Beaufays, Daniel Boies, Mitch Weintraub:
Porting channel robustness across languages. 809-812
Mechanisms for Dialogue Processing
- Yasuhiro Takahashi, Kohji Dohsaka, Kiyoaki Aikawa:
An efficient dialogue control method using decision tree-based estimation of out-of-vocabulary word attributes. 813-816 - Jerome R. Bellegarda:
Semantic inference: a data-driven solution for NL interaction. 817-820 - Jerry H. Wright, Alicia Abella, Allen L. Gorin:
Unified task knowledge for spoken language understanding and dialog management. 821-824 - Yun-Tien Lee, Cheng-Huang Wu, Yumin Lee, Lin-Shan Lee:
Distributed Chinese keyword spotting and verification for spoken dialogues under wireless environment. 825-828 - Ryuichiro Higashinaka, Noboru Miyazaki, Mikio Nakano, Kiyoaki Aikawa:
A method for evaluating incremental utterance understanding in spoken dialogue systems. 829-832 - Naoko Kakutani, Norihide Kitaoka, Seiichi Nakagawa:
Detection and recognition of repaired speech on misrecognized utterances for speech input of car navigation system. 833-836 - Robert Eklund:
Ingressive speech as an indication that humans are talking to humans (and not to machines). 837-840 - Hagen Soltau, Florian Metze, Alex Waibel:
Compensating for hyperarticulation by modeling articulatory properties. 841-844 - Olga Goubanova:
Forms of introduction in map task dialogues: case of L2 Russian speakers. 845-848 - Nanette Veilleux:
Bridges: regions between discourse segments. 849-852 - Didier Guillevic, Simona Gandrabur, Yves Normandin:
Robust semantic confidence scoring. 853-856 - Ludek Müller, Tomás Bartos:
Statistically based approach to rejection of incorrectly recognized words. 857-860 - Ryo Sato, Ryuichiro Higashinaka, Masafumi Tamoto, Mikio Nakano, Kiyoaki Aikawa:
Learning decision trees to determine turn-taking by spoken dialogue systems. 861-864 - H. Hamimed, Géraldine Damnati:
Integration of phonetic length properties in the acoustic models of false starts and out-of-vocabulary words. 865-868 - Yibao Zhao, Guojun Zhou:
N-word-sequence frequency noise mitigation for SLM based on binomial distribution. 869-872 - Chul Min Lee, Shrikanth S. Narayanan, Roberto Pieraccini:
Combining acoustic and language information for emotion recognition. 873-876 - Kadri Hacioglu, Wayne H. Ward:
A figure of merit for the analysis of spoken dialog systems. 877-880
Language Modeling
- Tomoyosi Akiba, Katunobu Itou, Atsushi Fujii, Tetsuya Ishikawa:
Selective back-off smoothing for incorporating grammatical constraints into the n-gram language model. 881-884 - Imed Zitouni, Olivier Siohan, Hong-Kwang Jeff Kuo, Chin-Hui Lee:
Backoff hierarchical class n-gram language modelling for automatic speech recognition systems. 885-888 - Francis Picard, Dominique Boucher, Guy Lapalme:
Constructing small language models from grammars. 889-892 - Rong Zhang, Alexander I. Rudnicky:
Improve latent semantic analysis based language model by integrating multiple level knowledge. 893-896 - Elvira I. Sicilia-Garcia, Ji Ming, Francis Jack Smith:
Individual word language models and the frequency approach. 897-900 - Andreas Stolcke:
SRILM - an extensible language modeling toolkit. 901-904 - Edward W. D. Whittaker, Dietrich Klakow:
Efficient construction of long-range language models using log-linear interpolation. 905-908 - Anna Corazza:
Integration of two stochastic context-free grammars. 909-912 - Manny Rayner, Beth Ann Hockey, John Dowding:
Grammar specialisation meets language modelling. 913-916 - Jing Huang, Geoffrey Zweig:
Maximum entropy model for punctuation annotation from speech. 917-920 - Shinsuke Mori:
An automatic sentence boundary detector based on a structured language model. 921-924 - Genqing Wu, Fang Zheng, Wenhu Wu, Mingxing Xu, Ling Jin:
Improved Katz smoothing for language modeling in speech recognition. 925-928 - Renato de Mori, Yannick Estève, Christian Raymond:
On the use of structures in language models for dialogue. 929-932 - Hakan Erdogan, Ruhi Sarikaya, Yuqing Gao, Michael Picheny:
Semantic structured language models. 933-936
Prosody and Speech Recognition - I
- Keikichi Hirose, Nobuaki Minematsu, Makoto Terao:
Statistical language modeling with prosodic boundaries and its use for continuous speech recognition. 937-940 - Koji Iwano, Takahiro Seki, Sadaoki Furui:
Noise robust speech recognition using F0 contour extracted by hough transform. 941-944 - Farshad Almasganj, Farhad D. Dehnavi, Mahmood Bijankhan:
Sharing relative stress of cross-word syllables and lexical stress to spontaneous speech recognition. 945-948 - Don Baron, Elizabeth Shriberg, Andreas Stolcke:
Automatic punctuation and disfluency detection in multi-party meetings using prosodic and lexical cues. 949-952 - Xuejing Sun:
Pitch accent prediction using ensemble machine learning. 953-956 - David Escudero Mancebo, César González Ferreras, Valentín Cardeñoso-Payo:
Quantitative evaluation of relevant prosodic factors for text-to-speech synthesis in Spanish. 1165-1168 - Nuttakorn Thubthong, Boonserm Kijsirikul, Sudaporn Luksaneeyanawin:
Tone recognition in Thai continuous speech based on coarticulation, intonation and stress effects. 1169-1172 - Kazuyuki Takagi, Hajime Kubota, Kazuhiko Ozeki:
Combination of pause and F0 information in dependency analysis of Japanese sentences. 1173-1176 - Yasuo Horiuchi, Tomoko Ohsuga, Akira Ichikawa:
Estimating syntactic structure from F0 contour and pause duration in Japanese speech. 1177-1180 - Yoichi Yamashita, Akira Inoue:
Extraction of important sentences using F0 information for speech summarization. 1181-1184 - Tatsuya Kitamura, Kayo Itoh, Toshihiko Itoh, Shigeyoshi Kitazawa:
Influence of prosody, context, and word order in the identification of focus in Japanese dialogue. 1185-1188 - Atsuhiko Kai, Yukari Nonomura, Toshihiko Itoh, Tatsuhiro Konishi, Yukihiro Itoh:
Influence of different dialogue situations on user's behavior in spoken corrections. 1189-1192 - Li-chiung Yang:
Interpreting meaning from context: modeling the prosody of discourse markers in speech. 1193-1196 - Katarina Bartkova, David Le Gac, Delphine Charlet, Denis Jouvet:
Prosodic parameter for speaker identification. 1197-1200 - Shigeyoshi Kitazawa, Toshihiko Itoh, Tatsuya Kitamura:
Juncture segmentation of Japanese prosodic unit based on the spectrographic features. 1201-1204
Pathology of Voice and Speech Production
- Jan G. Svec, Frantisek Sram:
Kymographic imaging of the vocal fold oscillations. 957-960 - Katalin Mády, Robert Sader, Alexander Zimmermann, Philip Hoole, Ambros Beer, Hans-Florian Zeilhofer, Ch. Hannig:
Assessment of consonant articulation in glossectomee speech by dynamic MRI. 961-964 - Alan Wrench, Fiona Gibbon, Alison M. McNeill, Sara Wood:
An EPG therapy protocol for remediation and assessment of articulation disorders. 965-968 - Rupal Patel:
How speakers with and without speech impairment mark the question statement contrast. 969-972 - Stephen A. Zahorian, A. Matthew Zimmer, Fansheng Meng:
Vowel classification for computer-based visual feedback for speech training for the hearing impaired. 973-976
Model Based Speech Processing I
- Paavo Alku, Tom Bäckström:
All-pole modeling of wide-band speech using weighted sum of the LSP polynomials. 977-980 - Jean Schoentgen:
Analysis and synthesis of the phonatory excitation signal by means of a pair of polynomial shaping functions. 981-984 - Taras K. Vintsiuk:
Optimal speech signal partition into one-quasiperiodical segments. 985-988 - Hugo Leonardo Rufiner, Luís F. Rocha, John Goddard Close:
Sparse and independent representations of speech signals based on parametric models. 989-992 - Keiichi Funaki:
Improvement of the ELS-based time-varying complex speech analysis. 993-996
Acoustic Modeling
- K. K. Chin, Philip C. Woodland:
Maximum mutual information training of hidden Markov models with vector linear predictors. 997-1000 - Jonathan E. Hamaker, Joseph Picone, Aravind Ganapathiraju:
A sparse modeling approach to speech recognition based on relevance vector machines. 1001-1004 - Ciprian Chelba, Rachel Morton:
Mutual information phone clustering for decision tree induction. 1005-1008 - Kevin S. Van Horn:
Rethinking derived acoustic features in speech recognition. 1009-1012 - Konstantin Markov, Satoshi Nakamura:
Modeling HMM state distributions with Bayesian networks. 1013-1016 - Stavros Tsakalidis, Vlasios Doumpiotis, William Byrne:
Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation. 2585-2588 - Kozo Okuda, Tatsuya Kawahara, Satoshi Nakamura:
Speaking rate compensation based on likelihood criterion in acoustic model training and decoding. 2589-2592 - Michiel Bacchiani:
Combining maximum likelihood and maximum a posteriori estimation for detailed acoustic modeling of context dependency. 2593-2596 - Jing Huang, Vaibhava Goel, Ramesh Gopinath, Brian Kingsbury, Peder A. Olsen, Karthik Visweswariah:
Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model. 2597-2600 - Jinsong Zhang, Satoshi Nakamura:
Modeling varying pauses to develop robust acoustic models for recognizing noisy conversational speech. 2601-2604 - Hwa Jeon Song, Hyung Soon Kim:
Improving phone-level discrimination in LDA with subphone-level classes. 2625-2628 - Zhijian Ou, Zuoying Wang:
A combined model of statics-dynamics of speech optimized using maximum mutual information. 2629-2632 - Nobutoshi Takahashi, Seiichi Nakagawa:
Syllable recognition using syllable-segment statistics and syllable-based HMM. 2633-2636 - Jan W. F. Thirion, Elizabeth C. Botha:
Recurrent neural network-enhanced HMM speech recognition systems. 2637-2640 - Young-Sun Yun:
Sharing trend information of trajectory in segmental-feature HMM. 2641-2644 - Jesper Salomon, Simon King, Miles Osborne:
Framewise phone classification using support vector machines. 2645-2648 - Darryl Stewart, Ming Ji, Philip Hanna, Francis Jack Smith:
A state-tying approach to building syllable HMMs. 2649-2652 - Weifeng Lee, C. Chandra Sekhar, Kazuya Takeda, Fumitada Itakura:
Recognition of continuous speech segments of monophone units using support vector machines. 2653-2656 - Junho Park, Hanseok Ko:
Construction of decision tree from data driven clustering. 2657-2660 - Akinobu Lee, Yuichiro Mera, Hiroshi Saruwatari, Kiyohiro Shikano:
Selective multi-path acoustic model based on database likelihoods. 2661-2664 - Todd A. Stephenson, Mathew Magimai-Doss, Hervé Bourlard:
Auxiliary variables in conditional Gaussian mixtures for automatic speech recognition. 2665-2668 - Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda:
Constructing shared-state hidden Markov models based on a Bayesian approach. 2669-2672 - Tetsuji Ogawa, Tetsunori Kobayashi:
Generalization of state-observation-dependency in partly hidden Markov models. 2673-2676
Phonetics
- John H. Esling:
Laryngoscopic analysis of Tibetan chanting modes and their relationship to register in Sino-Tibetan. 1081-1084 - Kathleen Murray, Betina Simonsen:
A corpus-based study of Danish laryngealization. 1085-1088 - Natasha Warner, Allard Jongman, Doris Mücke:
Variability in direction of dorsal movement during production of /l/. 1089-1092 - Yi Xu, Fang Liu:
Segmentation of glides with tonal alignment as reference. 1093-1096 - Ian Maddieson, Julie Larson:
Variability in the production of glottalized sonorants: data from Yapese. 1097-1100 - Vu Ngoc Tuan, Christophe d'Alessandro, Sophie Rosset:
A phonetic study of Vietnamese tones: acoustic and electroglottographic measurements. 1101-1104 - Hyunsong Chung:
Segment duration in spoken Korean. 1105-1108 - Elena Zvonik, Fred Cummins:
Pause duration and variability in read texts. 1109-1112 - Hartmut R. Pfitzinger:
Intrinsic phone durations are speaker-specific. 1113-1116 - Mechtild Tronnier:
Preaspirated stops in southern Swedish. 1117-1120 - Natasha Warner, Andrea Weber:
Stop epenthesis at syllable boundaries. 1121-1124 - William D. Raymond, Mark A. Pitt, Keith Johnson, Elizabeth Hume, Matthew J. Makashay, Robin Dautricourt, Craig Hilts:
An analysis of transcription consistency in spontaneous speech from the buckeye corpus. 1125-1128 - Makiko Aoyagi:
Contextual effects on voicing judgment of stop consonants in Japanese. 1129-1132 - Akiyo Joto, Motohisa Imaishi, Yoshiki Nagase, Seiya Funatsu:
Discrimination of English vowels in consonantal contexts by native speakers of Japanese and its relations to dynamic information of formants. 1133-1136
Call Classification and Routing
- Gökhan Tür, Jerry H. Wright, Allen L. Gorin, Giuseppe Riccardi, Dilek Hakkani-Tür:
Improving spoken language understanding using word confusion networks. 1137-1140 - Li Li, Wu Chou:
Improving latent semantic indexing based classifier with information gain. 1141-1144 - Hong-Kwang Jeff Kuo, Chin-Hui Lee, Imed Zitouni, Eric Fosler-Lussier, Egbert Ammicht:
Discriminative training for call classification and routing. 1145-1148 - Stephen Cox:
Speech and language processing for a constrained speech translation system. 1149-1152 - Ananlada Chotimongkol, Alexander I. Rudnicky:
Automatic concept identification in goal-oriented conversations. 1153-1156 - Michael Levit, Elmar Nöth, Allen L. Gorin:
Using EM-trained string-edit distances for approximate matching of acoustic morphemes. 1157-1160 - Premkumar Natarajan, Rohit Prasad, Bernhard Suhm, Daniel McCarthy:
Speech-enabled natural language call routing: BBN call director. 1161-1164
Acoustic Speech Modeling
- Sheng Gao, Jinsong Zhang, Satoshi Nakamura, Chin-Hui Lee, Tat-Seng Chua:
Weighted graph based decision tree optimization for high accuracy acoustic modeling. 1233-1236 - Li Zhang, William H. Edmondson:
Speech recognition using syllable patterns. 1237-1240 - Janus D. Brink, Elizabeth C. Botha:
A comparison of L1 and african-mother-tongue acoustic models for south african English speech recognition. 1241-1244 - Panu Somervuo:
Speech modeling using variational Bayesian mixture of Gaussians. 1245-1248 - Tao Chen, Chao Huang, Eric Chang, Jingchun Wang:
On the use of Gaussian mixture model for speaker variability analysis. 1249-1252 - Philip J. B. Jackson, Martin J. Russell:
Models of speech dynamics in a segmental-HMM recognizer using intermediate linear representations. 1253-1256 - Heiga Zen, Keiichi Tokuda, Tadashi Kitamura:
Decision tree distribution tying based on a dimensional split technique. 1257-1260
Speech Synthesis: Alternative Views
- Mark A. Huckvale:
Speech synthesis, speech simulation and speech science. 1261-1264 - Murtaza Bulut, Shrikanth S. Narayanan, Ann K. Syrdal:
Expressive speech synthesis using a concatenative synthesizer. 1265-1268 - Kengo Shichiri, Atsushi Sawabe, Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura:
Eigenvoices for HMM-based speech synthesis. 1269-1272 - Erwin Marsi, Bertjan Busser, Walter Daelemans, Véronique Hoste, Martin Reynaert, Antal van den Bosch:
Combining information sources for memory-based pitch accent placement. 1273-1276 - Mary D. Swift, Ellen Campana, James F. Allen, Michael K. Tanenhaus:
Eye-fixation as a measure of real-time processing of synthesized words. 1277-1280 - Amanda Stent, Marilyn A. Walker, Steve Whittaker, Preetam Maloor:
User-tailored generation for spoken dialogue: an experiment. 1281-1284 - Deb Roy:
A system that learns to describe objects in visual scenes. 1285-1288
Finite State Transducers Applied to Spoken Language Processing
- Xiaolong Mou, Stephanie Seneff, Victor Zue:
Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers. 1289-1292 - Han Shu, I. Lee Hetherington:
EM training of finite-state transducers and its application to pronunciation modeling. 1293-1296 - Máté Szarvas, Sadaoki Furui:
Finite-state transducer based Hungarian LVCSR with explicit modeling of phonological changes. 1297-1300 - Diamantino Caseiro, Isabel Trancoso:
Using dynamic WFST composition for recognizing broadcast news. 1301-1304 - Hans J. G. A. Dolfing:
Transducer search space modelings for large-vocabulary speech recognition. 1305-1308 - Stephan Kanthak, Hermann Ney, Michael Riley, Mehryar Mohri:
A comparison of two LVR search optimization techniques. 1309-1312 - Mehryar Mohri, Michael Riley:
An efficient algorithm for the n-best-strings problem. 1313-1316
Speaker Modeling and Scoring
- Bing Xiang, Toby Berger:
Structural Gaussian mixture models for efficient text-independent speaker verification. 1317-1320 - Adriano Petry, Dante Augusto Couto Barone:
Text-dependent speaker verification using Lyapunov exponents. 1321-1324 - Mohamed Faouzi BenZeghiba, Hervé Bourlard:
User-customized password speaker verification based on HMM/ANN and GMM models. 1325-1328 - Dong Xin, Zhaohui Wu, Yingchun Yang:
Exploiting support vector machines in hidden Markov models for speaker verification. 1329-1332 - Yassine Mami, Delphine Charlet:
Speaker identification by location in an optimal space of anchor models. 1333-1336 - Alex Park, Timothy J. Hazen:
ASR dependent techniques for speaker identification. 1337-1340 - Peng Ding, Yang Liu, Bo Xu:
Factor analyzed Gaussian mixture models for speaker identification. 1341-1344 - Qin Jin, Tanja Schultz, Alex Waibel:
Phonetic speaker identification. 1345-1348 - Ming Liu, Eric Chang, Bei-qian Dai:
Hierarchical Gaussian mixture model for speaker verification. 1353-1356 - Greg Kochanski, Daniel P. Lopresti, Chilin Shih:
A reverse Turing test using speech. 1357-1360 - Sungjoo Ahn, Sunmee Kang, Hanseok Ko:
On effective speaker verification based on subword model. 1361-1364 - Bing Xiang:
Speaker verification using Gaussian component strings in dynamic trajectory space. 1365-1368 - Larry P. Heck, Dominique Genoud:
Combining speaker and speech recognition systems. 1369-1372 - Qi Li, Hui Jiang, Qiru Zhou, Jinsong Zheng:
Automatic enrollment for speaker authentication. 1373-1376 - Marco Andorno, Pietro Laface, Roberto Gemello:
Experiments in confidence scoring for word and sentence verification. 1377-1380 - Mark C. Huggins, John J. Grieco:
Confidence metrics for speaker identification. 1381-1384 - Daniel Elenius, Mats Blomberg:
Characteristics of a low reject mode speaker verification system. 1385-1388
Issues in Audio-Visual Spoken Language Processing
- Lynne E. Bernstein, Denis Burnham, Jean-Luc Schwartz:
Special session: issues in audiovisual spoken language processing (when, where, and how?). 1445-1448 - Sabine Deligne, Gerasimos Potamianos, Chalapathy Neti:
Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization). 1449-1452 - Gérard Bailly:
Audiovisual speech synthesis: from ground truth to models. 1453-1456 - Eric Vatikiotis-Bateson, Harold Hill, Miyuki Kamachi, Karen Lander, Kevin G. Munhall:
The stimulus as basis for audiovisual integration. 1457-1460 - Lawrence D. Rosenblum:
The perceptual basis for audiovisual speech integration. 1461-1464 - Debra M. Hardison:
Sources of variability in the perceptual training of /r/ and /l/: interaction of adjacent vowel, word position, talkers' visual and acoustic cues. 1465-1468 - Valérie Hazan, Anke Sennema, Andrew Faulkner:
Audiovisual perception in L2 learners. 1685-1688 - Karen Iler Kirk, David B. Pisoni, Lorin Lachs:
Audiovisual integration of speech by children and adults with cochlear implants. 1689-1692 - Kaoru Sekiyama, Yoichi Sugita:
Auditory-visual speech perception examined by brain imaging and reaction time. 1693-1696 - Curtis W. Ponton, Edward T. Auer, Lynne E. Bernstein:
Neurocognitive basis for audiovisual speech perception: evidence from event-related potentials. 1697-1700 - David J. Lewkowicz:
Perception and integration of audiovisual speech in human infants. 1701-1704 - Gérard Bailly, Pierre Badin:
Seeing tongue movements from outside. 1913-1916 - Jacek C. Wojdel, Pascal Wiggers, Léon J. M. Rothkrantz:
An audio-visual corpus for multimodal speech recognition in Dutch language. 1917-1920 - Pascal Wiggers, Jacek C. Wojdel, Léon J. M. Rothkrantz:
Medium vocabulary continuous audio-visual speech recognition. 1921-1924 - Martin Heckmann, Kristian Kroschel, Christophe Savariaux, Frédéric Berthommier:
DCT-based video features for audio-visual speech recognition. 1925-1928 - V. Dogu Erdener, Denis Burnham:
The effect of auditory-visual information and orthographic background in L2 acquisition. 1929-1932 - Emiel Krahmer, Zsófia Ruttkay, Marc Swerts, Wieger Wesselink:
Perceptual evaluation of audiovisual cues for prominence. 1933-1936 - Jean-Luc Schwartz, Frédéric Berthommier, Christophe Savariaux:
Audio-visual scene analysis: evidence for a "very-early" integration process in audio-visual speech perception. 1937-1940 - Milos Zelezný, Petr Císar, Zdenek Krnoul, Jan Novák:
Design of an audio-visual speech corpus for the Czech audio-visual speech synthesis. 1941-1944 - Virginie Attina, Denis Beautemps, Marie-Agnès Cathiard:
Coordination of hand and orofacial movements for CV sequences in French cued speech. 1945-1948 - Virginie Attina, Marie-Agnès Cathiard, Denis Beautemps:
Controlling anticipatory behavior for rounding in French cued speech. 1949-1952 - David Sodoyer, Laurent Girin, Christian Jutten, Jean-Luc Schwartz:
Audio-visual speech sources separation: a new approach exploiting the audio-visual coherence of speech stimuli. 1953-1956 - David House:
Intonational and visual cues in the perception of interrogative mode in Swedish. 1957-1960 - Simon Lucey, Sridha Sridharan, Vinod Chandran:
A link between cepstral shrinking and the weighted product rule in audio-visual speech recognition. 1961-1964
Speech Technology Applications
- Taku Endo, Nigel Ward, Minoru Terada:
Can confidence scores help users post-editing speech recognizer output? 1469-1472 - Masatoshi Watanabe, Masahide Sugiyama:
Information retrieval based on speech recognition results. 1473-1476 - Saija-Maaria Lemmelä, Péter Pál Boda:
Efficient combination of type-in and wizard-of-oz tests in speech interface development process. 1477-1480 - Wolfgang Macherey, Hans Jörg Viechtbauer, Hermann Ney:
Probabilistic retrieval based on document representations. 1481-1484 - Takuya Nishimoto, Masahiro Araki, Yasuhisa Niimi:
Radiodoc: a voice-accessible document system. 1485-1488 - Masataka Goto, Katunobu Itou, Satoru Hayamizu:
Speech completion: on-demand completion assistance using filled pauses for speech input interfaces. 1489-1492 - Jenny Wilkie, Mervyn A. Jack, Peter J. Littlewood:
Design of system-initiated digressive proposals for automated banking dialogues. 1493-1496 - Arthur R. Toth, Thomas K. Harris, James Sanders, Stefanie Shriver, Roni Rosenfeld:
Towards every-citizen's speech interface: an application generator for speech interfaces to databases. 1497-1500 - Rukmini Iyer, Jeffrey Z. Ma, Herbert Gish, Owen Kimball:
Training topic classifiers for conversational speech with limited data. 1501-1504 - Hiromitsu Nishizaki, Seiichi Nakagawa:
Comparing isolately spoken keywords with spontaneously spoken queries for Japanese spoken document retrieval. 1505-1508 - Jennifer C. Lai, Kwan Min Lee:
Choosing speech or touchtone modality for navigation within a telephony natural language system. 1509-1512 - Wai Kit Lo, Helen M. Meng, P. C. Ching:
Multi-scale and multi-model integration for improved performance in Chinese spoken document retrieval. 1513-1516 - Kohichi Ogata, Yorinobu Sonoda:
Development of a GUI-based articulatory speech synthesis system. 1517-1520
Speech Production: Models and Physiology
- Jianwu Dang, Masaaki Honda, Kiyoshi Honda:
Investigation of coarticulation based on electromagnetic articulographic data. 1521-1524 - Takuya Niikawa, Takanori Ando, Masafumi Matsumura:
Frequency dependence of vocal-tract length. 1525-1528 - Shinji Maeda, Martine Toda, Andreas J. Carlen, Lyes Meftahi:
Functional modeling of face movements during speech. 1529-1532 - Takemi Mochida, Masaaki Honda, Kouki Hayashi, Toshiharu Kuwae, Kunihiro Tanahashi, Kazufumi Nishikawa, Atsuo Takanishi:
Control system for talking robot to replicate articulatory movement of natural speech. 1533-1536 - Donald S. Finan, Anne Smith, Michael Ho:
Feed the tiger: a method for evoking reliable jaw stretch reflexes in children. 1537-1540 - Tokihiko Kaburagi, Kohei Wakamiya, Masaaki Honda:
Three-dimensional electromagnetic articulograph based on a nonparametric representation of the magnetic field. 2297-2300 - Yves Laprie, Slim Ouni:
Introduction of constraints in an acoustic-to-articulatory inversion method based on a hypercubic articulatory table. 2301-2304 - Sadao Hiroya, Masaaki Honda:
Acoustic-to-articulatory inverse mapping using an HMM-based speech production model. 2305-2308 - Kiyoshi Hashimoto:
Modeling articulatory dynamics in autoregressive linear system. 2309-2312 - Denisse Sciamarella, Christophe d'Alessandro:
A study of the two-mass model in terms of acoustic parameters. 2313-2316
Tools for Spoken Language Resources
- Dorothea Kolossa, Qiang Huo:
Using time-stretched pulses for accurate splitting of speech utterances played back in noisy reverberant environments. 1541-1544 - Kikuo Maekawa, Hideaki Kikuchi, Yosuke Igarashi, Jennifer J. Venditti:
X-JToBI: an extended J-ToBI for spontaneous speech. 1545-1548 - Helmer Strik, Walter Daelemans, Diana Binnenpoorte, Janienke Sturm, Folkert de Vriend, Catia Cucchiarini:
Dutch HLT resources: from BLARK to priority lists. 1549-1552 - Fan Yang, Susan E. Strayer, Peter A. Heeman:
ACT: a graphical dialogue annotation comparison tool. 1553-1556 - Ha-Jin Yu, Jin Suk Kim:
A training prompts generation algorithm for connected spoken word recognition. 1557-1560
Speech Recognition - Practical Issues
- Etienne Cornu, Hamid Sheikhzadeh, Robert L. Brennan:
A low-resource, miniature implementation of the ETSI distributed speech recognition front-end. 1581-1584 - Sergey Astrov:
Memory space reduction for hidden Markov models in low-resource speech recognition systems. 1585-1588 - Xia Wang, Juha Iso-Sipilä:
Low complexity Mandarin speaker-independent isolated word recognition. 1589-1592 - Imre Kiss, Marcel Vasilache:
Low complexity techniques for embedded ASR systems. 1593-1596 - Klaus Reinhard, Jochen Junkawitsch, Andreas Kießling, Stefan Dobler:
Optimization of hidden Markov models for embedded systems. 1597-1600 - Karim Filali, Xiao Li, Jeff A. Bilmes:
Data-driven vector clustering for low-memory footprint ASR. 1601-1604 - Hui Jiang, Chin-Hui Lee:
Utterance verification based on neighborhood information and Bayes factors. 1605-1608 - Tommi Lahti, Janne Suontausta:
Vocabulary independent OOV detection using support vector machines. 1609-1612 - Issam Bazzi, James R. Glass:
A multi-class approach for modelling out-of-vocabulary words. 1613-1616 - Jacques Duchateau, Patrick Wambacq:
Unconstrained versus constrained acoustic normalisation in confidence scoring. 1617-1620 - Daniele Falavigna, Roberto Gretter, Giuseppe Riccardi:
Acoustic and word lattice based algorithms for confidence scores. 1621-1624 - Huei-Ming Wang, Yi-Chung Lin:
Error-tolerant spoken language understanding with confidence measuring. 1625-1628
Perception
- Shawn A. Weil:
Comparing intelligibility of several non-native accent classes in noise. 1629-1632 - Kentaro Ishizuka, Kiyoaki Aikawa:
Effect of F0 fluctuation and amplitude modulation of natural vowels on vowel identification in noisy environments. 1633-1636 - Kiyoko Yoneyama:
Similarities of words in noise in Japanese. 1637-1640 - Douglas Brungart, Alexander J. Kordik, Koel Das, Arnab K. Shaw:
The effects of F0 manipulation on the perceived distance of speech. 1641-1644 - Esther Janse:
Time-compressing natural and synthetic speech. 1645-1648 - Jianxia Xue, Sumiko Takayanagi, Lynne E. Bernstein:
Accounting for perceptual identification of consonants and vowels through acoustic dissimilarity. 1649-1652 - Travis Wade, Deborah K. Eakin, Russell Webb, Arvin Agah, Frank Brown, Allard Jongman, John Gauch, Thomas A. Schreiber, Joan A. Sereno:
Modeling recognition of speech sounds with minerva2. 1653-1656 - Ruth Kearns, Dennis Norris, Anne Cutler:
Syllable processing in English. 1657-1660 - Cecile T. L. Kuijpers, Wilma van Donselaar, Anne Cutler:
Perceptual effects of assimilation-induced violation of final devoicing in dutch. 1661-1664 - Michael C. W. Yip:
Access to homophonic meanings during spoken language comprehension: effects of context and neighborhood density. 1665-1668 - Ivan Magrin-Chagnolleau, Melissa Barkat, Fanny Meunier:
Intelligibility of reverse speech in French: a perceptual study. 1669-1672 - Willy Serniclaes, René Carré:
Contextual effects in the perception of fricative place of articulation: a rotational hypothesis. 1673-1676 - Rudolph Sock, Béatrice Vaxelaire, Véronique Hecker, Fabrice Hirsch:
What relationship between protrusion anticipation and auditory perception? 1677-1680 - René Carré, Jean-Sylvain Liénard, Egidio Marsico, Willy Serniclaes:
On the role of the "schwa" in the perception of plosive consonants. 1681-1684 - Noël Nguyen, Ludovic Jankowski, Michel Habib:
The perception of stop consonant sequences in dyslexic and normal children. 2565-2568 - Takashi Otake, Akemi Iijima:
Submoraic awareness by Japanese school children: evidence from a novel game. 2569-2572 - D. Markham, Valérie Hazan:
Speaker intelligibility of adults and children. 2573-2576 - Yasuki Yamashita, Hiroshi Matsumoto:
Acoustical correlates to SD ratings of speaker characteristics in two speaking styles. 2577-2580 - Eda Ormanci, U. Hakan Nikbay, Oytun Türk, Levent M. Arslan:
Subjective assessment of frequency bands for perception of speaker identity. 2581-2584
Speech to Speech Translation - I
- David Stallard, Premkumar Natarajan, Mohammed Noamany, Richard M. Schwartz, John Makhoul:
Design for a speech-to-speech translator for field use. 1705-1708 - Alan W. Black, Ralf D. Brown, Robert E. Frederking, Kevin A. Lenzo, John Moody, Alexander I. Rudnicky, Rita Singh, Eric Steinbrecher:
Rapid development of speech-to-speech translation systems. 1709-1712 - Kenji Imamura, Eiichiro Sumita:
Bilingual corpus cleaning focusing on translation literality. 1713-1716 - Hideki Tanaka, Stephen Nightingale, Hideki Kashioka, Kenji Matsumoto, Masamchi Nishiwaki, Tadashi Kumano, Takehiko Maruyama:
Speech to speech translation system for monologues-data driven approach. 1717-1720 - Adrià de Gispert, José B. Mariño:
Using x-grams for speech-to-speech translation. 1885-1888 - Taro Watanabe, Eiichiro Sumita:
Statistical machine translation decoder based on phrase. 1889-1892 - Eiichiro Sumita, Yasuhiro Akiba, Kenji Imamura:
Reliability measures for translation quality. 1893-1896 - Bowen Zhou, Yuqing Gao, Jeffrey S. Sorensen, Zijian Diao, Michael Picheny:
Statistical natural language generation for speech-to-speech machine translation systems. 1897-1900 - Stephan Vogel, Alicia Tribble:
Improving statistical machine translation for a speech-to-speech translation task. 1901-1904 - Solange Rossato, Hervé Blanchon, Laurent Besacier:
Speech-to-speech translation system evaluation: results for French for the NESPOLE! project first showcase. 1905-1908 - Manuel Kauers, Stephan Vogel, Christian Fügen, Alex Waibel:
Interlingua based statistical machine translation. 1909-1912
Speech Processing
- Nobuyuki Nishizawa, Keikichi Hirose, Nobuaki Minematsu:
Separation of voiced source characteristics and vocal tract transfer function characteristics for speech sounds by iterative analysis based on AR-HMM model. 1721-1724 - Shuichi Narusawa, Nobuaki Minematsu, Keikichi Hirose, Hiroya Fujisaki:
Automatic extraction of model parameters from fundamental frequency contours of English utterances. 1725-1728 - Takahiro Murakami, Munehiro Namba, Tetsuya Hoya, Yoshihisa Ishida:
Pitch extraction of speech signals using an eigen-based subspace method. 1729-1732 - Tomohiro Nakatani, Toshio Irino:
Robust fundamental frequency estimation against background noise and spectral distortion. 1733-1736 - Thomas F. Quatieri:
2-d processing of speech with application to pitch estimation. 1737-1740
Speech Recognition: Broadcast and Courtroom Transcription
- Murat Saraclar, Michael Riley, Enrico Bocchieri, Vincent Goffin:
Towards automatic closed captioning : low latency real time broadcast news transcription. 1741-1744 - Rohit Prasad, Long Nguyen, Richard M. Schwartz, John Makhoul:
Automatic transcription of courtroom speech. 1745-1748 - Long Nguyen, Xuefeng Guo, Richard M. Schwartz, John Makhoul:
Japanese broadcast news transcription. 1749-1752 - Robert Hecht, Jürgen Riedler, Gerhard Backfried:
German broadcast news transcription. 1753-1756 - Toru Imai, Atsushi Matsui, Shinichi Homma, Takeshi Kobayakawa, Kazuo Onoe, Shoei Sato, Akio Ando:
Speech recognition with a re-speak method for subtitling live broadcasts. 1757-1760
Duration, Tempo, and Intonation
- Keiichi Takamaru, Makoto Hiroshige, Kenji Araki, Koji Tochinai:
Evaluation of the method to detect Japanese local speech rate deceleration applying the variable threshold with a constant term. 1761-1764 - Sandra P. Kirkham:
Tempo modulations in English: selected pilot study results. 1765-1768 - Caroline L. Smith:
Modeling durational variability in reading aloud a connected text. 1769-1772 - Yasser Hifny, Mohsen A. Rashwan:
Duration modeling for arabic text to speech synthesis. 1773-1776 - Oliver Jokisch, Hongwei Ding, Hans Kruschke, Guntram Strecha:
Learning syllable duration and intonation of Mandarin Chinese. 1777-1780
Speech Coding and Transmission
- Pushkar Patwardhan, Preeti Rao:
Controlling perceived degradation in spectrum envelope modeling via predistortion. 1837-1840 - Peter Veprek, Alan B. Bradley:
Benefit and cost analysis of using the improved vector quantizer design algorithm for glottal source waveform compression. 1841-1844 - Xin Zhong, Jon A. Arrowood, Mark A. Clements:
Speech coding and transmission for improved automatic recognition. 1845-1848 - Phu Chien Nguyen, Takao Ochi, Masato Akagi:
Coding speech at very low rates using straight and temporal decomposition. 1849-1852 - Toni P. Nieminen:
Floating-point adaptive multi-rate wideband speech codec. 1853-1856 - Omar Halmi, Hesham Tolba, Driss Guerchi, Douglas D. O'Shaughnessy:
On improving the performance of analysis-by-synthesis coding using a multi-magnitude algebraic code-book excitation signal. 1857-1860 - K. Humphreys, Robert Lawlor:
Improved performance speech codec for mobile communications. 1861-1864 - Evgeni Yakhnich, Yuval Bistritz:
Fixed-length segment coding of LSF parameters. 1865-1868 - Vijay Parsa, Donald G. Jamieson:
Interaction of voice over internet protocol speech coders and disordered speech samples. 1869-1872 - Holly Kelleher, David Pearce, Douglas Ealey, Laurent Mauuary:
Speech recognition performance comparison between DSR and AMR transcoded speech. 1873-1876 - Hans-Günter Hirsch:
The influence of speech coding on recognition performance in telecommunication networks. 1877-1880 - Gautam Moharir, Pushkar Patwardhan, Preeti Rao:
Spectral enhancement preprocessing for the HNM coding of noisy speech. 1881-1884
Spoken Document Retrieval
- Armelle Brun, Kamel Smaïli, Jean Paul Haton:
Contribution to topic identification by using word similarity. 1965-1968 - Bowen Zhou, John H. L. Hansen:
Speechfind: an experimental on-line spoken document retrieval system for historical audio archives. 1969-1972 - Yoshimi Suzuki, Fumiyo Fukumoto, Yoshihiro Sekiguchi:
Topic tracking using subject templates. 1973-1976 - Katsushi Asami, Toshiyuki Takezawa, Gen-ichiro Kikui:
Topic detection of an utterance for speech dialogue processing. 1977-1980 - Daben Liu, Jeffrey Ma, Dongxin Xu, Amit Srivastava, Francis Kubala:
Real-time rich-content transcription of Chinese broadcast news. 1981-1984 - Chun-Jen Wang, Berlin Chen, Lin-Shan Lee:
Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features. 1985-1988 - Martha A. Larson, Stefan Eickeler, Gerhard Paaß, Edda Leopold, Jörg Kindermann:
Exploring sub-word features and linear support vector machines for German spoken document classification. 1989-1992 - Mirjam Wester, Judith M. Kessens, Helmer Strik:
Goal-directed ASR in a multimedia indexing and searching environment (MUMIS). 1993-1996 - Beth Logan, Jean-Manuel Van Thong:
Confusion-based query expansion for OOV words in spoken document retrieval. 1997-2000 - J. T. Wickramaratna, Philip C. Woodland:
Cluster identification for speaker-environment tracking. 2001-2004 - Julien Pinquier, Jean-Luc Rouas, Régine André-Obrecht:
Robust speech / music classification in audio documents. 2005-2008 - Stefan Karnebäck:
Expanded examinations of a low frequency modulation feature for speech/music discrimination. 2009-2012 - Hassan Ezzaidi, Jean Rouat:
Speech, music and songs discrimination in the context of handsets variability. 2013-2016
Acoustic Correlates and Recognition of Emotion
- Klaus R. Scherer, Didier Grandjean, Tom Johnstone, Gudrun Klasmeyer, Thomas Bänziger:
Acoustic correlates of task load and stress. 2017-2020 - Mandar A. Rahurkar, John H. L. Hansen, James Meyerhoff, George Saviolakis, Michael Koenig:
Frequency band analysis for stress detection using a teager energy operator based feature. 2021-2024 - Jiahong Yuan, Liqin Shen, Fangxin Chen:
The acoustic realization of anger, fear, joy and sadness in Chinese. 2025-2028 - Raquel Tato, Rocío Santos, Ralf Kompe, José M. Pardo:
Emotional space improves emotion recognition. 2029-2032 - Ze-Jing Chuang, Chung-Hsien Wu:
Emotion recognition from textual input using an emotional semantic network. 2033-2036 - Jeremy Ang, Rajdip Dhillon, Ashley Krupski, Elizabeth Shriberg, Andreas Stolcke:
Prosody-based automatic detection of annoyance and frustration in human-computer dialog. 2037-2040 - Veronika Makarova, Valery A. Petrushin:
RUSLANA: a database of Russian emotional utterances. 2041-2044
Dialog Strategy Design
- Ian M. O'Neill, Michael F. McTear:
A pragmatic confirmation mechanism for an object-based spoken dialogue manager. 2045-2048 - Sunna Torge, Stefan Rapp, Ralf Kompe:
Serving complex user wishes with an enhanced spoken dialogue system. 2049-2052 - Grace Chung, Stephanie Seneff:
Integrating speech with keypad input for automatic entry of spelling and pronunciation of new words. 2053-2056 - Ellen Campana, Sarah Brown-Schmidt, Michael K. Tanenhaus:
Reference resolution by human partners in a natural interactive problem-solving task. 2057-2060 - Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke:
Is the speaker done yet? faster and more accurate end-of-utterance detection using prosody. 2061-2064 - Genevieve Gorrell, Ian Lewin, Manny Rayner:
Adding intelligent help to mixed-initiative spoken dialogue systems. 2065-2068 - JongHo Shin, Shrikanth S. Narayanan, Laurie Gerber, Abe Kazemzadeh, Dani Byrd:
Analysis of user behavior under error conditions in spoken dialogs. 2069-2072
Speech Synthesis - Prosody
- Yinglong Jiang, Peter Murphy:
Production based pitch modification of voiced speech. 2073-2076 - Xuejing Sun:
F0 generation for speech synthesis using a multi-tier approach. 2077-2080 - Volker Strom:
From text to prosody without toBI. 2081-2084 - Keikichi Hirose, Masaya Eto, Nobuaki Minematsu:
Improved corpus-based synthesis of fundamental frequency contours using generation process model. 2085-2088 - Jeska Buhmann, Jean-Pierre Martens, Lieve Macken, Bert Van Coile:
Intonation modelling for the synthesis of structured documents. 2089-2092 - Joram Meron:
Applying fallback to prosodic unit selection from a small imitation database. 2093-2096 - Jianhua Tao, Lianhong Cai:
Clustering and feature learning based F0 prediction for Chinese speech synthesis. 2097-2100
Speech Features
- Katrin Weber, Febe de Wet, Bert Cranen, Lou Boves, Samy Bengio, Hervé Bourlard:
Evaluation of formant-like features for ASR. 2101-2104 - Fadhil H. T. Al-Dulaimy, Zuoying Wang:
Entropy of energy operator as feature for large vocabulary Mandarin speaker independent speech recognition. 2105-2108 - Yiyan Zhang, Wenju Liu, Bo Xu, Huayun Zhang:
Improving parametric trajectory modeling by integration of pitch and tone information. 2109-2112 - Hesham Tolba, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for automatic speech recognition using a multi-stream paradigm. 2113-2116 - Ka-Yee Leung, Man-Hung Siu:
Speech recognition using combined acoustic and articulatory information with retraining of acoustic model parameters. 2117-2120 - N. J. Wilkinson, Martin J. Russell:
Improved phone recognition on TIMIT using formant frequency data and confidence measures. 2121-2124 - Norihide Kitaoka, Daisuke Yamada, Seiichi Nakagawa:
Speaker independent speech recognition using features based on glottal sound source. 2125-2128 - Mohamed Kamal Omar, Ken Chen, Mark Hasegawa-Johnson, Yigal Brandman:
An evaluation of using mutual information for selection of acoustic-features representation of phonemes for speech recognition. 2129-2132 - Florian Metze, Alex Waibel:
A flexible stream architecture for ASR using articulatory features. 2133-2136 - Andrej Ljolje:
Speech recognition using fundamental frequency and voicing in acoustic modeling. 2137-2140 - Montri Karnjanadecha, Patimakorn Kimsawad:
A comparison of front-end analyses for Thai speech recognition. 2141-2144 - Jari Juhani Turunen, Juha T. Tanttu, Pekka Loula:
New model for speech residual signal shaping with static nonlinearity. 2145-2148 - Ching-Hsiang Ho, Dimitrios Rentzos, Saeed Vaseghi:
Formant model estimation and transformation for voice morphing. 2149-2152 - Beáta Megyesi, Sofia Gustafson-Capková:
Production and perception of pauses and their linguistic context in read and spontaneous speech in Swedish. 2153-2156 - Claudia Manfredi, Lorenzo Matassini:
Non-linear techniques for dysphonic voice analysis and correction. 2157-2160 - Akira Sasou, Kazuyo Tanaka:
Adaptive estimation of time-varying features from high-pitched speech based on an excitation source HMM. 2161-2164 - Martine Toda, Shinji Maeda, Andreas J. Carlen, Lyes Meftahi:
Lip gestures in English sibilants: articulatory - acoustic relationship. 2165-2168 - Naren Malayath, Hynek Hermansky:
Bark resolution from speech data. 2169-2172
Special Topics in Robust Speech Recognition
- Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Noise-robust speech recognition in car environments using genetic algorithms and a mel-cepstral subspace approach. 2173-2176 - Scott Axelrod, Ramesh Gopinath, Peder A. Olsen:
Modeling with a subspace constraint on inverse covariance matrices. 2177-2180 - Iain McCowan, Andrew C. Morris, Hervé Bourlard:
Improving speech recognition performance of small microphone arrays using missing data techniques. 2181-2184 - David Gelbart, Nelson Morgan:
Double the trouble: handling noise and reverberation in far-field automatic speech recognition. 2185-2188 - Laurent Couvreur, Christophe Ris:
Model-based independent component analysis for robust multi-microphone automatic speech recognition. 2189-2192 - An-Tze Yu, Hsiao-Chuan Wang:
Compensation of channel effect on line spectrum frequencies. 2193-2196 - Huayun Zhang, Zhaobing Han, Bo Xu:
Codebook dependent dynamic channel estimation for Mandarin speech recognition over telephone. 2197-2200 - Roberto Gemello, Franco Mana, Paolo Pegoraro, Renato de Mori:
Robust multiple resolution analysis for automatic speech recognition. 2201-2204 - Antonio M. Peinado, Victoria E. Sánchez, José L. Pérez-Córdoba, José C. Segura, Antonio J. Rubio:
HMM-based methods for channel error mitigation in distributed speech recognition. 2205-2208 - Tim Fingscheidt, Stefanie Aalburg, Sorel Stan, Christophe Beaugeant:
Network-based vs. distributed speech recognition in adaptive multi-rate wireless systems. 2209-2212 - Alexis Bernard, Abeer Alwan:
Channel noise robustness for low-bitrate remote speech recognition. 2213-2216 - Carmen Peláez-Moreno, Ascensión Gallardo-Antolín, Jesús Vicente-Peña, Fernando Díaz-de-María:
Influence of transmission errors on ASR systems. 2217-2220 - Satoru Tsuge, Shingo Kuroiwa, Masami Shishibori, Fuji Ren, Kenji Kita:
Robust feature extraction in a variety of input devices on the basis of ETSI standard DSR front-end. 2221-2224 - Zheng-Hua Tan, Paul Dalsgaard:
Channel error protection scheme for distributed speech recognition. 2225-2228 - Yeshwant K. Muthusamy, Yifan Gong, Roshan Gupta:
The effects of speech compression on speech recognition and text-to-speech synthesis. 2229-2232 - Ben Milner, Xu Shao:
Transform-based feature vector compression for distributed speech recognition. 2233-2236
Distributed Multimodal Dialog Management Using Internet Technologies - I
- Michael Johnston, Srinivas Bangalore, Amanda Stent, Gunaranjan Vasireddy, Patrick Ehlen:
Multimodal language processing for mobile information access. 2237-2240 - Kuansan Wang:
SALT: a spoken language interface for web-based multimodal dialog systems. 2241-2244 - Christina L. Bennett, Ariadna Font Llitjós, Stefanie Shriver, Alexander I. Rudnicky, Alan W. Black:
Building voiceXML-based applications. 2245-2248 - Joyce Yue Chai:
Operations for context-based multimodal interpretation in conversational systems. 2249-2252 - Feng Liu, Antoine Saad, Li Li, Wu Chou:
A distributed multimodal dialogue system based on dialogue system and web convergence. 2253-2256 - Kouichi Katsurada, Yoshihiko Ootani, Yusaku Nakamura, Satoshi Kobayashi, Hirobumi Yamada, Tsuneo Nitta:
A modality-independent MMI system architecture. 2549-2552 - Cristiana Armaroli, Ivano Azzini, Lorenza Ferrario, Toni Giorgino, Luca Nardelli, Marco Orlandi, Carla Rognoni:
An architecture for a multi-modal web browser. 2553-2556 - Patrick Ehlen, Michael Johnston, Gunaranjan Vasireddy:
Collecting mobile multimodal data for match. 2557-2560 - Helen M. Meng, P. C. Ching, Yee Fong Wong, Cheong Chat Chan:
ISIS: a multi-modal, trilingual, distributed spoken dialog system developed with CORBA, java, XML and KQML. 2561-2564
Phonology
- Kimiko Tsukada:
An acoustic comparison between american English and australian English vowels. 2257-2260 - Luis M. T. Jesus, Christine H. Shadle:
A case study of portuguese and English bilinguality. 2261-2264 - Olga I. Dioubina, Hartmut R. Pfitzinger:
An IPA vowel diagram approach to analysing L1 effects on vowel production and perception. 2265-2268 - Pétur Helgason, Sjrðhur Gullbein:
Phonological norms in faroese speech synthesis. 2269-2272 - Philippe Boula de Mareüil, Martine Adda-Decker:
Studying pronunciation variants in French by using alignment techniques. 2273-2276 - Petra Hansson:
Perceived boundary strength. 2277-2280 - Sun-Ah Jun:
Syntax over focus. 2281-2284 - John J. Ohala, Rungpat Roengpitya:
Duration related phase realignment of Thai tones. 2285-2288 - Louis ten Bosch:
Probabilistic ranking of constraints. 2289-2292 - Masahiko Komatsu, Shinichi Tokuma, Won Tokuma, Takayuki Arai:
Multi-dimensional analysis of sonority: perception, acoustics, and phonology. 2293-2296
Feature Extraction for Speaker Recognition
- Marcos Faúndez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn:
On the relevance of bandwidth extension for speaker verification. 2317-2320 - Bogdan Sabac:
Speaker recognition using discriminative features selection. 2321-2324 - Tomi Kinnunen:
Designing a speaker-discriminative adaptive filter bank for speaker recognition. 2325-2328 - Chi-Leung Tsang, Man-Wai Mak, Sun-Yuan Kung:
Divergence-based out-of-class rejection for telephone handset identification. 2329-2332 - Purdy Ho:
A handset identifier using support vector machines. 2333-2336
Issues in Speech Recognition
- Robert Faltlhauser, Günther Ruske, Matthias Thomae:
Towards the question: why has speaking rate such an impact on speech recognition performance? 2429-2432 - Mijail Arcienega, Andrzej Drygajlo:
Robust voiced-unvoiced decision associated to continuous pitch tracking in noisy telephone speech. 2433-2436 - Kaisheng Yao, Kuldip K. Paliwal, Satoshi Nakamura:
Noise adaptive speech recognition with acoustic models trained from noisy speech evaluated on Aurora-2 database. 2437-2440 - Jingdong Chen, Yiteng Huang, Qi Li, Frank K. Soong:
Recognition of noisy speech using normalized moments. 2441-2444 - Chia-Ping Chen, Jeff A. Bilmes, Katrin Kirchhoff:
Low-resource noise-robust feature post-processing on Aurora 2.0. 2445-2448 - Li Deng, Jasha Droppo, Alex Acero:
Exploiting variances in robust feature extraction based on a parametric model of speech distortion. 2449-2452 - Muhammad Ghulam, Takashi Fukuda, Takaharu Sato, Tsuneo Nitta:
Improving performance of an HMM-based ASR system by using monophone-level normalized confidence measure. 2453-2456 - Yi Liu, Pascale Fung:
Model partial pronunciation variations for spontaneous Mandarin speech recognition. 2457-2460 - Fang Zheng, Zhanjiang Song, Pascale Fung, William Byrne:
Reducing pronunciation lexicon confusion and using more data without phonetic transcription for pronunciation modeling. 2461-2464 - Erik McDermott, Shigeru Katagiri:
Classification error from the theoretical Bayes classification risk. 2465-2468 - Aldebaro Klautau, Nikola Jevtic, Alon Orlitsky:
Combined binary classifiers with applications to speech recognition. 2469-2472 - Arkadiusz Nagórski, Lou Boves, Herman J. M. Steeneken:
Optimal selection of speech data for automatic speech recognition systems. 2473-2476
Speech Pathology Processing and Treatment
- Mario Liotti, Lorraine O. Ramig, Deanie Vogel, Pamela New, Chris Cook, Peter Fox:
Hypophonia in parkinson disease: neural correlates of voice treatment with LSVT revealed by PET. 2477-2480 - Susan Duncan:
Preliminary data on effects of behavioral and levodopa therapies on speech-accompanying gesture in parkinson's disease. 2481-2484 - Francis K. H. Quek, Mary P. Harper, Yonca Haciahmetoglu, Lei Chen, Lorraine O. Ramig:
Speech pauses and gestural holds in parkinson's disease. 2485-2488 - Jennifer L. Spielman, Lorraine O. Ramig, Joan C. Borod:
Oro-facial changes in parkinson's disease following intensive voice therapy (LSVT). 2489-2492 - Jeri Logemann, Ralph Sundin, Jean Sundin:
Swallowing and voice effects of lee silverman voice treatment (LSVT). 2493-2496
Speech Pathology Processing
- Leslie Will, Lorraine O. Ramig, Jennifer L. Spielman:
Application of the lee silverman voice treatment (LSVT) to individuals with multiple sclerosis, ataxic dysarthria, and stroke. 2497-2500 - Becky G. Farley:
Think big, from voice to limb movement therapy. 2501-2504 - Vijay Parsa, Donald G. Jamieson, Karen Stenning, Herbert A. Leeper:
On the estimation of signal-to-noise ratio in continuous speech for abnormal voices. 2505-2508
Applications of Speech Signal Processing
- Vasyl Semenov, Alexander Kovtonyuk, Alexander Kalyuzhny:
Computationally efficient method of speech enhancement based on block representation of signal in state space and vector quantization. 2509-2512 - Kazuhiro Kondo, Kiyoshi Nakagawa:
Active speech cancellation for cellular speech. 2513-2516 - R. Muralishankar, A. G. Ramakrishnan, P. Prathibha:
Warped-LP residual resampling using DCT for pitch modification. 2517-2520 - E. Jung, A. Th. Schwarzbacher, K. Humphreys, Robert Lawlor:
Application of real-time AMDF pitch-detection in a voice gender normalisation system. 2521-2524 - Yves Laprie, Anne Bonneau:
A copy synthesis method to pilot the klatt synthesiser. 2525-2528 - Masaharu Sakamoto, Takashi Saito:
Speaker recognizability evaluation of a voicefont-based text-to-speech system. 2529-2532 - Antonio Satué-Villar, Juan Fernández-Rubio:
Time-frequency transforms and beamforming for speaker recognition. 2533-2536 - Soonil Kwon, Shrikanth S. Narayanan:
Speaker change detection using a new weighted distance measure. 2537-2540 - José Luis Gómez-Cipriano, Roger Pizzatto Nunes, Dante A. C. Barone:
FPGA hardware for speech recognition using hidden Markov models. 2541-2544 - Toshio Irino, Yasuhiro Minami, Tomohiro Nakatani, Minoru Tsuzaki, H. Tagawa:
Evaluation of a speech recognition / generation method based on HMM and straight. 2545-2548
Speech Synthesis: Unit Selection
- Jithendra Vepa, Simon King, Paul Taylor:
Objective distance measures for spectral discontinuities in concatenative speech synthesis. 2605-2608 - Wael Hamza, Robert E. Donovan:
Data-driven segment preselection in the IBM trainable speech synthesis system. 2609-2612 - Hu Peng, Yong Zhao, Min Chu:
Perpetually optimizing the cost function for unit selection in a TTS system with one single run of MOS evaluation. 2613-2616 - Jon R. W. Yi, James R. Glass:
Information-theoretic criteria for unit selection synthesis. 2617-2620 - Hisashi Kawai, Minoru Tsuzaki:
Acoustic measures vs. phonetic features as predictors of audible discontinuity in concatenative speech synthesis. 2621-2624
Dialog Systems and Applications
- Hsien-Chang Wang, Chieh-Yi Huang, Chung-Hsien Yang, Jhing-Fa Wang:
A study of multi-speaker dialogue system for mobile information retrieval. 2677-2680 - Giuseppe Di Fabbrizio, Dawn Dutton, Narendra K. Gupta, Barbara Hollister, Mazin G. Rahim, Giuseppe Riccardi, Robert E. Schapire, Juergen Schroeter:
AT&T help desk. 2681-2684 - Roger Trias-Sanz, José B. Mariño:
Basurde[lite], a machine-driven dialogue system for accessing railway timetable information. 2685-2688 - Rachel Coulston, Sharon L. Oviatt, Courtney Darves:
Amplitude convergence in children's conversational speech with animated personas. 2689-2692 - David Stallard:
Flexible dialogue management in the talk'n'travel system. 2693-2696 - Daniela Oria, Esa Koskinen:
E-mail goes mobile: the design and implementation of a spoken language interface to e-mail. 2697-2700 - Néstor Becerra Yoma, Angela Cortés, Mauricio Hormazábal, Enrique López:
Wizard of oz evaluation of a dialogue with communicator system in Chile. 2701-2704 - Bob Carpenter, Sasha Caskey, Krishna Dayanidhi, Caroline Drouin, Roberto Pieraccini:
A portable, server-side dialog framework for voiceXML. 2705-2708 - Shinya Takahashi, Tsuyoshi Morimoto, Sakashi Maeda, Naoyuki Tsuruta:
Spoken dialogue system for home health care. 2709-2712 - Jaume Padrell, Javier Hernando:
ACIMET: access to meteorological information by telephone. 2713-2716 - Ralf Engel:
SPIN: language understanding for spoken dialogue systems using a production system approach. 2717-2720