A Hybrid GMM and Codebook Mapping Method for Spectral Conversion

Kang, Yongguo; Shuang, Zhiwei; Tao, Jianhua; Zhang, Wei; Xu, Bo

doi:10.1007/11573548_39


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Included in the following conference series:

International Conference on Affective Computing and Intelligent Interaction


Abstract

This paper proposes a new mapping method that combines GMM and codebook mapping to transform the spectral envelope in a voice conversion system. After analyzing the over-smoothing problem of the GMM mapping method in detail, we propose to convert the basic spectral envelope with the GMM method and to convert the envelope-subtracted spectral details with both the GMM and a phone-tied codebook mapping method. Objective evaluations based on performance indices show that the proposed method improves on the GMM mapping method by 27.2017% on average, and listening tests show that it effectively reduces the over-smoothing problem of the GMM method while avoiding the discontinuity problem of the codebook mapping method.
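The abstract describes the hybrid only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. The GMM part uses the standard joint-density conversion function from the GMM voice conversion literature, F(x) = Σᵢ pᵢ(x)[μᵢʸ + Σᵢʸˣ(Σᵢˣˣ)⁻¹(x − μᵢˣ)], simplified here to diagonal covariance blocks (an assumption). The codebook part restricts the search to codewords tied to the current phone and weights their target codewords by a softmax of negative squared distance (an assumed weighting). It also assumes log-spectral vectors on a common frequency grid, so the spectrum is envelope plus details. The names `Component`, `gmm_convert`, `codebook_convert`, `hybrid_convert`, and the parameters `gamma` and `alpha` are hypothetical. In particular, the abstract does not say how the GMM and codebook outputs for the details are combined, so a fixed linear blend `alpha` stands in for that rule.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


@dataclass
class Component:
    """One joint-density GMM component with diagonal covariance blocks (assumption)."""
    weight: float   # mixture weight alpha_i (must be > 0)
    mu_x: Vector    # source mean mu_i^x
    mu_y: Vector    # target mean mu_i^y
    var_xx: Vector  # diagonal of Sigma_i^{xx}
    cov_yx: Vector  # diagonal of Sigma_i^{yx}


def _log_gauss_diag(x: Sequence[float], mu: Vector, var: Vector) -> float:
    # Log density of a diagonal-covariance Gaussian evaluated at x.
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def gmm_convert(x: Sequence[float], comps: List[Component]) -> Vector:
    """F(x) = sum_i p_i(x) * (mu_i^y + Sigma_i^{yx} (Sigma_i^{xx})^-1 (x - mu_i^x))."""
    logs = [math.log(c.weight) + _log_gauss_diag(x, c.mu_x, c.var_xx) for c in comps]
    top = max(logs)  # subtract the max log-likelihood for numerical stability
    unnorm = [math.exp(lg - top) for lg in logs]
    total = sum(unnorm)
    y = [0.0] * len(x)
    for u, c in zip(unnorm, comps):
        p = u / total  # posterior p_i(x)
        for d in range(len(x)):
            y[d] += p * (c.mu_y[d] + c.cov_yx[d] / c.var_xx[d] * (x[d] - c.mu_x[d]))
    return y


# Hypothetical phone-tied codebook: phone label -> list of (source, target) codeword pairs.
Codebooks = Dict[str, List[Tuple[Vector, Vector]]]


def codebook_convert(x: Sequence[float], phone: str, codebooks: Codebooks,
                     gamma: float = 1.0) -> Vector:
    """Weighted sum of the current phone's target codewords.

    Weights are a softmax of -gamma * squared distance to the source codewords
    (an assumed weighting; gamma is a hypothetical sharpness parameter).
    """
    entries = codebooks[phone]  # only codewords tied to this phone are considered
    dists = [sum((a - b) ** 2 for a, b in zip(x, src)) for src, _ in entries]
    dmin = min(dists)
    unnorm = [math.exp(-gamma * (dist - dmin)) for dist in dists]
    total = sum(unnorm)
    y = [0.0] * len(x)
    for u, (_, tgt) in zip(unnorm, entries):
        for d in range(len(x)):
            y[d] += (u / total) * tgt[d]
    return y


def hybrid_convert(env_x: Sequence[float], detail_x: Sequence[float], phone: str,
                   env_gmm: List[Component], detail_gmm: List[Component],
                   codebooks: Codebooks, alpha: float = 0.5) -> Vector:
    """Basic envelope via GMM; details via a blend of GMM and phone-tied codebook.

    Assumes log-spectral vectors on a common grid, so spectrum = envelope + details.
    alpha is a hypothetical blend weight; the abstract does not state the rule.
    """
    env_y = gmm_convert(env_x, env_gmm)
    det_g = gmm_convert(detail_x, detail_gmm)
    det_c = codebook_convert(detail_x, phone, codebooks)
    det_y = [alpha * g + (1.0 - alpha) * c for g, c in zip(det_g, det_c)]
    return [e + d for e, d in zip(env_y, det_y)]  # recombine envelope and details
```

For instance, with a single component `Component(1.0, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.5, 0.5])`, `gmm_convert([2.0, 0.0], ...)` returns `[2.0, 1.0]`. The sketch also makes the trade-off named in the abstract visible. `gmm_convert` returns a posterior-weighted average of per-component regressions, so two equally likely components with opposite target means average out to the middle (over-smoothing). `codebook_convert` copies actual target codewords of the current phone, which keeps detail but can jump between frames when the nearest codeword changes (discontinuity).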



Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences
Yongguo Kang, Jianhua Tao & Bo Xu
China Research Lab, IBM
Zhiwei Shuang & Wei Zhang


Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
Jianhua Tao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
MIT Media Laboratory, 20 Ames Street, 02139, Cambridge, MA, USA
Rosalind W. Picard

About this paper

Cite this paper

Kang, Y., Shuang, Z., Tao, J., Zhang, W., Xu, B. (2005). A Hybrid GMM and Codebook Mapping Method for Spectral Conversion. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_39


DOI: https://doi.org/10.1007/11573548_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)
