Towards Cross-Lingual Emotion Transplantation

Lorenzo-Trueba, Jaime; Barra-Chicote, Roberto; Yamagishi, Junichi; Montero, Juan M.

doi:10.1007/978-3-319-13623-3_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8854)

Abstract

In this paper we introduce the idea of cross-lingual emotion transplantation. The aim is to learn the nuances of emotional speech in a source language, for which we have enough data to adapt an emotional model of acceptable quality by means of CSMAPLR adaptation, and then to convert the adaptation function so that it can be applied to a different target speaker in a target language, maintaining the speaker identity while adding emotional information. The conversion between languages is done at state level by measuring the Kullback-Leibler divergence (KLD) between the Gaussian distributions of all the states and linking the closest ones. Finally, as the cross-lingual transplantation of spectral emotions (mainly anger) was found to introduce significant amounts of spectral noise, we show the results of applying three different techniques related to adaptation parameters that can be used to reduce the noise. The results are measured objectively by means of a two-dimensional PCA projection of the KLD distances between the considered models (the neutral models of both languages, the reference emotion for both languages, and the transplanted emotional model for the target language).
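
The state-level linking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each state is a single Gaussian with diagonal covariance, uses a symmetrised KLD as the distance, and maps each target-language state to its nearest source-language state. The abstract does not specify any of these choices. The names `kld`, `symmetric_kld`, `map_states` and the toy state labels are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# A state is modelled here as (means, variances) of a diagonal-covariance
# Gaussian. This representation is an assumption made for illustration.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kld(p: Gaussian, q: Gaussian) -> float:
    """KL divergence D(p || q) between two diagonal-covariance Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kld(p: Gaussian, q: Gaussian) -> float:
    """Symmetrised KLD, used as the state distance (an assumed choice)."""
    return kld(p, q) + kld(q, p)


def map_states(
    target_states: Dict[str, Gaussian],
    source_states: Dict[str, Gaussian],
) -> Dict[str, str]:
    """Link every target-language state to its closest source-language state."""
    mapping = {}
    for t_name, t_gauss in target_states.items():
        mapping[t_name] = min(
            source_states,
            key=lambda s_name: symmetric_kld(t_gauss, source_states[s_name]),
        )
    return mapping


# Toy two-dimensional states; the labels and values are made up.
source = {
    "src_s1": ([0.0, 0.0], [1.0, 1.0]),
    "src_s2": ([5.0, 5.0], [1.0, 2.0]),
}
target = {
    "tgt_s1": ([4.5, 5.2], [1.1, 1.8]),
    "tgt_s2": ([0.2, -0.1], [0.9, 1.0]),
}

state_mapping = map_states(target, source)
print(state_mapping)
```

With a mapping of this kind, an adaptation transform estimated for a source-language state could be looked up and applied to the linked target-language state. How the transforms are actually shared between states is not detailed in the abstract.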

Author information

Authors and Affiliations

Speech Technology Group, ETSI Telecomunicacion, Universidad Politecnica de Madrid, Spain
Jaime Lorenzo-Trueba, Roberto Barra-Chicote & Juan M. Montero
National Institute of Informatics, Tokyo, Japan
Junichi Yamagishi

Editor information

Editors and Affiliations

ETSIT, Las Palmas de Gran Canaria, Spain
Juan Luis Navarro Mesa, Eduardo Hernández Pérez, Pedro Quintana Morales, Antonio Ravelo García & Iván Guerra Moreno
University of Zaragoza, Spain
Alfonso Ortega
Dep. of Electronics, Telecommunications and Informatics Engineering, University of Aveiro, Portugal
António Teixeira
ATVS Biometric Recognition Group, Universidad Autónoma de Madrid, Spain
Doroteo T. Toledano

About this paper

Cite this paper

Lorenzo-Trueba, J., Barra-Chicote, R., Yamagishi, J., Montero, J.M. (2014). Towards Cross-Lingual Emotion Transplantation. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_21

DOI: https://doi.org/10.1007/978-3-319-13623-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science, Computer Science (R0)
