Speaker Identification in Noisy Conditions Using Short Sequences of Speech Frames

  • Conference paper
Intelligent Decision Technologies 2017 (IDT 2017)

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 73)

Abstract

The application of speaker recognition technologies in domotic systems, cars, or mobile devices such as tablets, smartphones and smartwatches faces the problem of ambient noise. This paper studies the robustness of a speaker identification system when the speech signal is corrupted by environmental noise. In everyday scenarios, noise sources are highly time-varying and potentially unknown; therefore, noise robustness must be investigated in the absence of information about the noise. To this end, the performance of speaker identification using short sequences of speech frames was evaluated on a database of simulated noisy speech. This database is derived from the TIMIT database by re-recording the data in the presence of various noise types, and it is used to test the speaker identification model across a variety of noise conditions. Additionally, to optimize recognition performance, white noise was added to the data in the training stage as a first step towards generating multicondition training data that model speech corrupted by noise with unknown temporal-spectral characteristics. The experimental results demonstrate the validity of the proposed algorithm for speaker identification from short portions of speech, including realistic conditions in which ambient noise is not negligible.
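The multicondition training step described in the abstract, corrupting clean training speech with white noise, can be illustrated with a minimal sketch. The abstract does not state the noise levels used or how the signal-to-noise ratio (SNR) was measured, so the helper names (`signal_power`, `add_white_noise`, `multicondition_set`), the choice of Gaussian white noise, the utterance-level SNR definition, and the SNR values are all assumptions made here for illustration, not the authors' procedure.

```python
import math
import random
from typing import List, Sequence


def signal_power(x: Sequence[float]) -> float:
    """Mean power of a discrete-time signal (average of squared samples)."""
    if len(x) == 0:
        raise ValueError("empty signal")
    return sum(s * s for s in x) / len(x)


def add_white_noise(x: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Return a copy of x corrupted by zero-mean Gaussian white noise.

    Assumption: the SNR is defined over the whole utterance, so the noise
    variance is chosen such that P_signal / P_noise = 10 ** (snr_db / 10).
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    p_noise = signal_power(x) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(p_noise)
    return [s + rng.gauss(0.0, sigma) for s in x]


def multicondition_set(
    utterances: Sequence[Sequence[float]], snrs_db: Sequence[float], seed: int = 0
) -> List[List[float]]:
    """Hypothetical helper: keep each clean utterance and append one noisy copy per SNR."""
    out: List[List[float]] = []
    for i, x in enumerate(utterances):
        out.append(list(x))  # clean condition
        for j, snr in enumerate(snrs_db):
            # distinct seed per (utterance, SNR) pair, assuming fewer than 1000 SNR levels
            out.append(add_white_noise(x, snr, seed=seed + 1000 * i + j))
    return out


if __name__ == "__main__":
    fs = 16000  # TIMIT is sampled at 16 kHz
    tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # 1 s test tone
    noisy = add_white_noise(tone, snr_db=10.0, seed=1)
    residual = [a - b for a, b in zip(noisy, tone)]
    measured = 10 * math.log10(signal_power(tone) / signal_power(residual))
    print(f"requested 10.0 dB, measured {measured:.2f} dB")
```

Keeping the clean copy alongside the noisy ones means a model trained on the resulting set sees both clean and noise-corrupted versions of each utterance; the SNR grid passed to `multicondition_set` is a free design parameter in this sketch.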

Author information

Correspondence to Paolo Crippa.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Biagetti, G., Crippa, P., Falaschetti, L., Orcioni, S., Turchetti, C. (2018). Speaker Identification in Noisy Conditions Using Short Sequences of Speech Frames. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_5

  • DOI: https://doi.org/10.1007/978-3-319-59424-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59423-1

  • Online ISBN: 978-3-319-59424-8

  • eBook Packages: Engineering, Engineering (R0)
