Abstract
The application of speaker recognition technologies to domotic systems, cars, or mobile devices such as tablets, smartphones, and smartwatches faces the problem of ambient noise. This paper studies the robustness of a speaker identification system when the speech signal is corrupted by environmental noise. In everyday scenarios, noise sources are highly time-varying and potentially unknown; therefore, noise robustness must be investigated in the absence of information about the noise. To this end, the performance of speaker identification using short sequences of speech frames was evaluated on a database of simulated noisy speech. This database is derived from the TIMIT database by re-recording the data in the presence of various noise types, and it is used to test the speaker identification model with a focus on the variety of noise conditions. Additionally, to optimize recognition performance, white noise was added in the training stage as a first step towards generating multicondition training data that models speech corrupted by noise with unknown temporal-spectral characteristics. The experimental results demonstrate the validity of the proposed algorithm for speaker identification using short portions of speech, even in realistic conditions where ambient noise is not negligible.
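The abstract states that white noise was added to the training data as a first step towards multicondition training, but it does not give the noise levels or the exact procedure. The following is a minimal Python/NumPy sketch of one common way to do this, assuming zero-mean Gaussian white noise scaled to a target signal-to-noise ratio, SNR_dB = 10 log10(P_signal / P_noise). The function names `add_white_noise` and `make_multicondition` and the SNR list (20, 10, 0 dB) are hypothetical illustrations, not values or code taken from the paper.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Return `signal` corrupted by zero-mean white Gaussian noise at `snr_db` dB.

    Assumption: the SNR is defined on the average power of the whole utterance.
    """
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)                  # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))   # noise power for the target SNR
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise


def make_multicondition(utterances, snrs_db=(20.0, 10.0, 0.0), seed=0):
    """Build a multicondition training set: each clean utterance plus one
    white-noise-corrupted copy per SNR level (hypothetical SNR list)."""
    rng = np.random.default_rng(seed)
    out = []
    for x in utterances:
        out.append(np.asarray(x, dtype=float))       # keep the clean version
        for snr in snrs_db:
            out.append(add_white_noise(x, snr, rng))
    return out
```

Under these assumptions, the per-speaker models would then be trained on the output of `make_multicondition` rather than on clean speech alone; white noise is a natural first choice because its flat spectrum does not favour any particular (unknown) noise colouring.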
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Biagetti, G., Crippa, P., Falaschetti, L., Orcioni, S., Turchetti, C. (2018). Speaker Identification in Noisy Conditions Using Short Sequences of Speech Frames. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_5
DOI: https://doi.org/10.1007/978-3-319-59424-8_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59423-1
Online ISBN: 978-3-319-59424-8
eBook Packages: Engineering, Engineering (R0)