{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T13:17:53Z","timestamp":1740143873776,"version":"3.37.3"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,2,10]],"date-time":"2020-02-10T00:00:00Z","timestamp":1581292800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,2,10]],"date-time":"2020-02-10T00:00:00Z","timestamp":1581292800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation in China","doi-asserted-by":"crossref","award":["No. 61571044"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation in China","doi-asserted-by":"crossref","award":["No. 11590772"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2020,12]]},"abstract":"Abstract<\/jats:title>Binaural sound source localization is an important and widely used perceptually based method and it has been applied to machine learning studies by many researchers based on head-related transfer function (HRTF). Because the HRTF is closely related to human physiological structure, the HRTFs vary between individuals. Related machine learning studies to date tend to focus on binaural localization in reverberant or noisy environments, or in conditions with multiple simultaneously active sound sources. In contrast, mismatched HRTF condition, in which the HRTFs used to generate the training and test sets are different, is rarely studied. 
This mismatch degrades localization performance. A basic solution is to introduce more training data to improve generalization, but this requires a large amount of data, and simply increasing the data volume is data-inefficient. In this paper, we propose a data-efficient method based on a deep neural network (DNN) and clustering to improve binaural localization performance in the mismatched HRTF condition. First, we analyze the relationship between binaural cues and sound source localization with a classification DNN. Different HRTFs are used to generate the training and test sets. On this basis, we study the localization performance of the DNN model trained on each training set across the different test sets. The results show that the same model performs differently on different test sets, while different models may perform similarly on the same test set. The results also show a clustering trend. Second, the HRTFs are divided into several clusters. Finally, the HRTFs corresponding to each cluster center are selected to generate a new training set and to train a more generalized DNN model. 
The experimental results show that the proposed method achieves better generalization performance than the baseline methods in the mismatched HRTF condition and has almost equal performance to the DNN trained with a large number of HRTFs, which means the proposed method is data-efficient.<\/jats:p>","DOI":"10.1186\/s13636-020-0171-y","type":"journal-article","created":{"date-parts":[[2020,2,10]],"date-time":"2020-02-10T17:03:17Z","timestamp":1581354197000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["Binaural sound localization based on deep neural network and affinity propagation clustering in mismatched HRTF condition"],"prefix":"10.1186","volume":"2020","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3653-9951","authenticated-orcid":false,"given":"Jing","family":"Wang","sequence":"first","affiliation":[]},{"given":"Jin","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Kai","family":"Qian","sequence":"additional","affiliation":[]},{"given":"Xiang","family":"Xie","sequence":"additional","affiliation":[]},{"given":"Jingming","family":"Kuang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,2,10]]},"reference":[{"issue":"8","key":"171_CR1","doi-asserted-by":"publisher","first-page":"1618","DOI":"10.1109\/TASLP.2017.2703650","volume":"25","author":"C. Pang","year":"2017","unstructured":"C. Pang, H. Liu, J. Zhang, X. Li, Binaural sound localization based on reverberation weighting and generalized parametric mapping. IEEE\/ACM Trans Audio Speech Lang Process. 25(8), 1618\u201332 (2017).","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process"},{"issue":"4","key":"171_CR2","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1109\/TASSP.1976.1162830","volume":"24","author":"C. Knapp","year":"1976","unstructured":"C. Knapp, G. Carter, The generalized correlation method for estimation of time delay. 
IEEE Trans Acoust Speech Signal Process. 24(4), 320\u2013327 (1976).","journal-title":"IEEE Trans Acoust Speech Signal Process"},{"issue":"4","key":"171_CR3","doi-asserted-by":"publisher","first-page":"922","DOI":"10.1121\/1.381623","volume":"62","author":"G. C. Carter","year":"1977","unstructured":"G. C. Carter, Variance bounds for passively locating an acoustic source with a symmetric line array. J Acoust Soc Am. 62(4), 922\u2013926 (1977).","journal-title":"J Acoust Soc Am"},{"issue":"3","key":"171_CR4","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1109\/TAP.1986.1143830","volume":"34","author":"R. Schmidt","year":"1986","unstructured":"R. Schmidt, Multiple emitter location and signal parameter estimation. IEEE Trans Antenn Propag. 34(3), 276\u2013280 (1986).","journal-title":"IEEE Trans Antenn Propag"},{"issue":"7","key":"171_CR5","doi-asserted-by":"publisher","first-page":"984","DOI":"10.1109\/29.32276","volume":"37","author":"R. Roy","year":"1989","unstructured":"R. Roy, T. Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans Acoust Speech Signal Process. 37(7), 984\u2013995 (1989).","journal-title":"IEEE Trans Acoust Speech Signal Process"},{"issue":"1","key":"171_CR6","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1037\/h0061495","volume":"41","author":"L. A. Jeffress","year":"1948","unstructured":"L. A. Jeffress, A place theory of sound localization. J Comp Physiol Psychol. 41(1), 35 (1948).","journal-title":"J Comp Physiol Psychol"},{"key":"171_CR7","doi-asserted-by":"publisher","unstructured":"D. Li, S. E. Levinson, in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP\u201903), vol. 5. A Bayes-rule based hierarchical system for binaural sound source localization (IEEE, 2003), p. 521. 
https:\/\/doi.org\/10.1109\/icassp.2003.1200021.","DOI":"10.1109\/icassp.2003.1200021"},{"issue":"5","key":"171_CR8","doi-asserted-by":"publisher","first-page":"982","DOI":"10.1109\/TSMCB.2006.872263","volume":"36","author":"V. Willert","year":"2006","unstructured":"V. Willert, J. Eggert, J. Adamy, R. Stahl, E. Korner, A probabilistic model for binaural sound localization. IEEE Trans Syst Man Cybernet Part B (Cybernet). 36(5), 982\u2013994 (2006).","journal-title":"IEEE Trans Syst Man Cybernet Part B (Cybernet)"},{"issue":"1","key":"171_CR9","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1109\/TASL.2009.2023644","volume":"18","author":"M. Raspaud","year":"2010","unstructured":"M. Raspaud, H. Viste, G. Evangelista, Binaural source localization by joint estimation of ILD and ITD. IEEE Trans Audio Speech Lang Process. 18(1), 68\u201377 (2010).","journal-title":"IEEE Trans Audio Speech Lang Process"},{"issue":"2","key":"171_CR10","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1109\/LSP.2011.2180376","volume":"19","author":"R. Parisi","year":"2012","unstructured":"R. Parisi, F. Camoes, M. Scarpiniti, A. Uncini, Cepstrum prefiltering for binaural source localization in reverberant environments. IEEE Signal Process Lett. 19(2), 99\u2013102 (2012).","journal-title":"IEEE Signal Process Lett"},{"key":"171_CR11","doi-asserted-by":"publisher","unstructured":"B. R. Hammond, P. J. Jackson, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Robust full-sphere binaural sound source localization (IEEE, 2018), pp. 86\u201390. https:\/\/doi.org\/10.1109\/icassp.2018.8462103.","DOI":"10.1109\/icassp.2018.8462103"},{"key":"171_CR12","doi-asserted-by":"publisher","unstructured":"F. Keyrouz, Y. Naous, K. Diepold, in 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 5. A new method for binaural 3-D localization based on HRTFs (IEEE, 2006). 
https:\/\/doi.org\/10.1109\/icassp.2006.1661282.","DOI":"10.1109\/icassp.2006.1661282"},{"key":"171_CR13","doi-asserted-by":"publisher","unstructured":"F. Keyrouz, K. Diepold, in 2006 IEEE International Symposium on Signal Processing and Information Technology. An enhanced binaural 3D sound localization algorithm (IEEE, 2006), pp. 662\u2013665. https:\/\/doi.org\/10.1109\/isspit.2006.270883.","DOI":"10.1109\/isspit.2006.270883"},{"key":"171_CR14","doi-asserted-by":"publisher","unstructured":"M. Usman, F. Keyrouz, K. Diepold, in 2008 9th International Conference on Signal Processing. Real time humanoid sound source localization and tracking in a highly reverberant environment (IEEE, 2008), pp. 2661\u20132664. https:\/\/doi.org\/10.1109\/icosp.2008.4697696.","DOI":"10.1109\/icosp.2008.4697696"},{"issue":"6","key":"171_CR15","doi-asserted-by":"publisher","first-page":"4290","DOI":"10.1121\/1.2909566","volume":"123","author":"J. A. MacDonald","year":"2008","unstructured":"J. A. MacDonald, A localization algorithm based on head-related transfer functions. J Acoust Soc Am. 123(6), 4290\u20134296 (2008).","journal-title":"J Acoust Soc Am"},{"issue":"1","key":"171_CR16","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1121\/1.4771972","volume":"133","author":"X. Wan","year":"2013","unstructured":"X. Wan, J. Liang, Robust and low complexity localization algorithm based on head-related impulse responses and interaural time difference. J Acoust Soc Am. 133(1), 40\u201346 (2013).","journal-title":"J Acoust Soc Am"},{"issue":"5","key":"171_CR17","doi-asserted-by":"publisher","first-page":"1503","DOI":"10.1109\/TASL.2012.2183869","volume":"20","author":"J. Woodruff","year":"2012","unstructured":"J. Woodruff, D. Wang, Binaural localization of multiple sources in reverberant and noisy environments. IEEE Trans Audio Speech Lang Process. 
20(5), 1503\u20131512 (2012).","journal-title":"IEEE Trans Audio Speech Lang Process"},{"issue":"12","key":"171_CR18","doi-asserted-by":"publisher","first-page":"2444","DOI":"10.1109\/TASLP.2017.2750760","volume":"25","author":"N. Ma","year":"2017","unstructured":"N. Ma, T. May, G. J. Brown, Exploiting deep neural networks and head movements for robust binaural localization of multiple sources in reverberant environments. IEEE\/ACM Trans Audio Speech Lang Process (TASLP). 25(12), 2444\u20132453 (2017).","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process (TASLP)"},{"key":"171_CR19","doi-asserted-by":"publisher","first-page":"40725","DOI":"10.1109\/ACCESS.2019.2905617","volume":"7","author":"C. Pang","year":"2019","unstructured":"C. Pang, H. Liu, X. Li, Multitask learning of time-frequency CNN for sound source localization. IEEE Access. 7:, 40725\u201340737 (2019).","journal-title":"IEEE Access"},{"key":"171_CR20","doi-asserted-by":"publisher","unstructured":"E. Thuillier, H. Gamper, I. J. Tashev, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Spatial audio feature discovery with convolutional neural networks (IEEE, 2018), pp. 6797\u20136801. https:\/\/doi.org\/10.1109\/icassp.2018.8462315.","DOI":"10.1109\/icassp.2018.8462315"},{"key":"171_CR21","doi-asserted-by":"publisher","unstructured":"P. Vecchiotti, N. Ma, S. Squartini, G. J. Brown, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). End-to-end binaural sound localisation from the raw waveform (IEEE, 2019), pp. 451\u2013455. https:\/\/doi.org\/10.1109\/icassp.2019.8683732.","DOI":"10.1109\/icassp.2019.8683732"},{"key":"171_CR22","doi-asserted-by":"publisher","unstructured":"V. R. Algazi, R. O. Duda, D. M. Thompson, C. Avendano, in Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575). The CIPIC HRTF database (IEEE, 2001), pp. 99\u2013102. 
https:\/\/doi.org\/10.1109\/aspaa.2001.969552.","DOI":"10.1109\/aspaa.2001.969552"},{"issue":"4","key":"171_CR23","doi-asserted-by":"publisher","first-page":"2236","DOI":"10.1121\/1.1610463","volume":"114","author":"N. Roman","year":"2003","unstructured":"N. Roman, D. Wang, G. J. Brown, Speech segregation based on sound localization. J Acoust Soc Am. 114(4), 2236\u20132252 (2003).","journal-title":"J Acoust Soc Am"},{"issue":"7","key":"171_CR24","doi-asserted-by":"publisher","first-page":"1830","DOI":"10.1109\/TSP.2004.828896","volume":"52","author":"O. Yilmaz","year":"2004","unstructured":"O. Yilmaz, S. Rickard, Blind separation of speech mixtures via time-frequency masking. IEEE Trans signal Process. 52(7), 1830\u20131847 (2004).","journal-title":"IEEE Trans signal Process"},{"key":"171_CR25","doi-asserted-by":"publisher","DOI":"10.1109\/9780470043387","volume-title":"Computational Auditory Scene Analysis: Principles, Algorithms, and Applications","author":"D. Wang","year":"2006","unstructured":"D. Wang, G. J. Brown, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications (Wiley-IEEE Press, 445 Hoes Lane Piscataway, 2006)."},{"key":"171_CR26","doi-asserted-by":"publisher","unstructured":"G. E. Dahl, T. N. Sainath, G. E. Hinton, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Improving deep neural networks for LVCSR using rectified linear units and dropout (IEEE, 2013), pp. 8609\u20138613. https:\/\/doi.org\/10.1109\/icassp.2013.6639346.","DOI":"10.1109\/icassp.2013.6639346"},{"key":"171_CR27","unstructured":"G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint (2012). arXiv:1207.0580."},{"key":"171_CR28","unstructured":"D. P. Kingma, J. Ba, Adam: A method for stochastic optimization. arXiv preprint (2014). 
arXiv:1412.6980."},{"issue":"3","key":"171_CR29","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1250\/ast.35.159","volume":"35","author":"K. Watanabe","year":"2014","unstructured":"K. Watanabe, Y. Iwaya, Y. Suzuki, S. Takane, S. Sato, Dataset of head-related transfer functions measured with a circular loudspeaker array. Acoust Sci Technol. 35(3), 159\u2013165 (2014).","journal-title":"Acoust Sci Technol"},{"key":"171_CR30","doi-asserted-by":"publisher","unstructured":"J. S. Garofalo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, The DARPA TIMIT acoustic-phonetic continuous speech corpus cdrom. Linguistic Data Consortium (1993). https:\/\/doi.org\/10.6028\/nist.ir.4930.","DOI":"10.6028\/nist.ir.4930"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-020-0171-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13636-020-0171-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-020-0171-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,9]],"date-time":"2021-02-09T00:36:47Z","timestamp":1612831007000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-020-0171-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,10]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["171"],"URL":"https:\/\/doi.org\/10.1186\/s13636-020-0171-y","relation":{},"ISSN":["1687-4722"],"issn-type":[{"type":"electronic","value":"1687-4722"}],"subject":[],"publish
ed":{"date-parts":[[2020,2,10]]},"assertion":[{"value":"5 June 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 January 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 February 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"4"}}