{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,3,13]],"date-time":"2023-03-13T04:30:13Z","timestamp":1678681813770},"reference-count":46,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2022,11,16]],"date-time":"2022-11-16T00:00:00Z","timestamp":1668556800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003130","name":"Fonds Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003130","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004040","name":"KU Leuven","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004040","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100019180","name":"HORIZON EUROPE European Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100019180","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neuroinform."],"abstract":"Recent deep neural network based methods provide accurate binaural source localization performance. These data-driven models map measured binaural cues directly to source locations; hence, their performance depends heavily on the training data distribution. In this paper, we propose a parametric embedding that maps the binaural cues to a low-dimensional space where localization can be done with a nearest-neighbor regression. We implement the embedding using a neural network, optimized to map points that are close to each other in the latent space (the space of source azimuths or elevations) to nearby points in the embedding space; thus, the Euclidean distances between the embeddings reflect their source proximities, and the structure of the embeddings forms a manifold, which provides interpretability to the embeddings. 
We show that the proposed embedding generalizes well in various acoustic conditions (with reverberation) different from those encountered during training, and provides better performance than unsupervised embeddings previously used for binaural localization. In addition, the proposed method performs better than or equally well as a feed-forward neural network based model that directly estimates the source locations from the binaural cues, and it has better results than the feed-forward model when a small amount of training data is used. Moreover, we also compare the proposed embedding using both supervised and weakly supervised learning, and show that in both conditions, the resulting embeddings perform similarly well, but the weakly supervised embedding allows source azimuth and elevation to be estimated simultaneously.","DOI":"10.3389\/fninf.2022.942978","type":"journal-article","created":{"date-parts":[[2022,11,16]],"date-time":"2022-11-16T06:33:49Z","timestamp":1668580429000},"update-policy":"http:\/\/dx.doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Toward learning robust contrastive embeddings for binaural sound source localization"],"prefix":"10.3389","volume":"16","author":[{"given":"Duowei","family":"Tang","sequence":"first","affiliation":[]},{"given":"Maja","family":"Taseska","sequence":"additional","affiliation":[]},{"given":"Toon","family":"van Waterschoot","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2022,11,16]]},"reference":[{"key":"B1","first-page":"99","article-title":"\u201cThe CIPIC HRTF database,\u201d","volume-title":"Proceedings of IEEE Applications of Signal Processing to Audio Acoustics (WASPAA 2001)","author":"Algazi","year":"2001"},{"key":"B2","doi-asserted-by":"publisher","first-page":"943","DOI":"10.1121\/1.382599","article-title":"Image method for efficiently simulating small-room acoustics","volume":"65","author":"Allen","year":"1979","journal-title":"J. Acoust. Soc. Am"},{"key":"B3","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1016\/j.csl.2015.03.003","article-title":"A survey on sound source localization in robotics: from binaural to array processing methods","volume":"34","author":"Argentieri","year":"2015","journal-title":"Comput. Speech Lang"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1373","DOI":"10.1162\/089976603321780317","article-title":"Laplacian eigenmaps for dimensionality reduction and data representation","volume":"15","author":"Belkin","year":"2003","journal-title":"Neural Comput."},{"key":"B5","first-page":"177","article-title":"\u201cOut-of-sample extensions for LLE, Isomap, MDS, Eigenmaps and spectral clustering,\u201d","volume-title":"Proceedings of IEEE Conference on Advances in Neural Information Processing Systems (NeurIPS 2003).","author":"Bengio","year":"2003"},{"key":"B6","volume-title":"Spatial Hearing: The Psychophysics of Human Sound Localization","author":"Blauert","year":"1997"},{"key":"B7","first-page":"737","article-title":"\u201cSignature verification using a \u201csiamese\u201d time delay neural network,\u201d","volume-title":"Advances in Neural Information Processing Systems, Vol. 
6","author":"Bromley","year":"1993"},{"key":"B8","first-page":"539","article-title":"\u201cLearning a similarity metric discriminatively, with application to face verification,\u201d","volume-title":"Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005)","author":"Chopra","year":"2005"},{"key":"B9","volume-title":"Spectral Graph Theory","author":"Chung","year":"1997"},{"key":"B10","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1121\/1.415854","article-title":"An artificial neural network for sound localization using binaural cues","volume":"100","author":"Datum","year":"1996","journal-title":"J. Acoust. Soc. Am"},{"key":"B11","doi-asserted-by":"publisher","first-page":"1440003","DOI":"10.1142\/S0129065714400036","article-title":"Acoustic space learning for sound source separation and localization on binaural manifolds","volume":"25","author":"Deleforge","year":"2015","journal-title":"Int. J. Neural Syst"},{"key":"B12","doi-asserted-by":"crossref","DOI":"10.1109\/MLSP.2012.6349784","article-title":"\u201c2D sound-source localization on the binaural manifold,\u201d","volume-title":"2012 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2012)","author":"Deleforge","year":"2012"},{"key":"B13","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1109\/SAM.2018.8448967","article-title":"\u201cSound source localization for hearing aid applications using wireless microphones,\u201d","volume-title":"IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM 2018)","author":"Farmani","year":"2018"},{"key":"B14","doi-asserted-by":"publisher","first-page":"3907","DOI":"10.1121\/1.412407","article-title":"HRTF measurements of a KEMAR","volume":"97","author":"Gardner","year":"1995","journal-title":"J. Acoust. Soc. 
Am"},{"key":"B15","volume-title":"TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1","author":"Garofolo","year":"1993"},{"key":"B16","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1007\/978-3-319-53547-0_7","article-title":"\u201cVAST: the virtual acoustic space traveler dataset,\u201d","volume-title":"Proceedings of International Conference on Latent Variable Analysis and Signal Separation (LVA\/ICA)","author":"Gaultier","year":"2017"},{"key":"B17","doi-asserted-by":"publisher","first-page":"113","DOI":"10.5152\/iao.2017.2820","article-title":"Efficacy of directional microphones in hearing aids equipped with wireless synchronization technology","volume":"13","author":"Geetha","year":"2017","journal-title":"J. Int. Adv. Otol"},{"key":"B18","first-page":"1735","article-title":"\u201cDimensionality reduction by learning an invariant mapping,\u201d","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006)","author":"Hadsell","year":"2006"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1503.02531","article-title":"Distilling the knowledge in a neural network","author":"Hinton","year":"2015","journal-title":"arXiv preprint arXiv:1503.02531"},{"key":"B20","doi-asserted-by":"publisher","first-page":"e1","DOI":"10.4081\/audiores.2013.e1","article-title":"Evaluation of speech intelligibility and sound localization abilities with hearing aids using binaural wireless technology","volume":"3","author":"Ibrahim","year":"2013","journal-title":"Audiol. Res"},{"key":"B21","first-page":"448","article-title":"\u201cBatch normalization: accelerating deep network training by reducing internal covariate shift,\u201d","volume-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning, Vol. 
37","author":"Ioffe","year":"2015"},{"key":"B22","doi-asserted-by":"crossref","first-page":"5164","DOI":"10.1109\/ICASSP.2018.8462586","article-title":"\u201cBinaural speech source localization using template matching of interaural time difference patterns,\u201d","volume-title":"2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '18)","author":"Karthik","year":"2018"},{"key":"B23","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1162\/pres.16.5.509","article-title":"Binaural source localization and spatial audio reproduction for telepresence applications","volume":"16","author":"Keyrouz","year":"2007","journal-title":"Presence Teleoper. Virt. Environ"},{"key":"B24","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1412.6980","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2015","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"B25","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1177\/1084713810364396","article-title":"Improvements in speech understanding with wireless binaural broadband digital hearing instruments in adults with sensorineural hearing loss","volume":"14","author":"Kreisman","year":"2010","journal-title":"Trends Amplif"},{"key":"B26","first-page":"1","article-title":"\u201cRelative transfer function modeling for supervised source localization,\u201d","volume-title":"Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2013)","author":"Laufer","year":"2013"},{"key":"B27","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1007\/978-3-319-22482-4_23","article-title":"\u201cA study on manifolds of acoustic responses,\u201d","volume-title":"Proceedings of the International Conference on Latent Variable Analysis and Signal 
Separation","author":"Laufer-Goldshtein","year":"2015"},{"key":"B28","doi-asserted-by":"publisher","first-page":"2171","DOI":"10.1109\/TASLP.2016.2598319","article-title":"Estimation of the direct-path relative transfer function for supervised sound-source localization","volume":"24","author":"Li","year":"2016","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process"},{"key":"B29","doi-asserted-by":"publisher","first-page":"2122","DOI":"10.1109\/TASLP.2018.2855960","article-title":"Robust binaural localization of a target sound source by combining spectral source models and deep neural networks","volume":"26","author":"Ma","year":"2018","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process"},{"key":"B30","doi-asserted-by":"publisher","first-page":"2444","DOI":"10.1109\/TASLP.2017.2750760","article-title":"Exploiting deep neural networks and head movements for robust binaural localization of multiple sources in reverberant environments","volume":"25","author":"Ma","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process"},{"key":"B31","doi-asserted-by":"publisher","first-page":"382","DOI":"10.1109\/TASL.2009.2029711","article-title":"Model-based expectation-maximization source separation and localization","volume":"18","author":"Mandel","year":"2010","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"B32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TASL.2010.2042128","article-title":"A probabilistic model for robust localization based on a binaural auditory front-end","volume":"19","author":"May","year":"2011","journal-title":"IEEE Trans. Audio Speech Lang. 
Process"},{"key":"B33","first-page":"283","article-title":"\u201cDeep ranking-based sound source localization,\u201d","volume-title":"Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019)","author":"Opochinsky","year":"2019"},{"key":"B34","doi-asserted-by":"publisher","first-page":"1335","DOI":"10.1109\/TASLP.2019.2919378","article-title":"Sound localization based on phase difference enhancement using deep neural networks","volume":"27","author":"Pak","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process"},{"key":"B35","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1109\/TASL.2009.2023644","article-title":"Binaural source localization by joint estimation of ILD and ITD","volume":"18","author":"Raspaud","year":"2010","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"B36","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1016\/j.anorl.2018.04.009","article-title":"Sound source localization","volume":"135","author":"Risoud","year":"2018","journal-title":"Eur. Ann. Otorhinolaryngol. Head Neck Dis"},{"key":"B37","first-page":"241","article-title":"\u201cA fast and accurate shoebox room acoustics simulator,\u201d","volume-title":"Proceedings of 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '09)","author":"Schimmel","year":"2009"},{"key":"B38","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. 
Res"},{"key":"B39","doi-asserted-by":"crossref","first-page":"1701","DOI":"10.1109\/CVPR.2014.220","article-title":"\u201cDeepFace: closing the gap to human-level performance in face verification,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014)","author":"Taigman","year":"2014"},{"key":"B40","first-page":"358","article-title":"\u201cSupervised contrastive embeddings for binaural source localization,\u201d","volume-title":"Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019)","author":"Tang","year":"2019"},{"key":"B41","first-page":"1","article-title":"\u201cOn spectral embeddings for supervised binaural source localization,\u201d","volume-title":"Proceedings of the 27th European Signal Processing Conference (EUSIPCO '27)","author":"Taseska","year":"2019"},{"key":"B42","first-page":"451","article-title":"\u201cEnd-to-end binaural sound localisation from the raw waveform,\u201d","volume-title":"Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '19)","author":"Vecchiotti","year":"2019"},{"key":"B43","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1002\/tee.22007","article-title":"Novel design for non-latency wireless binaural hearing aids","volume":"9","author":"Wei","year":"2014","journal-title":"IEEE Trans. Electr. Electron. Eng"},{"key":"B44","doi-asserted-by":"publisher","first-page":"1503","DOI":"10.1109\/TASL.2012.2183869","article-title":"Binaural localization of multiple sources in reverberant and noisy environments","volume":"20","author":"Woodruff","year":"2012","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"B45","doi-asserted-by":"publisher","first-page":"37","DOI":"10.20965\/jrm.2017.p0037","article-title":"Sound source localization using deep learning models","volume":"29","author":"Yalta","year":"2017","journal-title":"J. Robot. 
Mechatron"},{"key":"B46","first-page":"825","article-title":"\u201cSupervised direct-path relative transfer function learning for binaural sound source localization,\u201d","volume-title":"Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '21)","author":"Yang","year":"2021"}],"container-title":["Frontiers in Neuroinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fninf.2022.942978\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,12]],"date-time":"2023-03-12T06:14:31Z","timestamp":1678601671000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fninf.2022.942978\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,16]]},"references-count":46,"alternative-id":["10.3389\/fninf.2022.942978"],"URL":"https:\/\/doi.org\/10.3389\/fninf.2022.942978","relation":{},"ISSN":["1662-5196"],"issn-type":[{"value":"1662-5196","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,16]]}}}