Abstract
Audio fingerprinting is an emerging research field in which a song must be recognized by matching an extracted “fingerprint” to a database of known fingerprints. Audio fingerprinting must solve the two key problems of representation and search. In this paper, we are given an 8192-bit binary representation of each five second interval of a song and therefore focus our attention on the problem of high-dimensional nearest neighbor search. High dimensional nearest neighbor search is known to suffer from the curse of dimensionality, i.e. as the dimension increases, the computational or memory costs increase exponentially. However, recently, there has been significant work on efficient, approximate, search algorithms. We build on this work and describe preliminary results of a probabilistic search algorithm. We describe the data structures and search algorithm used and then present experimental results for a database of 1,000 songs containing 12,217,111 fingerprints.
Similar content being viewed by others
References
E. Allamanche, J. Herre, O. Hellmuth, B. Bernhard Frobach, and M. Cremer, “AudioID: Towards Content-Based Identification of Audio Material,” in Proc. Int. Conf. on Web Delivering of Music, 2001.
S. Arya, D.M. Mount, N.S. Nctanyahu, R. Silverman, and A.Y. Wu, “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions,” J. of the ACM, vol. 45, no. 6, 1998, pp. 891–923.
Y. Cheng, “Music Database Retrieval Based on Spectral Similarity,” in Int. Symp. on Music Information retrieval, 2001.
D. Fragoulis, G. Rousopoulos, T. Panagopoulos, C. Alexiou, and C. Papaodysscus, “On the Automated Recognition of Seriously Distorted Musical Recordings,” IEEE Trans. on Signal Processing, vol. 49, no. 4, 2001, pp. 898–908.
J. Haitsma and T. Kalker, “A Highly Robust Audio Fingerprinting System.” in 3rd Int. Conf. on Music Information Retrieval ISMIR, 2002.
J. Haitsma, T. Kalker, and J. Oostveen, “Robust Audio Hashing for Content Identification,” in Content Based Multimedia Indexing, 2001.
P. Indyk and R. Motwani, “Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality,” in Proc. of the 30th annual ACM Symp. on Theory of Computing, 1998, pp. 604–613.
J.M. Kleinberg, “Two Algorithms for Nearest-Neighbor Search in High Dimensions,” in Proc. of the 29th annual ACM Symp. on Theory of Computing, 1997, pp. 599–608.
H. Neuschmied, H. Mayer, and E. Battle, “Content-Based Indentification of Audio Titles on the Internet,” in Proc. Int. Conf. on Web Delivering of Music, 2001.
J. Oostveen, T. Kalker, and J. Haitsma, “Feature Extraction and a Database Strategy for Video Fingerprinting,” in 5th Int. Conf. on Visual Information Systems, vol. LNCS 2314, pp. 117–128. Springer Verlag, 2002.
P.N. Yianilos, “Locally Lifting the Curse of Dimensionality for nearest Neighbor Search,” in Proc. of the 11th annual ACM-SIAM Symp. on Discrete Algorithms, 2000, pp. 361–370.
Author information
Authors and Affiliations
Additional information
Ingemar J. Cox is currently Professor and Chair of Telecommunications in the Departments of Electronic Engineering and Computer Science at University College London and Director of UCL's Adastral Park Postgraduate Campus. He is currently a holder of a Royal Society Wolfson Fellowship. He received his B.Sc. from University College London and Ph.D. from Oxford University. He was a member of the Technical Staff at AT&T Bell Labs at Murray Hill from 1984 until 1989 where his research interests were focused on mobile robots. In 1989 he joined NEC Research Institute in Princeton, NJ as a senior research scientist in the computer science division. At NEC, his research shifted to problems in computer vision and he was responsible for creating the computer vision group at NECI. He has worked on problems to do with stereo and motion correspondence and multimedia issues of image database retrieval and watermarking. In 1999, he was awarded the IEEE Signal Processing Society Best Paper Award (Image and Multidimensional Signal Processing Area) for a paper he co-authored on watermarking. From 1997–1999, he served as Chief Technical Officer of Signafy, Inc, a subsiduary of NEC responsible for the commercialization of watermarking. Between 1996 and 1999, he led the design of NEC's watermarking proposal for DVD video disks and later colloborated with IBM in developing the technology behind the joint “Galaxy” proposal supported by Hitachi, IBM, NEC, Pioneer and Sony. In 1999, he returned to NEC Research Institute as a Research Fellow.
He is a senior member of the IEEE, a Fellow of the IEE and a Fellow of the Royal Society for Arts and Manufactures. He is co-editor in chief of the IEE Proc. on Information Security and an associate editor of the IEEE Trans. on Information Forensics and Security. He is co-author of a book entitled “Digital Watermarking” and the co-editor of two books, ‘Autonomous Robots Vehicles’ and ‘Partitioning Data Sets: With Applications to Psychology, Computer Vision and Target Tracking’.
This work was performed while the author was at NEC Research Institute, Princeton. © [2002] IEEE, Reprinted, with permission, from Miller, M.L.; Rodriguez, M.A.; Cox, I.J.; Multimedia Signal Processing, 2002 IEEE Workshop on, 2002 Page(s): 182–185.
Rights and permissions
About this article
Cite this article
Miller, M.L., Rodriguez, M.A. & Cox, I.J. Audio Fingerprinting: Nearest Neighbor Search in High Dimensional Binary Spaces. J VLSI Sign Process Syst Sign Image Video Technol 41, 285–291 (2005). https://doi.org/10.1007/s11265-005-4152-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-005-4152-2