Abstract
The identification of a cover song, which is an alternative version of a previously recorded song, for music retrieval has received increasing attention. Methods for identifying a cover song typically involve comparing the similarity of chroma features between a query song and another song in the data set. However, considerable time is required for pairwise comparisons. In this study, chroma features were patched to preserve the melody. An intermediate representation was trained to reduce the dimension of each patch of chroma features. The training was performed using an autoencoder, commonly used in deep learning for dimensionality reduction. Experimental results showed that the proposed method achieved better accuracy for identification and spent less time for similarity matching in both covers80 dataset and Million Song Dataset as compared with traditional approaches.
Similar content being viewed by others
References
Al-Shareef AJ, Mohamed EA, Al-Judaibi E (2008) One hour ahead load forecasting using artificial neural network for the western area of Saudi Arabia. Int J Elec Compu Eng 3(13):834–840
Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learning 2(1):1–127
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Bertin-Mahieux T, Ellis D (2012) Large-scale cover song recognition using the 2D-Fourier transform magnitude The 13th ISMIR Conference
Bertin-Mahieux T, Ellis D, Whitman B, Lamere P. (2011) The million song dataset In Proceedings of ISMIR
Chang TM, Hsieh CB, Chang PC (2014) An enhanced direct chord transformation for music retrieval in the AAC domain with window switching. Multimed Tools and Appl 74(18):7921–7942
Ellis DPW (2006) Beat tracking with dynamic programming MIREX 2006 Audio Beat Tracking Contest system description
Ellis DPW (2007) The “covers80” cover song data set. [Online]. Available: http://labrosa.ee.columbia.edu/projects/coversongs/covers80/
Ellis D. Dynamic Time Warp (DTW) in Matlab. [Online]. Available: http://labrosa.ee.columbia.edu/matlab/dtw/
Ellis DPW, and Cotton C (2006) The 2007 LABROSA cover song detection system. Music Information Retrieval Evaluation eXchange (MIREX) extended abstract
Ellis DPW, Poliner GE (2007) Identifying cover songs with chroma features and dynamic programming beat tracking. IEEE Int. Conf. Acoustic, Speech and Signal Processing (ICASSP), Honolulu, HI, 1429 –1432
Fujishima T (1999) Realtime chord recognition of musical sound: a system using common lisp music. Int. Comput. Music Conf., Beijing 464–467
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hinton GE, Salakhutdinov RS (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Humphrey EJ, Nieto O, Bello JP (2013) Data driven and discriminative projections for large-scale cover song identification. The 14th ISMIR Conference: 149–154
Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386
Lee K (2006) Identifying Cover Songs from Audio Using Harmonic Representation. Music Information Retrieval Evaluation eXchange (MIREX) extended abstract
Nieto O, Bello JP (2014) Music segment similarity using 2D-Fourier magnitude coefficients. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP): 664–668
Palm RB (2012) Deep learning toolbox, [Online]. Available: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
Ranzato M, Boureau Y, LeCun Y (2007) Sparse feature learning for deep belief networks. Advances in Neural Information Processing Systems 20 (NIPS)
Ranzato M, Poultney C, Chopra S, LeCun Y (2006) Efficient learning of sparse representations with an energy-based model NIPS
Ravuri S, Ellis DPW (2010) Cover song detection: From high scores to general classification. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Dallas, Texas, U.S.A. 65–68
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011a) Contractive auto-encoders: Explicit invariance during feature extraction ICML
Riley M, Heinen E, Ghosh J (2008) A text retrieval approach to content-based audio retrieval. Int. Conf. on Music Information Retrieval, Philadelphia, Pennsylvaia, U.S.A. 295–300
Sailer C, Dressler K (2006) Finding cover songs by melodic similarity. Music Information Retrieval Evaluation eXchange (MIREX) extended abstract
Salakhutdinov R (2009) Learning deep generative models doctoral dissertation. University of Toronto, Toronto
Salakhutdinov R Nonlinear dimensionality reduction using neural networks. Available: http://www.cs.toronto.edu/~rsalakhu/talks/NLDR_NIPS06workshop.pdf
Serrà J, Gómez E (2008) Audio cover song identification based on tonal sequence alignment. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, Nevada, U.S.A. 61–64
Serrà J, Gómez E, Herrera P, Serra X (2008) Chroma binary similarity and local alignment applied to cover song identification. IEEE Trans Audio Speech Lang Process 16(6):1138–1151
Serrà J, Gómez E, Herrera P (2010) Audio cover song identification and similarity: background, approaches, evaluation, and beyond. Adv Music Inf Retr 274(14):307–332
Shepard RN (1982) Structural representations of musical pitch. In Deutsch, D, editor, The Psychology of Music, First Edition. Swets & Zeitlinger
Signal processing toolbox, time-dependent frequency analysis (specgram). [Online]. Available: http://faculty.petra.ac.id/resmana/private/matlab-help/toolbox/signal/specgram.html
Smolensky P (1986) Information processing in dynamical systems: foundations of harmony theory. in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart, J. L. McClelland, and C. PDP Research Group, Eds. Cambridge, MA, USA: MIT Press 194–281
Tralie CJ, Bendich P (2015) Cover song identification with timbral shape sequences. arXiv preprint arXiv:1507.05143
Vincent P, Larochelle H, Bengio Y, Manzagol, PA. (2008) Extracting and composing robust features with denoising autoencoders ICML
Voorhees EM (1999) Proceedings of the 8th Text Retrieval Conference. TREC-8 question answering track report. 77–82
Wang R, Han C, Wu Y, Guo T (2014) Fingerprint classification based on depth neural network. arXiv preprint arXiv:1409.5188
Witmer R, Marks A (2006) In: Macy L (ed) Cover, grove music online. Oxford Univ. Press, Oxford
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Fang, JT., Day, CT. & Chang, PC. Deep feature learning for cover song identification. Multimed Tools Appl 76, 23225–23238 (2017). https://doi.org/10.1007/s11042-016-4107-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-016-4107-6