Deep feature learning for cover song identification

Multimedia Tools and Applications

Abstract

Cover song identification, the task of finding alternative versions of a previously recorded song, has received increasing attention in music retrieval. Methods for identifying a cover song typically compare the chroma features of a query song with those of each song in the data set. However, these pairwise comparisons require considerable time. In this study, chroma features were grouped into patches to preserve the melody. An intermediate representation was then learned to reduce the dimension of each patch of chroma features. Training was performed using an autoencoder, a model commonly used in deep learning for dimensionality reduction. Experimental results showed that, compared with traditional approaches, the proposed method achieved better identification accuracy and required less time for similarity matching on both the covers80 dataset and the Million Song Dataset.
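The pipeline summarized in the abstract (patch the chroma features, then compress each patch with an autoencoder) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy one-hot chroma sequence, the patch length and hop, the 8-unit code size, the single hidden layer, the plain stochastic gradient descent training, and the names `make_patches`, `TinyAutoencoder`, and `cosine`.

```python
import math
import random


def make_patches(chroma, patch_len, hop):
    """Flatten overlapping windows of consecutive chroma frames into vectors."""
    patches = []
    for start in range(0, len(chroma) - patch_len + 1, hop):
        window = chroma[start:start + patch_len]
        patches.append([v for frame in window for v in frame])
    return patches


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TinyAutoencoder:
    """One-hidden-layer sigmoid autoencoder (illustrative, not the paper's model)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, data):
        """Mean squared reconstruction error per patch."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train(self, data, epochs=100, lr=0.5):
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                y = self.decode(h)
                # Output error for squared loss, passed through the sigmoid.
                d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
                # Back-propagate to the hidden layer before any weight update.
                d_hid = [sum(d_out[i] * self.w_dec[i][j] for i in range(len(y)))
                         * h[j] * (1 - h[j]) for j in range(len(h))]
                for i, d in enumerate(d_out):
                    for j in range(len(h)):
                        self.w_dec[i][j] -= lr * d * h[j]
                    self.b_dec[i] -= lr * d
                for j, d in enumerate(d_hid):
                    for i in range(len(x)):
                        self.w_enc[j][i] -= lr * d * x[i]
                    self.b_enc[j] -= lr * d


# Toy "song": 16 frames of 12-bin chroma, one active pitch class per frame.
melody = [0, 4, 7, 4, 0, 5, 9, 5, 0, 4, 7, 4, 2, 7, 11, 7]
chroma = [[1.0 if k == p else 0.0 for k in range(12)] for p in melody]

# Assumed patch geometry: 4 frames per patch, hop of 2 frames (48-dim vectors).
patches = make_patches(chroma, patch_len=4, hop=2)

model = TinyAutoencoder(n_in=48, n_hidden=8)
loss_before = model.loss(patches)
model.train(patches)
loss_after = model.loss(patches)

# Each 48-dim patch is now represented by an 8-dim code.
codes = [model.encode(p) for p in patches]
print(len(patches), len(codes[0]), loss_after < loss_before)
```

In this sketch, two songs would be compared through their low-dimensional codes (for example with `cosine`) instead of through the full chroma sequences. That is one plausible reading of why similarity matching becomes cheaper, but the actual matching procedure is described only in the full article.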

Author information

Corresponding author

Correspondence to Pao-Chi Chang.

Cite this article

Fang, JT., Day, CT. & Chang, PC. Deep feature learning for cover song identification. Multimed Tools Appl 76, 23225–23238 (2017). https://doi.org/10.1007/s11042-016-4107-6

  • DOI: https://doi.org/10.1007/s11042-016-4107-6
