Deep feature learning for cover song identification

Fang, Jiunn-Tsair; Day, Chi-Ting; Chang, Pao-Chi

doi:10.1007/s11042-016-4107-6

Published: 13 November 2016

Multimedia Tools and Applications, volume 76, pages 23225–23238 (2017)

Jiunn-Tsair Fang, Chi-Ting Day & Pao-Chi Chang

Abstract

The identification of cover songs, that is, alternative versions of previously recorded songs, has received increasing attention in music retrieval. Methods for identifying a cover song typically compare the chroma features of a query song with those of each song in the data set. However, these pairwise comparisons require considerable time. In this study, chroma features were divided into patches to preserve the melody. An intermediate representation was trained to reduce the dimension of each patch of chroma features. The training was performed using an autoencoder, which is commonly used in deep learning for dimensionality reduction. Experimental results showed that, compared with traditional approaches, the proposed method achieved higher identification accuracy and required less time for similarity matching on both the covers80 dataset and the Million Song Dataset.
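
The pipeline described in the abstract (cut chroma features into patches, then compress each patch with an autoencoder) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the patch size, network depth, or training procedure, so the function names (make_patches, train_autoencoder, encode, toy_chroma), the single hidden layer, the plain gradient-descent training, and all hyperparameters below are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np

def toy_chroma(n_frames=64):
    """Synthetic (12, T) chroma: a looping C-F-G-C triad progression (illustrative data)."""
    triads = [(0, 4, 7), (5, 9, 0), (7, 11, 2), (0, 4, 7)]
    chroma = np.zeros((12, n_frames))
    for t in range(n_frames):
        chroma[list(triads[(t // 4) % 4]), t] = 1.0
    return chroma

def make_patches(chroma, patch_len=8, hop=4):
    """Cut a (12, T) chroma matrix into flattened, overlapping patches.

    patch_len and hop are assumed values, not taken from the paper.
    """
    n_frames = chroma.shape[1]
    starts = range(0, n_frames - patch_len + 1, hop)
    return np.stack([chroma[:, s:s + patch_len].ravel() for s in starts])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden=16, epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    Returns the parameters and the mean squared reconstruction error per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # encoder: low-dimensional code per patch
        Y = sigmoid(H @ W2 + b2)   # decoder: reconstruction of the patch
        err = Y - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared error (up to a constant factor)
        # through both sigmoid layers.
        dY = err * Y * (1.0 - Y) / n
        dH = (dY @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses

def encode(X, params):
    """Map flattened patches to their low-dimensional codes."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)

if __name__ == "__main__":
    X = make_patches(toy_chroma(64))          # 15 patches of 12 * 8 = 96 values
    params, losses = train_autoencoder(X)
    codes = encode(X, params)                 # 15 codes of 16 values each
    print(X.shape, codes.shape)
    print("reconstruction error fell:", losses[-1] < losses[0])
```

In this sketch each 96-dimensional patch is reduced to a 16-dimensional code, so any later similarity comparison between songs would operate on much shorter vectors; that reduction is the sense in which the abstract links dimensionality reduction to faster matching. How the codes are aggregated and compared per song is not described in the abstract and is left out here.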

Author information

Authors and Affiliations

Department of Electronic Engineering, Ming Chuan University, No.5, Deming Rd, Taoyuan, 33348, Taiwan
Jiunn-Tsair Fang
Department of Communication Engineering, National Central University, No.300, Jhongda Rd, Taoyuan, 32001, Taiwan
Chi-Ting Day & Pao-Chi Chang

Corresponding author

Correspondence to Pao-Chi Chang.

About this article

Cite this article

Fang, JT., Day, CT. & Chang, PC. Deep feature learning for cover song identification. Multimed Tools Appl 76, 23225–23238 (2017). https://doi.org/10.1007/s11042-016-4107-6

Received: 02 October 2015
Revised: 27 October 2016
Accepted: 31 October 2016
Published: 13 November 2016
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11042-016-4107-6
