Abstract
In this paper, we propose an architecture based on a stacked auto-encoder (SAE) for music genre classification. Each level of the stacked architecture takes as input a stack of hidden representations produced by the previous level, each related to a different frame of the input signal. In this way, the proposed architecture yields a more robust classification than a standard SAE. The first level of the SAE is fed with a set of 57 specific features extracted from the music signals. Experimental results show the effectiveness of the proposed approach with respect to other state-of-the-art methods. In particular, the proposed architecture is compared to the support vector machine (SVM), the multi-layer perceptron (MLP) and logistic regression (LR).
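The idea of feeding each level with hidden representations stacked across frames can be illustrated with a minimal NumPy sketch. It shows only the flow of array shapes through two encoder levels. The hidden sizes, the number of frames grouped together, the non-overlapping grouping, the sigmoid activation and the random (untrained) weights are all assumptions made for illustration; the abstract specifies only the 57 input features per frame.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode(frames, weights, bias):
    """Encoder half of one auto-encoder level: (n, d_in) -> (n, d_hidden)."""
    return sigmoid(frames @ weights + bias)


def stack_frames(hidden, group):
    """Concatenate `group` consecutive hidden vectors into one longer vector.

    hidden: (n, d) -> (n // group, group * d). Trailing frames that do not
    fill a complete group are dropped (an assumption of this sketch).
    """
    n_groups = hidden.shape[0] // group
    return hidden[: n_groups * group].reshape(n_groups, group * hidden.shape[1])


rng = np.random.default_rng(0)
n_frames, n_features = 40, 57  # 57 features per frame, as stated in the abstract
h1, h2, group = 32, 16, 4      # hypothetical sizes, not taken from the paper

# Random stand-ins for extracted features and for trained encoder weights.
x = rng.standard_normal((n_frames, n_features))
w1, b1 = rng.standard_normal((n_features, h1)) * 0.1, np.zeros(h1)
w2, b2 = rng.standard_normal((group * h1, h2)) * 0.1, np.zeros(h2)

level1 = encode(x, w1, b1)               # one hidden vector per frame
level2_in = stack_frames(level1, group)  # hidden vectors of 4 frames, concatenated
level2 = encode(level2_in, w2, b2)       # next level sees a wider temporal context

print(level1.shape, level2_in.shape, level2.shape)
```

In this sketch, every level-2 unit depends on four consecutive frames rather than one, which is one plausible reading of how stacking across frames widens the temporal context seen by deeper levels. Training of each level (reconstruction loss, fine-tuning with a classifier on top) is omitted.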
Notes
1. The library can be downloaded from: https://librosa.github.io/librosa/feature.html.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Scarpiniti, M., Scardapane, S., Comminiello, D., Uncini, A. (2020). Music Genre Classification Using Stacked Auto-Encoders. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Approaches to Dynamics of Signal Exchanges. Smart Innovation, Systems and Technologies, vol 151. Springer, Singapore. https://doi.org/10.1007/978-981-13-8950-4_2
DOI: https://doi.org/10.1007/978-981-13-8950-4_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8949-8
Online ISBN: 978-981-13-8950-4
eBook Packages: Intelligent Technologies and Robotics (R0)