Abstract
Spectrograms provide rich feature representations of music data, and significant progress has been made in music classification using spectrograms and Convolutional Neural Networks (CNNs). However, the softmax loss commonly used in existing CNNs lacks sufficient discriminative power for deep music features. To overcome this limitation, we propose a Combined Angular Margin and Cosine Margin Softmax Loss (AMCM-Softmax) that enhances intra-class compactness and inter-class discrepancy simultaneously. Specifically, the weight vectors and feature vectors are normalized to eliminate radial variations. Then, an angular margin parameter and a cosine margin parameter are introduced to maximize the decision margin by enforcing angular and cosine margin constraints. Consequently, feature discrimination is enhanced through normalization and margin maximization. The decision boundary and the target logit curve of AMCM-Softmax admit a clear geometric interpretation. Extensive experiments on music datasets show that AMCM-Softmax consistently outperforms current state-of-the-art approaches in classifying genre and emotion. Our work also shows that a margin-based loss function can improve performance and can be incorporated into advanced CNN models for music classification.
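The abstract does not give the exact parameterization, but losses that combine an additive angular margin with an additive cosine margin are commonly written with the target-class logit s(cos(θ + m1) − m2), where θ is the angle between the normalized feature and the normalized class weight. The NumPy sketch below illustrates that general form under this assumption; the function name, margin values, and scale s are illustrative, not taken from the paper.

```python
import numpy as np

def amcm_softmax_loss(features, weights, labels, s=30.0, m1=0.2, m2=0.1):
    """Sketch of a combined angular + cosine margin softmax loss.

    Assumes the target logit takes the form s * (cos(theta + m1) - m2);
    the paper's exact parameterization may differ.
    """
    # L2-normalize features (rows) and class weights (columns) to remove
    # radial variations, so logits depend only on the angle theta.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_theta = f @ w                        # (batch, num_classes), in [-1, 1]

    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    target_logit = np.cos(theta + m1) - m2   # angular + cosine margins

    # Apply the margin only to the true-class logit, then scale by s.
    logits = s * cos_theta
    rows = np.arange(len(labels))
    logits[rows, labels] = s * target_logit[rows, labels]

    # Standard cross-entropy over the margin-adjusted logits.
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

Setting m1 = m2 = 0 recovers the plain normalized softmax loss; with positive margins the true-class logit is strictly reduced, which forces the network to learn more compact, better-separated features.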
Acknowledgements
This work was supported by the Jiangsu Provincial Key Constructive Laboratory for Big Data of Psychology and Cognitive Science under Grant No. 72592062003G, the Natural Science Foundation of the Colleges and Universities in Anhui Province of China under Grant No. KJ2020A0035 and No. KJ2021A0640, and the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA).
Funding
Jiangsu Provincial Key Constructive Laboratory for Big Data of Psychology and Cognitive Science, Grant No. 72592062003G (Xiaofeng Yuan)
Natural Science Foundation of the Colleges and Universities in Anhui Province of China, Grant No. KJ2020A0035 (Yi Yang)
Natural Science Foundation of the Colleges and Universities in Anhui Province of China, Grant No. KJ2021A0640 (Yang Wang)
Hong Kong Innovation and Technology Commission, InnoHK Project CIMDA (Hong Yan)
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, J., Han, L., Wang, Y. et al. Combined angular margin and cosine margin softmax loss for music classification based on spectrograms. Neural Comput & Applic 34, 10337–10353 (2022). https://doi.org/10.1007/s00521-022-06896-0