Abstract
In this article, a new multi-input deep convolutional neural network (deep-CNN) architecture is proposed for recognizing the predominant instruments in polyphonic music using the discrete wavelet transform (DWT). The proposed deep-CNN model takes a fusion of Mel-spectrogram and Mel-frequency cepstral coefficient (MFCC) features as its first input and a concatenation of statistical features extracted from the DWT-decomposed signals as its second input. Particle swarm optimization (PSO), a feature selection algorithm, is employed to reduce the feature dimensionality by excluding irrelevant features. The model is evaluated experimentally on the IRMAS dataset, using fixed-length single-labeled data for training and variable-length multi-labeled data for testing. Several DWT feature dimensions are examined, and a dimension of 250 yields the best results. Performance is assessed with micro- and macro-averaged precision, recall, and F1 measures. With an optimal set of hyperparameter values, the proposed model reaches micro and macro F1 measures of 0.695 and 0.631, which are 12.28% and 23.0% higher, respectively, than those of the benchmark CNN model of Han et al. (IEEE/ACM Trans Audio Speech Lang Process 25(1):208–221, 2016. https://doi.org/10.1109/taslp.2016.2632307).
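As an illustration of the two feature streams the abstract describes, the sketch below computes a Mel-spectrogram/MFCC input and a vector of DWT sub-band statistics for a single excerpt. It is a minimal sketch, not the authors' implementation: the librosa calls follow that toolbox's public API (McFee et al., cited in the references), but the wavelet family ('db4'), the decomposition level, the particular sub-band statistics, and the way the two spectral features are stacked are illustrative assumptions. The paper's subsequent PSO selection down to a 250-dimensional DWT feature vector is not reproduced here.

```python
# Illustrative sketch of the two CNN input streams described in the abstract.
# Assumptions (not taken from the paper): wavelet 'db4', 5 decomposition
# levels, the sub-band statistics below, and vertical stacking of the
# Mel-spectrogram and MFCC matrices.
import numpy as np
import librosa   # audio feature extraction
import pywt      # discrete wavelet transform


def spectral_input(y, sr, n_mels=128, n_mfcc=20):
    """First input: Mel-spectrogram (in dB) and MFCCs stacked on the feature axis."""
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.vstack([mel, mfcc])  # shape: (n_mels + n_mfcc, frames)


def dwt_statistics(y, wavelet="db4", level=5):
    """Second input: statistics of each DWT sub-band, concatenated into one vector."""
    coeffs = pywt.wavedec(y, wavelet, level=level)  # [cA_L, cD_L, ..., cD_1]
    feats = []
    for c in coeffs:
        feats.extend([c.mean(), c.std(), np.abs(c).max(),
                      np.sum(c ** 2) / len(c)])  # mean, std, peak, mean energy
    return np.asarray(feats)


# Example usage on a 3-second training excerpt (file path is hypothetical):
# y, sr = librosa.load("irmas/train/gac/example.wav", sr=22050)
# x1, x2 = spectral_input(y, sr), dwt_statistics(y)
```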
References
A. al-Qerem, F. Kharbat, S. Nashwan, S. Ashraf, K. Blaou, General model for best feature extraction of EEG using discrete wavelet transform wavelet family and differential evolution. Int. J. Distrib. Sens. Netw. 16, 1–21 (2020). https://doi.org/10.1177/1550147720911009
K. Alsharabi, Y.B. Salamah, A.M. Abdurraqeeb, M. Aljalal, F.A. Alturki, EEG signal processing for Alzheimer’s disorders using discrete wavelet transform and machine learning approaches. IEEE Access 10, 89781–89797 (2022). https://doi.org/10.1109/access.2022.3198988
J.J. Aucouturier, Sounds like teen spirit: computational insights into the grounding of everyday musical terms, in Language, Evolution and the Brain, chap. 2 (City University of Hong Kong Press, 2009), pp. 35–64
E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, A. Klapuri, Automatic music transcription: challenges and future directions. J. Intell. Inf. Syst. 41(3), 407–434 (2013). https://doi.org/10.1007/s10844-013-0258-3
J.J. Bosch, J. Janer, F. Fuhrmann, P. Herrera, A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals, in Proceedings, International Society for Music Information Retrieval Conference (ISMIR 2012) (2012), pp. 559–564. https://doi.org/10.5281/zenodo.1416075
L. Debnath, J.-P. Antoine, Wavelet transforms and their applications. Phys. Today 56(4), 68–68 (2003). https://doi.org/10.1063/1.1580056
J.D. Deng, C. Simmermacher, S. Cranefield, A study on feature analysis for musical instrument classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(2), 429–438 (2008). https://doi.org/10.1109/tsmcb.2007.913394
Z. Duan, B. Pardo, L. Daudet, A novel Cepstral representation for timbre modeling of sound sources in polyphonic mixtures, in Proceedings, IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) (2014), pp. 7495–7499. https://doi.org/10.1109/icassp.2014.6855057
R.C. Eberhart, Y. Shi, Particle swarm optimization: development, applications and resources, in Proceedings, IEEE Conference on Evolutionary Computation, (IEEE Cat. No.01TH8546), ICEC, vol. 1 (2001), pp. 81–86. https://doi.org/10.1109/cec.2001.934374
M.R. Every, Discriminating between pitched sources in music audio. IEEE Trans. Audio Speech Lang. Process. 16(2), 267–277 (2008). https://doi.org/10.1109/tasl.2007.908128
F. Fuhrmann, P. Herrera, Polyphonic instrument recognition for exploring semantic similarities in music, in Proceedings, 13th International Conference on Digital Audio Effects (DAFx-10) (2010), pp. 1–8. http://mtg.upf.edu/files/publications/ffuhrmann_dafx10_final_0.pdf
D. Ghosal, M.H. Kolekar, Music genre recognition using deep neural networks and transfer learning, in Proceedings, Interspeech (2018), pp. 2087–2091. https://doi.org/10.21437/interspeech.2018-2045
D. Giannoulis, A. Klapuri, Musical instrument recognition in polyphonic audio using missing feature approach. IEEE Trans. Audio Speech Lang. Process. 21(9), 1805–1817 (2013). https://doi.org/10.1109/tasl.2013.2248720
X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in Proceedings, 13th International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 9, Chia Laguna Resort, Sardinia, Italy (2010), pp. 249–256. https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
M. Goto, H. Hashiguchi, T. Nishimura, R. Oka, RWC music database: popular, classical, and jazz music database, in Proceedings, 3rd International Conference on Music Information Retrieval (ISMIR) (2002), pp. 287–288. https://www.researchgate.net/publication/220723431
S. Gururani, C. Summers, A. Lerch, Instrument activity detection in polyphonic music using deep neural networks, in Proceedings, International Society for Music Information Retrieval Conference, Paris, France (2018), pp. 569–576. https://www.researchgate.net/publication/332621784
Y. Han, J. Kim, K. Lee, Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans. Audio. Speech Lang. Process. 25(1), 208–221 (2016). https://doi.org/10.1109/taslp.2016.2632307
K.K. Hasan, U.K. Ngah, M.F.M. Salleh, Multilevel decomposition discrete wavelet transform for hardware image compression architectures applications, in Proceedings, IEEE International Conference on Control System, Computing and Engineering, Penang, Malaysia (2013), pp. 315–320. https://doi.org/10.1109/iccsce.2013.6719981
T. Heittola, A. Klapuri, T. Virtanen, Musical instrument recognition in polyphonic audio using source-filter model for sound separation, in Proceedings, International Society for Music Information Retrieval Conference (ISMIR) (2009), pp. 327–332. https://www.researchgate.net/publication/220723588
J. Huang, Y. Dong, J. Liu, C. Dong, H. Wang, Sports audio segmentation and classification, in Proceedings, International Conference on Network Infrastructure and Digital Content (IC-NIDC '09) (IEEE, Beijing, China, 2009), pp. 379–383. https://doi.org/10.1109/icnidc.2009.5360872
R.T. Irene, C. Borrelli, M. Zanoni, M. Buccoli, A. Sarti, Automatic playlist generation using convolutional neural networks and recurrent neural networks, in Proceedings, European Signal Processing Conference (EUSIPCO) (IEEE, 2019), pp. 1–5. https://doi.org/10.23919/eusipco.2019.8903002
T. Kitahara, M. Goto, K. Komatani, T. Ogata, H.G. Okuno, Instrument identification in polyphonic music: feature weighting to minimize influence of sound overlaps. J. Appl. Signal Process. (EURASIP) 2007, 155–155 (2007). https://doi.org/10.1155/2007/51979
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015). https://doi.org/10.1038/nature14539
C.R. Lekshmi, R. Rajeev, Multiple predominant instruments recognition in polyphonic music using spectro/modgd-gram fusion. Circuits Syst. Signal Process. 42(6), 3464–3484 (2023). https://doi.org/10.1007/s00034-022-02278-y
P. Li, J. Qian, T. Wang, Automatic instrument recognition in polyphonic music using convolutional neural networks (2015), pp. 1–5. https://doi.org/10.48550/arXiv.1511.05520. arXiv:1511.05520
P. Li, Z. Chen, L.T. Yang, Q. Zhang, M.J. Deen, Deep convolutional computation model for feature learning on big data in Internet of Things. IEEE Trans. Ind. Inf. 14(2), 790–798 (2018). https://doi.org/10.1109/tii.2017.2739340
Y. Luo, N. Mesgarani, Conv-TasNet: surpassing ideal time-frequency magnitude masking for speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 27(8), 1256–1266 (2019). https://doi.org/10.1109/taslp.2019.2915167
E. Magosso, M. Ursino, A. Zaniboni, E. Gardella, A wavelet-based energetic approach for the analysis of biomedical signals: application to the electroencephalogram and electro-oculogram. Appl. Math. Comput. 207(1), 42–62 (2009). https://doi.org/10.1016/j.amc.2007.10.069
B. McFee, C. Raffel, D. Liang, D.P.W. Ellis, M. McVicar, E. Battenberg, O. Nieto, Librosa: audio and music signal analysis in Python, in Proceedings, 14th Python in Science Conference (SCIPY 2015), vol. 8 (2015), pp. 18–25. https://doi.org/10.25080/majora-7b98e3ed-003
V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings, 27th International Conference on Machine Learning, Haifa, Israel (2010), pp. 807–814. https://www.cs.toronto.edu/~fritz/absps/reluICML.pdf
T.-L. Nguyen, S. Kavuri, M. Lee, A multimodal convolutional neuro-fuzzy network for emotional understanding of movie clips. Neural Netw. 118, 208–219 (2019). https://doi.org/10.1016/j.neunet.2019.06.010
University of Iowa Electronic Music Studios, Musical Instrument Samples (MIS). [Online]. Available: http://theremin.music.uiowa.edu/MIS.html
F.J. Opolko, J. Wapnick, McGill University Master Samples (McGill University, Faculty of Music, Montreal, QC, Canada, 1987). https://www.worldcat.org/title/mums-mcgill-university-master-samples/oclc/17946083
J. Pons, O. Slizovskaia, R. Gong, E. Gomez, X. Serra, Timbre analysis of music audio signals with convolutional neural networks, in Proceedings, 25th European Signal Processing Conference (IEEE, 2017), pp. 2744–2748. https://doi.org/10.23919/eusipco.2017.8081710
L. Prechelt, Early stopping—but when?, in Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700, ed. by G.B. Orr, K.-R. Müller (Springer, Berlin, 2012), pp. 53–67. https://doi.org/10.1007/978-3-642-35289-8_5
H. Purwins, B. Li, T. Virtanen, J. Schluter, S.-Y. Chang, T. Sainath, Deep learning for audio signal processing. IEEE J. Sel. Top. Signal Process. 13(2), 206–219 (2019). https://doi.org/10.1109/jstsp.2019.2908700
L. Qiu, S. Li, Y. Sung, DBTMPE: deep bidirectional transformers-based masked predictive encoder approach for music genre classification. Mathematics 9(5), 1–17 (2021). https://doi.org/10.3390/math9050530
L.R. Rabiner, R.W. Schafer, Theory and Applications of Digital Speech Processing (Prentice Hall Press, Hoboken, 2010)
L.C. Reghunath, R. Rajan, Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music. EURASIP J. Audio Speech Music Process. 2022(1), 1–14 (2022). https://doi.org/10.1186/s13636-022-00245-8
A. Sano, W. Chen, D. Lopez-Martinez, S. Taylor, R.W. Picard, Multimodal ambulatory sleep detection using LSTM recurrent neural networks. IEEE J. Biomed. Health Inform. 23(4), 1607–1617 (2019). https://doi.org/10.1109/jbhi.2018.2867619
K. Schulze-Forster, K.G. Richard, L. Kelley, C.S.J. Doire, R. Badeau, Unsupervised music source separation using differentiable parametric source models. IEEE/ACM Trans. Audio Speech Lang. Process. 31, 1276–1289 (2023). https://doi.org/10.1109/taslp.2023.3252272
M. Sharma, R.B. Pachori, U.R. Acharya, A new approach to characterize epileptic seizures using analytic time-frequency flexible wavelet transform and fractal dimension. Pattern Recogn. Lett. 94, 172–179 (2017). https://doi.org/10.1016/j.patrec.2017.03.023
L. Shi, Y. Zhang, J. Zhang, Lung sound recognition method based on wavelet feature enhancement and time-frequency synchronous modeling. IEEE J. Biomed. Health Inform. 27(1), 308–318 (2023). https://doi.org/10.1109/jbhi.2022.3210996
D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, M.D. Plumbley, Detection and classification of acoustic scenes and events. IEEE Trans. Multimed. 17(10), 1733–1746 (2015). https://doi.org/10.1109/tmm.2015.2428998
M. Sukhavasi, S. Adapa, Music theme recognition using CNN and self-attention (2019). https://doi.org/10.48550/arXiv.1911.07041, arXiv preprint arXiv:1911.07041
T. Tuncer, S. Dogan, A. Subasi, Surface EMG signal classification using ternary pattern and discrete wavelet transform based feature extraction for hand movement recognition. Biomed. Signal Process. Control 58, 1–12 (2020). https://doi.org/10.1016/j.bspc.2020.101872
T. Tuncer, S. Dogan, A. Subasi, EEG-based driving fatigue detection using multilevel feature extraction and iterative hybrid feature selection. Biomed. Signal Process. Control 68, 1–11 (2021). https://doi.org/10.1016/j.bspc.2021.102591
S.P. Vaidya, Fingerprint-based robust medical image watermarking in hybrid transform. Vis. Comput. 39, 2245–2260 (2022). https://doi.org/10.1007/s00371-022-02406-4
C.-Y. Wang, J.C. Wang, A. Santoso, C.C. Chiang, C.H. Wu, Sound event recognition using auditory-receptive-field binary pattern and hierarchical-diving deep belief network. IEEE/ACM Trans. Audio Speech Lang. Process. 26(8), 1336–1351 (2018). https://doi.org/10.1109/taslp.2017.2738443
Wikipedia contributors, Mel-frequency cepstrum—Wikipedia, the free encyclopedia (2019). https://en.wikipedia.org/w/index.php?title=Mel-frequency_cepstrum&oldid=917928298
J. Wu, E. Vincent, S.A. Raczynski, T. Nishimoto, N. Ono, S. Sagayama, Polyphonic pitch estimation and instrument identification by joint modeling of sustained and attack sounds. IEEE J. Sel. Top. Signal Process. 5(6), 1124–1132 (2011). https://doi.org/10.1109/jstsp.2011.2158064
X. Wu, C.-W. Ngo, Q. Li, Threading and auto documenting news videos: a promising solution to rapidly browse news topics. IEEE Signal Process. Mag. 23(2), 59–68 (2006). https://doi.org/10.1109/msp.2006.1621449
D. Yu, H. Duan, J. Fang, B. Zeng, Predominant instrument recognition based on deep neural network with auxiliary classification. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 852–861 (2020). https://doi.org/10.1109/taslp.2020.2971419
N. Zermi, A. Khaldi, M.R. Kafi, F. Kahlessenane, S. Euschi, Robust SVD-based schemes for medical image watermarking. Microprocess. Microsyst. 84, 1–12 (2021). https://doi.org/10.1016/j.micpro.2021.104134
Cite this article
Dash, S.K., Solanki, S.S. & Chakraborty, S. Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music Using Discrete Wavelet Transform. Circuits Syst Signal Process 43, 4239–4271 (2024). https://doi.org/10.1007/s00034-024-02641-1