Multi-objective Approach to Speech Enhancement Using Tunable Q-Factor-based Wavelet Transform and ANN Techniques

Circuits, Systems, and Signal Processing

Abstract

The tunable Q-factor-based wavelet transform (TQWT) is a relatively new method that has been employed for the speech enhancement (SE) task. However, in TQWT-based SE, the controlling parameters, namely the Q-factor (Q) and the number of decomposition levels (J), are kept constant across different noise conditions, which degrades overall SE performance. SE performance is generally assessed in terms of quality and intelligibility; however, these two measures have been reported not to always correlate with each other because of the distortions introduced by SE algorithms. This paper addresses both issues by employing a multi-objective formulation to find the optimal values of Q and J for the TQWT algorithm at different noise levels. In addition, a low-complexity functional link artificial neural network (FLANN)-based model is developed to estimate appropriate values of Q and J from noisy speech whose noise condition is unknown. To assess the performance of the proposed hybrid approach, subjective and objective evaluation tests are carried out on three standard noisy speech data sets, and the results are compared with those of six recently reported SE methods. In both the subjective and objective evaluation tests, the proposed hybrid approach outperforms the other six SE methods.
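
The central idea, choosing (Q, J) per noise condition as a trade-off between a quality score and an intelligibility score, can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their optimizer. Assumptions: the redundancy factor r, the signal length n, and every score value are hypothetical; the admissible-level bound uses beta = 2/(Q + 1) and alpha = 1 - beta/r as in Selesnick's TQWT definition and toolbox; and the selection step simply keeps the non-dominated (Pareto-optimal) candidates when both objectives are maximised.

```python
import math
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float                 # TQWT Q-factor
    j: int                   # number of decomposition levels
    quality: float           # hypothetical quality score (e.g. PESQ-like), higher is better
    intelligibility: float   # hypothetical intelligibility score (e.g. STOI-like), higher is better


def tqwt_max_levels(q: float, r: float, n: int) -> int:
    """Largest admissible J for an n-sample signal, given Q and redundancy r (assumed r > beta)."""
    beta = 2.0 / (q + 1.0)   # high-pass scaling factor
    alpha = 1.0 - beta / r   # low-pass scaling factor
    return math.floor(math.log(beta * n / 8.0) / math.log(1.0 / alpha))


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.quality >= b.quality and a.intelligibility >= b.intelligibility
    better = a.quality > b.quality or a.intelligibility > b.intelligibility
    return no_worse and better


def pareto_front(cands: List[Candidate], r: float, n: int) -> List[Candidate]:
    """Drop inadmissible (Q, J) pairs, then keep only the non-dominated candidates."""
    valid = [c for c in cands if 1 <= c.j <= tqwt_max_levels(c.q, r, n)]
    return [c for c in valid if not any(dominates(o, c) for o in valid)]


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the paper.
    grid = [
        Candidate(1.0, 8, 2.1, 0.80),
        Candidate(1.0, 12, 2.6, 0.85),  # J=12 exceeds the bound for Q=1, n=1024
        Candidate(2.0, 10, 2.4, 0.78),
        Candidate(3.0, 12, 2.2, 0.76),  # dominated by (2.0, 10)
        Candidate(4.0, 14, 2.5, 0.70),
    ]
    for c in pareto_front(grid, r=3.0, n=1024):
        print(c)
```

With real scores computed on enhanced speech at one noise level, the front lists the (Q, J) pairs that cannot improve one objective without worsening the other; how a single pair is then picked from the front is a separate design choice.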

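A functional link ANN avoids hidden layers: each input passes through a fixed nonlinear expansion, and only a single linear output layer is trained, which is what keeps its complexity low. The Python sketch below assumes a trigonometric expansion and sample-by-sample LMS weight updates, both common FLANN choices; the single normalised input feature and the linear (Q, J) targets are hypothetical placeholders, not the paper's features, training data, or network configuration.

```python
import math
from typing import List, Sequence


def trig_expand(x: Sequence[float], order: int) -> List[float]:
    """Functional expansion: a bias term, each raw input, and sin/cos(k*pi*x) for k = 1..order."""
    phi = [1.0]
    for xi in x:
        phi.append(xi)
        for k in range(1, order + 1):
            phi.append(math.sin(k * math.pi * xi))
            phi.append(math.cos(k * math.pi * xi))
    return phi


class FLANN:
    """Single-layer functional link network: fixed expansion plus a linear output layer."""

    def __init__(self, n_inputs: int, n_outputs: int, order: int = 1, mu: float = 0.1):
        self.order = order
        self.mu = mu  # LMS step size (assumed value)
        n_feat = 1 + n_inputs * (1 + 2 * order)
        self.w = [[0.0] * n_feat for _ in range(n_outputs)]  # one weight row per output

    def _forward(self, phi: List[float]) -> List[float]:
        return [sum(w * p for w, p in zip(row, phi)) for row in self.w]

    def predict(self, x: Sequence[float]) -> List[float]:
        return self._forward(trig_expand(x, self.order))

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]],
            epochs: int = 500) -> None:
        """Sample-by-sample LMS: w <- w + mu * error * phi for each output row."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                phi = trig_expand(x, self.order)
                y_hat = self._forward(phi)
                for row, target, est in zip(self.w, y, y_hat):
                    err = target - est
                    for i, p in enumerate(phi):
                        row[i] += self.mu * err * p


def mse(model: FLANN, xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all samples and outputs."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += sum((t - e) ** 2 for t, e in zip(y, model.predict(x)))
    return total / (len(xs) * len(ys[0]))


if __name__ == "__main__":
    # Hypothetical data: one normalised noise-level feature mapped to (Q, J) targets.
    xs = [[k / 10] for k in range(11)]
    ys = [[1.0 + 2.0 * x[0], 4.0 + 6.0 * x[0]] for x in xs]
    model = FLANN(n_inputs=1, n_outputs=2)
    before = mse(model, xs, ys)
    model.fit(xs, ys)
    q_hat, j_hat = model.predict([0.5])
    print(f"MSE {before:.3f} -> {mse(model, xs, ys):.5f}; Q ~ {q_hat:.2f}, J ~ {round(j_hat)}")
```

Because J is an integer, a continuous estimate like this would be rounded and clipped to the admissible range before use; the expansion order and step size are assumed values that would need tuning on real features.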

Data Availability

The data used in the proposed algorithm are available from the corresponding author on request.

Author information

Correspondence to Tusar Kanti Dash.

Cite this article

Dash, T.K., Solanki, S.S. & Panda, G. Multi-objective Approach to Speech Enhancement Using Tunable Q-Factor-based Wavelet Transform and ANN Techniques. Circuits Syst Signal Process 40, 6067–6097 (2021). https://doi.org/10.1007/s00034-021-01753-2

DOI: https://doi.org/10.1007/s00034-021-01753-2