Abstract
Speech enhancement techniques play a vital role in improving the clarity and overall quality of audio signals by addressing background noise, reverberation, and channel impairments, which often degrade speech intelligibility. Neural network models, including deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and variational autoencoders (VAEs), have proven effective at improving speech quality by learning the intricate patterns that map noisy speech inputs to cleaner estimates. The performance of speech enhancement algorithms is commonly assessed with objective metrics such as Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). STOI estimates the intelligibility of enhanced speech from short-time spectral information, while PESQ predicts the perceived (subjective) quality of enhanced speech relative to the original clean speech. Recent work in speech enhancement has further shown that LinkNet, an encoder-decoder neural network architecture, can significantly outperform other models. LinkNet enhances speech signals by suppressing noise, reducing artifacts, and improving the overall intelligibility of the output. Its architecture links each encoder block to its corresponding decoder block, which helps it extract meaningful features from noisy speech inputs and yields substantial improvements in speech quality. Building on LinkNet, researchers and practitioners can further advance the field of speech enhancement, with gains in both audio clarity and intelligibility.
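STOI, as described above, compares clean and enhanced speech through short-time spectral information. The sketch below illustrates that idea under stated assumptions: it follows the commonly published STOI recipe (256-sample Hann frames with 50% overlap at a 10 kHz sampling rate, 15 one-third octave bands starting at 150 Hz, 30-frame segments of roughly 384 ms, and clipping at β = -15 dB), but it omits the resampling and silent-frame removal of the original algorithm. The function names `third_octave_bands`, `band_envelopes`, and `stoi_like` are hypothetical, and the result is a STOI-like score, not a value comparable to published STOI figures.

```python
import numpy as np


def third_octave_bands(fs, n_fft, n_bands=15, min_freq=150.0):
    """Return one array of rFFT bin indices per one-third octave band."""
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    bands = []
    for k in range(n_bands):
        center = min_freq * 2.0 ** (k / 3.0)
        lo, hi = center * 2.0 ** (-1.0 / 6.0), center * 2.0 ** (1.0 / 6.0)
        bands.append(np.where((freqs >= lo) & (freqs < hi))[0])
    return bands


def band_envelopes(signal, fs, frame_len=256, hop=128, n_fft=512):
    """Short-time one-third octave band magnitudes, shape (n_bands, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Zero-pad each frame to n_fft samples and take the power spectrum.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.stack([np.sqrt(power[:, idx].sum(axis=1))
                     for idx in third_octave_bands(fs, n_fft)])


def stoi_like(clean, processed, fs=10000, seg_len=30, beta_db=-15.0):
    """Mean envelope correlation over all bands and sliding 30-frame segments."""
    eps = np.finfo(float).eps
    X = band_envelopes(clean, fs)
    Y = band_envelopes(processed, fs)
    clip = 1.0 + 10.0 ** (-beta_db / 20.0)  # upper bound relative to clean
    scores = []
    for m in range(seg_len, X.shape[1] + 1):
        x = X[:, m - seg_len:m]
        y = Y[:, m - seg_len:m]
        # Match each processed band to the clean band's energy, then clip it.
        gain = np.linalg.norm(x, axis=1, keepdims=True) / (
            np.linalg.norm(y, axis=1, keepdims=True) + eps)
        y = np.minimum(y * gain, clip * x)
        # Pearson correlation per band between clean and processed envelopes.
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        num = (xc * yc).sum(axis=1)
        den = np.linalg.norm(xc, axis=1) * np.linalg.norm(yc, axis=1) + eps
        scores.append(num / den)
    return float(np.mean(scores))


# Toy example: 4 Hz amplitude-modulated noise stands in for clean speech.
rng = np.random.default_rng(0)
fs = 10000
t = np.arange(3 * fs) / fs
clean = rng.standard_normal(t.size) * (1.0 + 0.9 * np.sin(2 * np.pi * 4.0 * t))
noisy = clean + 2.0 * rng.standard_normal(t.size)
print(f"clean vs clean: {stoi_like(clean, clean, fs):.3f}")
print(f"clean vs noisy: {stoi_like(clean, noisy, fs):.3f}")
```

Comparing a signal with itself gives a score of 1, and adding noise lowers it. The clipping step keeps a few severely degraded time-frequency regions from dominating the average. For reported results, a reference implementation such as the `pystoi` Python package should be used instead; PESQ is specified in ITU-T Recommendation P.862 and is too involved to sketch in a few lines.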
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Patel, A., Prasad, G.S., Chandra, S., Bharati, P., Das Mandal, S.K. (2023). Speech Enhancement Using LinkNet Architecture. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science(), vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_21
DOI: https://doi.org/10.1007/978-3-031-48309-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)