Speech coding techniques and challenges: a comprehensive literature survey

Nagaraja B G, Mohamed Anees, Thimmaraja Yadava G

doi:10.1007/s11042-023-16665-3

Published: 14 September 2023

Multimedia Tools and Applications, Volume 83, pages 29859–29879 (2024)

Abstract

Speech coding is the process of compressing speech signals for transmission and storage in communication systems. In recent years, it has become increasingly important because of the growing demand for low-bitrate communication systems. This paper presents a comprehensive literature survey of speech coding techniques, their importance, and the challenges associated with their implementation. We also discuss the use of speech enhancement techniques in speech coding. The survey covers various speech coding techniques and their limitations in adverse conditions, and highlights the potential of machine learning-based methods to improve speech quality and intelligibility in speech coding systems. It also reviews the metrics used to evaluate the performance of speech coding algorithms. The survey discusses key issues and challenges in speech coding, including the trade-off between speech quality and bit rate and the impact of background noise on speech quality, and it covers popular speech databases used in coding research. Our findings provide valuable insights for researchers and practitioners working in speech coding and demonstrate the importance of speech enhancement techniques for improving speech quality and intelligibility in low-bitrate communication systems.

Author information

Authors and Affiliations

E&CE, Vidyavardhaka College of Engineering, Gokulam 3 stage, Mysuru, 570002, Karnataka, India
Nagaraja B G & Mohamed Anees
E&CE, Nitte Meenakshi Institute of Technology, Yelahanka, Bengaluru, 560064, Karnataka, India
Thimmaraja Yadava G

Corresponding author

Correspondence to Thimmaraja Yadava G.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest regarding this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mohamed Anees and Thimmaraja Yadava G contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

G, N.B., Anees, M. & G, T.Y. Speech coding techniques and challenges: a comprehensive literature survey. Multimed Tools Appl 83, 29859–29879 (2024). https://doi.org/10.1007/s11042-023-16665-3

Received: 17 March 2023
Revised: 07 July 2023
Accepted: 23 August 2023
Published: 14 September 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16665-3
