Acoustic-phonetic feature based Kannada dialect identification from vowel sounds

International Journal of Speech Technology

Abstract

In this paper, a dialect identification system is proposed for the Kannada language using vowel sounds. Dialectal cues are characterized through acoustic parameters such as formant frequencies (F1–F3) and prosodic features [energy, pitch (F0), and duration]. For this purpose, a vowel dataset is collected from native speakers of Kannada belonging to different dialectal regions. Global features, which are frame-level statistics such as the mean, minimum, maximum, standard deviation, and variance, are extracted from the vowel sounds. Local features, which capture temporal dynamics at the contour level, are derived from the steady-state vowel region. Three decision tree-based ensemble algorithms, namely random forest, extreme random forest (ERF), and extreme gradient boosting, are used for classification. The performance of global and local features is evaluated separately. Further, the significance of every feature in dialect discrimination is analyzed using single-factor ANOVA (analysis of variance) tests, which verifies the contribution of each feature to dialect identification. Global features with the ERF ensemble model show the better average dialect identification performance, at around 76%. Duration, energy, pitch, and the three formant features are all found to provide evidence for Kannada dialect classification.
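The two computations named in the abstract, frame-level global statistics and a single-factor ANOVA on one feature, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `global_stats` and `one_way_anova_f` are hypothetical, the numbers are toy values rather than data from the paper, and the use of population (not sample) standard deviation and variance is an assumption, since the abstract does not say which was used.

```python
from statistics import fmean, pstdev, pvariance


def global_stats(contour):
    """Summarize one frame-level contour (e.g. F0 per frame of a vowel)
    with the five statistics listed in the abstract.
    Assumption: population std/variance; the paper may use sample versions."""
    return {
        "mean": fmean(contour),
        "min": min(contour),
        "max": max(contour),
        "std": pstdev(contour),
        "var": pvariance(contour),
    }


def one_way_anova_f(groups):
    """Single-factor ANOVA F statistic for one feature measured in
    several groups (here: one list of values per dialect)."""
    k = len(groups)                      # number of dialect groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy pitch contour in Hz for one vowel token (made-up values).
pitch_contour = [118.0, 120.0, 122.0, 120.0]
print(global_stats(pitch_contour))

# Toy vowel durations in ms for three hypothetical dialect groups.
durations = [[80.0, 90.0, 100.0], [100.0, 110.0, 120.0], [120.0, 130.0, 140.0]]
print(one_way_anova_f(durations))  # 12.0 for these toy values
```

A large F value means the feature varies more between dialect groups than within them, which is the sense in which a feature is significant for dialect discrimination. Turning F into a p-value requires the F distribution, which is omitted here to keep the sketch dependency-free.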

Author information

Corresponding author

Correspondence to Nagaratna B. Chittaragi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chittaragi, N.B., Koolagudi, S.G. Acoustic-phonetic feature based Kannada dialect identification from vowel sounds. Int J Speech Technol 22, 1099–1113 (2019). https://doi.org/10.1007/s10772-019-09646-1
