Abstract
Domain-specific jargon words are terms used in the formal communication of a particular domain, both among domain experts and between experts and non-experts; however, they are difficult for non-experts and the wider public to understand. Experts in an organization use jargon words in scientific and science communication to maintain the communication conventions of their domain. Domain-specific Amharic jargon words make it harder for people outside the domain to grasp the main theme of content disseminated through science communication. We followed a design science research approach to conduct our study. We prepared a knowledge base containing a list of domain-specific Amharic jargon words and their meanings. For model development, we employed Support Vector Machine, Artificial Neural Network, and Naïve Bayes classifiers with TF-IDF feature selection, which achieved classification accuracies of 96.2%, 95.2%, and 94.7%, respectively. The knowledge-based system performs best when fewer test sentences are entered into the system: for inputs of 20, 40, 60, and 80 test sentences, we observed accuracies of 88.2%, 86.7%, 85.4%, and 83.1%, respectively. Domain-specific Amharic jargon words are thus identified with a hybrid of machine learning and a knowledge base, and this hybrid gave promising results for identifying jargon words in jargon-heavy text.
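The abstract names two building blocks, TF-IDF features and a knowledge base of jargon words with their meanings, without describing how they are implemented. The following is a minimal sketch of both ideas, not the authors' pipeline. It assumes whitespace tokenization and the common tf × log(N/df) weighting, and it uses English placeholder entries rather than the paper's Amharic data. The names `KNOWLEDGE_BASE`, `tokenize`, `tf_idf`, and `identify_jargon` are hypothetical. A classifier such as a Support Vector Machine would be trained on weights like these; that step is omitted here.

```python
import math
from collections import Counter

# Hypothetical knowledge base: jargon word -> plain-language meaning.
# These English entries are placeholders, not the paper's Amharic data.
KNOWLEDGE_BASE = {
    "germplasm": "living genetic material, such as seeds, kept for breeding",
    "tillage": "preparing soil for planting by digging or ploughing",
}


def tokenize(sentence):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return sentence.lower().split()


def tf_idf(corpus):
    """Return one {term: weight} dict per sentence using tf * log(N / df)."""
    docs = [tokenize(sentence) for sentence in corpus]
    n_docs = len(docs)
    # Document frequency: number of sentences in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


def identify_jargon(sentence, kb=KNOWLEDGE_BASE):
    """Return (word, meaning) pairs for tokens found in the knowledge base."""
    return [(tok, kb[tok]) for tok in tokenize(sentence) if tok in kb]


corpus = ["the farmer stores germplasm", "the farmer sells maize"]
weights = tf_idf(corpus)
found = identify_jargon("Reduced tillage protects the soil")
```

In this sketch a term that occurs in every sentence receives a weight of zero, while a term confined to one sentence receives a positive weight, which is why TF-IDF can help separate distinctive domain terms from common words.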
Acknowledgment
This work was made possible by the contributions of agricultural domain experts, scholars, and the agrarian community in Ethiopia.
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Lake, M., Tegegne, T. (2022). Agricultural Domain-Specific Jargon Words Identification in Amharic Text. In: Berihun, M.L. (eds) Advances of Science and Technology. ICAST 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 411. Springer, Cham. https://doi.org/10.1007/978-3-030-93709-6_27
DOI: https://doi.org/10.1007/978-3-030-93709-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93708-9
Online ISBN: 978-3-030-93709-6
eBook Packages: Computer Science (R0)