Abstract
A parser that is efficient and accurate enough is useful in many natural language processing systems, most notably in machine translation [1]. Many sentence parsers have previously been developed for foreign languages such as English and Arabic, as well as for Amharic among the local languages of Ethiopia. However, to the best of the researchers' knowledge, there is no Afan Oromo sentence parser that handles both simple and complex sentences. We therefore propose a sentence parser for the Afan Oromo language. Parsing Afan Oromo sentences is a necessary mechanism for other natural language processing applications for the language, such as machine translation, question answering, knowledge extraction and information retrieval. This paper presents a rule-based parser for Afan Oromo sentences that uses a top-down chart parsing algorithm. A Context Free Grammar (CFG) is used to represent the grammar. A sample corpus of 500 sentences was prepared, and the CFG rules were extracted manually from the tagged sample corpus. We also developed a simple lexicon generator algorithm to generate the lexical rules automatically. The Python programming language and NLTK were used as implementation tools. Of the sample dataset, 70% are simple sentences, because we considered four different types of simple sentence (declarative, interrogative, imperative and exclamatory), and the remaining 30% are complex sentences. The parser was trained on a training dataset of 400 sentences, on which it achieved an accuracy of 98.25%, and tested on a testing dataset of 100 sentences, on which it achieved an accuracy of 91%. The experimental results are encouraging, since this is the first work on parsing both simple and complex sentences of Afan Oromo.
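The abstract mentions a lexicon generator that derives lexical rules from a tagged corpus but does not describe it. The following is a minimal sketch of how such a generator might look, not the authors' algorithm. The function name `generate_lexical_rules`, the tag names (`NN`, `VB`), the three phrase-structure rules and the two example sentences are all assumptions made for illustration; none of them comes from the paper's corpus or grammar.

```python
from collections import defaultdict


def generate_lexical_rules(tagged_sentences):
    """Build one CFG lexical rule per POS tag from (word, tag) pairs.

    Hypothetical helper: the paper does not publish its generator.
    """
    words_by_tag = defaultdict(list)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # Keep each word once per tag, in order of first appearance.
            if word not in words_by_tag[tag]:
                words_by_tag[tag].append(word)
    return [
        "{} -> {}".format(tag, " | ".join("'{}'".format(w) for w in words))
        for tag, words in sorted(words_by_tag.items())
    ]


# Illustrative tagged sentences (assumed, not taken from the paper's corpus).
tagged = [
    [("Tolaan", "NN"), ("mana", "NN"), ("bite", "VB")],
    [("mana", "NN"), ("bite", "VB")],
]

rules = generate_lexical_rules(tagged)

# Hand-written phrase-structure rules (assumed) combined with the
# generated lexical rules into one grammar description.
grammar_text = "\n".join(["S -> NP VP", "NP -> NN", "VP -> NP VB | VB"] + rules)
print(grammar_text)
```

The resulting text follows the rule notation accepted by NLTK's `CFG.fromstring`, so in principle it could be loaded that way and handed to `nltk.parse.chart.TopDownChartParser`; whether the authors wired their components together like this is not stated in the abstract.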
References
Katz-Brown, J., et al.: Training a parser for machine translation reordering. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pp. 183–192 (2011)
Genemo, A.S.: Afaan Oromo Named Entity Recognition Using Hybrid Approach, M.Sc. thesis, Department of Computer Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2015)
Chomsky, N.: Syntactic Structures, 2nd edn. New York (2002)
Megersa, D.: An automatic sentence parser for Oromo language using supervised learning technique, M.Sc. thesis, Department of Information Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2002)
Mohammed, A.D.: A top-down chart parser for Amharic sentences, M.Sc. thesis, Department of Computer Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2015)
Alemu, A.: Automatic sentence parsing for Amharic text an experiment using probabilistic context free grammars, M.Sc. thesis, Department of Information Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2002)
Agonafer, D.G.: An integrated approach to automatic complex sentence parsing for Amharic text, M.Sc. thesis, Department of Information Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2003)
Ibrahim, A.: A hybrid approach to Amharic base phrase chunking and parsing, M.Sc. thesis, Department of Computer Science, School of Graduate Studies, Addis Ababa University, Addis Ababa (2013)
Sleator, D.D.K.: Parsing English with a Link Grammar, National Science Foundation under grant CCR-8658139, Oline Corporation. R. R. Donnelley and Sons, New York (1991)
Al-Taani, A., Msallam, M., Wedian, S.: A top-down chart parser for analyzing Arabic sentences. Int. Arab. J. Inf. Technol. 9(3), 109–116 (2012)
Khoufi, N., Aloulou, C., Hadrich, L., Anlp, B.: ARSYPAR: a tool for parsing the Arabic language. In: International Arab Conference on Information Technology, ACIT, University of Science & Technology (2013)
Bataineh, B.M., Bataineh, E.A.: An efficient recursive transition network parser for Arabic language. In: Proceedings of the World Congress on Engineering 2009, vol. II, pp. 1307–1311 (2009)
Hambir, N.: Hindi parser-based on CKY algorithm, vol. 3, no. 2, pp. 851–853 (2012)
Thant, W.W., Htwe, T.M., Thein, N.L.: Context free grammar based top-down parsing of Myanmar sentences. International Conference On Information Technology, Pattaya, December 2011, pp. 71–75 (2011)
Lian, H.: Chinese language parsing with maximum-entropy-inspired parser, M.S. thesis, pp. 1–6 (2005)
Ouersighni, R.: Robust Rule-based Approach in Arabic processing. Int. J. Comput. Appl. 93(12), 31–37 (2014)
Erbach, G.: A flexible parser for a linguistic development environment. In: Herzog, O., Rollinger, C.-R. (eds.) Text Understanding in LILOG. LNCS, vol. 546, pp. 74–87. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-54594-8_53
Weikum, G.: Foundations of statistical natural language processing. ACM SIGMOD Rec. 31(3), 37 (2002)
Jason: Parsing. https://www.cs.cornell.edu/courses/cs4740/2012sp/lectures/parsing-intro-4pp.pdf. Accessed 03 Feb 2017
Thompson, B.I.: Afro Asiatic Language Family (2017). http://aboutworldlanguages.com/afro-asiatic-language-family. Accessed 04 Feb 2017
Gamta, T.: The Oromo language and the latin alphabet. J. Oromo Stud. 10–13 (1992). http://www.africa.upenn.edu/Hornet/Afaan_Oromo_19777.html. Accessed 06 Feb 2017
Ganfure, G.O., Midekso, D.: Design and implementation of morphology based spell checker, vol. 3, no. 12, pp. 118–125 (2014)
Yimam, B.: The phrase structures of Ethiopian Oromo, Ph.D. dissertation, Addis Ababa University (1986)
Alqrainy, S., Jordan, S., Alkoffash, M.S.: Context-free grammar analysis for Arabic sentences. Int. J. Comput. Appl. 53(3), 7–11 (2012)
Kibble, R.: Introduction to natural language processing undergraduate study in computing and related programmes (2013)
Zhu, S.C.: Ch 4 classic parsing algorithms chart parsing in NLP pp. 1–51
Fox, H.J.: Lexicalized, edge-based, best-first chart parsing, M.Sc. thesis, Department of Computer Science, Massachusetts Institute of Technology, Brown University (1999)
Nedjo, A.T., Huang, D., Liu, X.: Automatic part-of-speech tagging for Oromo language using Maximum Entropy Markov Model (MEMM), vol. 10, pp. 3319–3334 (2014)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Balcha, H.B., Tegegne, T. (2019). Design and Development of Sentence Parser for Afan Oromo Language. In: Mekuria, F., Nigussie, E., Tegegne, T. (eds) Information and Communication Technology for Development for Africa. ICT4DA 2019. Communications in Computer and Information Science, vol 1026. Springer, Cham. https://doi.org/10.1007/978-3-030-26630-1_29
DOI: https://doi.org/10.1007/978-3-030-26630-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26629-5
Online ISBN: 978-3-030-26630-1
eBook Packages: Computer Science, Computer Science (R0)