Abstract
Information technology has made information in many forms, such as text, image, and sound, widely accessible through computer-based search. As a result, there is increased interest in searching Malay text so that scholars and researchers can access databases online. Malay texts are scanned and stored in databases for use by text retrieval systems that employ conflation methods to identify word variants. This paper evaluates the retrieval effectiveness of two conflation methods, stemming and thesaurus lookup, for searching and retrieving relevant Malay translated Al-Quran documents from users' natural-language query words. The Malay translated Al-Quran texts are stored in an inverted file structure. The retrieved documents are weighted and ranked using the inverse document frequency (idf) function. Retrieval effectiveness (E) is measured using standard recall (R) and precision (P). Experiments on the Malay translated Al-Quran documents show that a combined search using stemming and a thesaurus improves retrieval effectiveness (E) and recall (R) but decreases precision (P).
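The pipeline named in the abstract (inverted file, conflation by stemming and thesaurus, idf ranking, recall and precision) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy English documents, the hand-written `STEMS` and `THESAURUS` tables, the relevance judgments, and the function names are hypothetical and are not the authors' data or implementation. The abstract does not define E, so `e_measure` uses van Rijsbergen's E measure (lower is better) as one common choice, which may differ from the paper's.

```python
import math
from collections import defaultdict

# Toy documents: hypothetical English stand-ins, not the paper's corpus.
DOCS = {
    1: "pray and give charity",
    2: "praying at night",
    3: "give alms to the poor",
    4: "travel by night",
}

# Hypothetical conflation tables. A real system would use a Malay stemmer
# and a Malay thesaurus instead of these hand-written dictionaries.
STEMS = {"praying": "pray"}
THESAURUS = {"charity": {"alms"}}


def stem(word):
    """Map a word to its stem, or return it unchanged if unknown."""
    return STEMS.get(word, word)


def build_index(docs, use_stemming):
    """Inverted file: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[stem(word) if use_stemming else word].add(doc_id)
    return index


def expand(query, use_stemming, use_thesaurus):
    """Turn a query string into a set of (optionally conflated) terms."""
    terms = set(query.split())
    if use_thesaurus:
        for term in list(terms):
            terms |= THESAURUS.get(term, set())
    if use_stemming:
        terms = {stem(term) for term in terms}
    return terms


def search(query, docs, use_stemming=False, use_thesaurus=False):
    """Return document ids ranked by the sum of idf over matching terms."""
    index = build_index(docs, use_stemming)
    n_docs = len(docs)
    scores = defaultdict(float)
    for term in expand(query, use_stemming, use_thesaurus):
        postings = index.get(term, set())
        if postings:
            idf = math.log(n_docs / len(postings))
            for doc_id in postings:
                scores[doc_id] += idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_precision(retrieved, relevant):
    """Standard recall and precision for a retrieved list."""
    hits = len(set(retrieved) & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def e_measure(recall, precision, alpha=0.5):
    """van Rijsbergen's E (assumed here; 0 is best, 1 is worst)."""
    if recall == 0 or precision == 0:
        return 1.0
    return 1.0 - 1.0 / (alpha / precision + (1.0 - alpha) / recall)


RELEVANT = {1, 2}  # hypothetical relevance judgments for the query below
exact = search("pray charity", DOCS)
combined = search("pray charity", DOCS, use_stemming=True, use_thesaurus=True)
```

With these made-up judgments, the exact-match search retrieves only document 1, while the combined search also retrieves documents 2 and 3, so recall rises and precision falls. This mirrors the direction of the trade-off the abstract reports, but it is a property of the toy data and says nothing about the size of the effect in the paper's experiments.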
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bakar, Z.A., Rahman, N.A. (2003). Evaluating the Effectiveness of Thesaurus and Stemming Methods in Retrieving Malay Translated Al-Quran Documents. In: Sembok, T.M.T., Zaman, H.B., Chen, H., Urs, S.R., Myaeng, SH. (eds) Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access. ICADL 2003. Lecture Notes in Computer Science, vol 2911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24594-0_67
DOI: https://doi.org/10.1007/978-3-540-24594-0_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20608-8
Online ISBN: 978-3-540-24594-0
eBook Packages: Springer Book Archive