Abstract
Plagiarism is specifically defined as the literary theft of paragraphs or sentences from an unreferenced source. This unauthorized behavior is a real problem in scientific research. This paper proposes a Hybrid Arabic Plagiarism Detection System (HYPLAG). HYPLAG combines corpus-based and knowledge-based approaches by using an Arabic semantic resource (Arabic WordNet). A preliminary study of texts written by undergraduate students was conducted to understand their behavior and the patterns they use when plagiarizing. The results show that students apply different techniques when plagiarizing sentences, and that they change sentence components (verbs, nouns, and adjectives). HYPLAG was evaluated on the ExAraPlagDet-2015 dataset against several other approaches that participated in the AraPlagDet PAN@FIRE shared task on extrinsic Arabic plagiarism detection. It obtained higher performance (an F-score of 89% vs. 84% for the best-performing system at AraPlagDet) with less computational time.
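To make the hybrid idea concrete, here is a minimal Python sketch. It is not the HYPLAG implementation. It scores a sentence pair with a corpus-based measure (Jaccard overlap of word bigrams) and a knowledge-based measure (word matches that also count synonyms), then mixes the two with a weight. The `SYNSETS` table is a hypothetical two-entry stand-in for Arabic WordNet. The bigram size, the matching rule, and the weight `alpha` are assumptions chosen for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical toy synsets standing in for Arabic WordNet (assumption):
# each frozenset groups words treated as synonyms. A real system would
# query the Arabic WordNet resource instead.
SYNSETS: List[FrozenSet[str]] = [
    frozenset({"كتب", "ألف", "دون"}),  # wrote / authored / recorded
    frozenset({"البحث", "الدراسة"}),   # the research / the study
]


def synonyms(word: str) -> Set[str]:
    """Return the word itself plus every synonym listed in the toy synsets."""
    result = {word}
    for synset in SYNSETS:
        if word in synset:
            result |= synset
    return result


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lexical_similarity(a: List[str], b: List[str], n: int = 2) -> float:
    """Corpus-based view: Jaccard overlap of word n-grams (exact matches only)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def semantic_similarity(a: List[str], b: List[str]) -> float:
    """Knowledge-based view: fraction of words in both sentences that have
    an exact or synonym match somewhere in the other sentence."""
    if not a or not b:
        return 0.0
    set_a, set_b = set(a), set(b)
    matched_a = sum(1 for w in a if synonyms(w) & set_b)
    matched_b = sum(1 for w in b if synonyms(w) & set_a)
    return (matched_a + matched_b) / (len(a) + len(b))


def hybrid_similarity(a: List[str], b: List[str], alpha: float = 0.5) -> float:
    """Weighted mix of both views; alpha is a hypothetical tuning weight."""
    return alpha * lexical_similarity(a, b) + (1 - alpha) * semantic_similarity(a, b)


if __name__ == "__main__":
    source = ["كتب", "الطالب", "البحث"]       # "the student wrote the research"
    suspicious = ["ألف", "الطالب", "الدراسة"]  # "the student authored the study"
    print(lexical_similarity(source, suspicious))   # 0.0
    print(semantic_similarity(source, suspicious))  # 1.0
    print(hybrid_similarity(source, suspicious))    # 0.5
```

In this example a verb and a noun are replaced by synonyms. Exact bigram overlap drops to 0, while the synonym-aware score stays at 1.0. This is the kind of verb and noun substitution the student study reports, and it is why a semantic resource is combined with lexical matching rather than used on its own.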
Notes
1. http://www.plagiarism.org/resources/facts-and-stats/; accessed in October 2016.
2. http://www.checkforplagiarism.net/cyber-plagiarism; accessed in October 2016.
3. https://infogr.am/Plagiarism-606324; accessed in October 2016.
4. We used the Farasa NER tool for named entity recognition: http://qatsdemo.cloudapp.net/farasa/; accessed in December 2016.
5. http://globalwordnet.org/arabic-wordnet/; accessed in November 2016.
6. http://misc-umc.org/AraPlagDet/?i=1; accessed in September 2016.
Acknowledgment
The work of Paolo Rosso was funded by the SomEMBED TIN2015-71147-C2-1-P MINECO research project.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ghanem, B., Arafeh, L., Rosso, P., Sánchez-Vega, F. (2018). HYPLAG: Hybrid Arabic Text Plagiarism Detection System. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_33
DOI: https://doi.org/10.1007/978-3-319-91947-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)