Abstract
Plagiarism detection is a challenging task in academic research, where it is used to ensure the integrity and authenticity of a document. Many efficient algorithms can now detect plagiarism in a document, and pre-processing remains key to their accuracy. Before checking for plagiarism, these algorithms typically pre-process a document and convert it into a particular format, for example by removing whitespace and special characters. In this paper, we focus on two techniques that can be used to disguise plagiarism and that existing plagiarism detection algorithms overlook. The first replaces the white space between consecutive words with a hidden character in white (the background color), so that the words appear distinct to a reader while the algorithm incorrectly treats them as a single word. As a result, even a statement that is copied verbatim is not identified as plagiarized. The second hides spam text behind images to inflate the document's word count: the human eye cannot see the hidden words, but the algorithm counts them, which lowers the reported plagiarism percentage of the document. Our proposed pre-processing technique handles these two problems efficiently, which improves the accuracy and authenticity of plagiarism checking algorithms. We compared the performance of our algorithm on these issues with other state-of-the-art algorithms (particularly Turnitin), and our algorithm handles them efficiently.
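To make the two tricks concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the document has already been parsed into text runs; the `Run` class, its `color` and `behind_image` fields, and the naive word-overlap `similarity` score are all hypothetical stand-ins chosen for illustration. A real pre-processor would have to obtain font colors and image overlap from the document format itself.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical text run: a span of characters sharing one font color."""
    text: str
    color: str = "000000"        # font color as a hex RGB string (assumed field)
    behind_image: bool = False   # True if an image covers this run (assumed field)


def visible_text(runs, background="FFFFFF"):
    """Return the text a human reader would actually see."""
    parts = []
    for run in runs:
        if run.behind_image:
            # Text covered by an image is invisible to the reader: drop it.
            continue
        if run.color.upper() == background.upper():
            # A background-colored filler looks like a gap: restore a space.
            parts.append(" ")
        else:
            parts.append(run.text)
    # Collapse any repeated whitespace into single spaces.
    return " ".join("".join(parts).split())


def similarity(candidate, source):
    """Naive score: share of candidate words that also occur in the source."""
    words = candidate.lower().split()
    if not words:
        return 0.0
    source_words = set(source.lower().split())
    return sum(w in source_words for w in words) / len(words)


source = "plagiarism detection protects academic integrity"

# A verbatim copy with white filler characters instead of spaces,
# followed by spam words hidden behind an image.
runs = [
    Run("plagiarism"), Run("x", "FFFFFF"),
    Run("detection"), Run("x", "FFFFFF"),
    Run("protects"), Run("x", "FFFFFF"),
    Run("academic"), Run("x", "FFFFFF"),
    Run("integrity"),
    Run(" zq wv kp jd", behind_image=True),
]

raw = "".join(run.text for run in runs)   # what a color-blind extractor sees
cleaned = visible_text(runs)              # what the reader sees

print(similarity(raw, source))
print(similarity(cleaned, source))
```

In this toy example the raw extraction shares no words with the source, because the filler characters glue the copied words together and the hidden spam adds unrelated words, while the cleaned text matches the source exactly.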
Notes
- 1. After this paper was presented at the RoVISP2021 conference, Turnitin addressed the issues described here. Turnitin can now identify illusive text hidden behind images and illusive text inserted between words.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Iqbal, Z., Murtaza, S., Chan, H.Y., Ghori, M.R., Ahmed, N., Ayub, H. (2022). Handling Illusive Text in Document to Improve Accuracy of Plagiarism Detection Algorithm. In: Mahyuddin, N.M., Mat Noor, N.R., Mat Sakim, H.A. (eds) Proceedings of the 11th International Conference on Robotics, Vision, Signal Processing and Power Applications. Lecture Notes in Electrical Engineering, vol 829. Springer, Singapore. https://doi.org/10.1007/978-981-16-8129-5_9
DOI: https://doi.org/10.1007/978-981-16-8129-5_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8128-8
Online ISBN: 978-981-16-8129-5
eBook Packages: Intelligent Technologies and Robotics (R0)