Abstract
A bot is automated code that can be used on Social Networking Sites (SNS) such as Twitter for malicious activities, including posting fake news, spreading malware, commenting on tweets, and liking tweets. This paper proposes a novel machine-learning-based feature selection method to identify malicious bot accounts on social networking sites. The method aims to identify bot SNS accounts with a minimal number of features while maintaining the same or higher accuracy. First, standard datasets from the Twitter platform were downloaded and pre-processed: Dataset 1, with 29 features, and Dataset 2, with 30 features. Existing feature selection methods, namely Variance Score (VS), Random Forest Importance (RFI), and Gradient Boost Importance (GBI), were applied to rank the features. The proposed Recursive Grouping of Features (RGF) method was then applied to the VS-, RFI-, and GBI-ranked feature sets to obtain Minimal Feature Sets (MFSs), each containing fewer features than the full feature set. All the classification algorithms considered were applied to the VS-, RFI-, and GBI-ranked MFSs to find the best-performing classifier and the best feature ranking method. Decision trees were found to be the best classification algorithm on the VS-ranked MFSs. Using only the first MFS, the proposed RGF method achieved the same accuracy as using all features on Dataset 1 and improved accuracy on Dataset 2.
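The abstract does not spell out how RGF forms its groups, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It ranks features by Variance Score after min-max scaling each column (the scaling step is an assumption), then applies a hypothetical grouping rule: repeatedly keep the top half of the ranked list for as long as an accuracy oracle scores it no lower than the all-features baseline. The names `variance_ranking`, `recursive_grouping`, `evaluate`, and the toy feature names are illustrative. In practice `evaluate` might wrap scikit-learn's `cross_val_score` around a `DecisionTreeClassifier`, and RFI or GBI rankings could be read from the `feature_importances_` attribute of `RandomForestClassifier` or `GradientBoostingClassifier`.

```python
from statistics import pvariance
from typing import Callable, Dict, List, Sequence

# Hypothetical accuracy oracle: maps a feature subset to a score in [0, 1].
Evaluator = Callable[[Sequence[str]], float]


def variance_ranking(columns: Dict[str, List[float]]) -> List[str]:
    """Rank features by the variance of their min-max scaled values, highest first.

    Scaling to [0, 1] first is an assumption, so that features measured in
    large units (e.g. follower counts) do not dominate the ranking.
    """
    scores = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # Constant columns carry no information: give them zero variance.
        scaled = [(v - lo) / span for v in values] if span else [0.0] * len(values)
        scores[name] = pvariance(scaled)
    # Descending score; ties broken by name so the ranking is reproducible.
    return sorted(scores, key=lambda n: (-scores[n], n))


def recursive_grouping(ranked: List[str], evaluate: Evaluator,
                       tol: float = 0.0) -> List[List[str]]:
    """Hypothetical RGF stand-in: keep halving the ranked list (keeping the top
    half) while its score stays within `tol` of the all-features score.

    Returns every accepted subset, largest first; each is smaller than `ranked`.
    """
    baseline = evaluate(ranked)
    accepted: List[List[str]] = []

    def recurse(current: List[str]) -> None:
        if len(current) <= 1:
            return
        half = current[: (len(current) + 1) // 2]  # top-ranked half, rounded up
        if evaluate(half) >= baseline - tol:
            accepted.append(half)
            recurse(half)

    recurse(ranked)
    return accepted


if __name__ == "__main__":
    # Toy data: four hypothetical account features observed on four accounts.
    data = {
        "followers": [10, 5000, 20, 7000],
        "tweets_per_day": [1, 1, 1, 1],
        "has_default_image": [1, 0, 1, 0],
        "account_age_days": [30, 900, 45, 870],
    }
    ranked = variance_ranking(data)

    def toy_eval(subset: Sequence[str]) -> float:
        # Hypothetical oracle that pretends only the default-image flag matters.
        return 1.0 if "has_default_image" in subset else 0.6

    print(ranked)
    print(recursive_grouping(ranked, toy_eval))
```

Each returned subset has fewer features than the full set and scores at least as well as the baseline (within `tol`), which mirrors the abstract's description of an MFS; halving is chosen here only because it keeps the sketch short.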








Data Availability Statement
Not applicable.
Code Availability
Not applicable.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Not applicable.
Consent to participate
The authors declare no consent to participate through virtual mode.
Consent for publication
The authors declare no consent for publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Internet Research and Engineering 2023” guest edited by Sudarsan S D, Mohit Sethi and Balaji Rajendran.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chanti, S., Chithralekha, T. RGF-Bot: A Novel Feature Selection Method to Identify Malicious Bot Accounts on Social Networking Sites Using Machine Learning. SN COMPUT. SCI. 4, 843 (2023). https://doi.org/10.1007/s42979-023-02263-5
DOI: https://doi.org/10.1007/s42979-023-02263-5