RGF-Bot: A Novel Feature Selection Method to Identify Malicious Bot Accounts on Social Networking Sites Using Machine Learning


  • Original Research
SN Computer Science

Abstract

A bot is automated code used for malicious activities such as posting fake news, spreading malware, commenting on tweets, and liking tweets on Social Networking Sites (SNS) such as Twitter. This paper proposes a novel feature selection method that uses machine learning to identify malicious bot accounts on social networking sites. The aim is to identify bot accounts on SNS with a minimal set of features while maintaining the same or higher accuracy. First, standard Twitter datasets were downloaded and pre-processed. Two datasets were considered: Dataset 1 with 29 features and Dataset 2 with 30 features. Existing feature selection methods, namely Variance Score (VS), Random Forest Importance (RFI), and Gradient Boost Importance (GBI), were applied to rank the features. The proposed Recursive Grouping of Features (RGF) method was then applied to the VS-, RFI-, and GBI-ranked feature sets to obtain Minimal Feature Sets (MFSs), each containing fewer features than the full set. The classification algorithms were applied to the VS-, RFI-, and GBI-ranked MFSs to identify the best-performing classifier and the best feature ranking method. Decision Trees were found to be the best classification algorithm, on the VS-ranked MFSs. Using only the first MFS, the proposed RGF method achieved the same accuracy as the full feature set on Dataset 1 and higher accuracy on Dataset 2.
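The abstract names the pipeline but not its internals, so the following is only a minimal Python sketch, under stated assumptions, of two of its ingredients: ranking features by variance (a stand-in for the VS step; the exact scoring formula and any feature scaling used in the paper are not given here) and extracting a reduced feature set from a ranking. The prefix search is a generic top-k procedure swapped in for illustration; it is not the authors' RGF algorithm, whose grouping rules are not described in this excerpt. The function names, the feature names f_a, f_b, f_c, the toy data, and the toy_accuracy scorer are all hypothetical; in practice evaluate would train and score a classifier such as a decision tree.

```python
from statistics import pvariance
from typing import Callable, List, Sequence


def variance_ranking(rows: Sequence[Sequence[float]], names: Sequence[str]) -> List[str]:
    """Rank feature names by population variance, highest first.

    Assumption: a plain variance score on already-scaled features; the
    paper's exact VS formulation is not given in the abstract.
    """
    columns = list(zip(*rows))  # transpose rows into one tuple per feature
    scores = {name: pvariance(col) for name, col in zip(names, columns)}
    # sorted() is stable even with reverse=True, so ties keep column order
    return sorted(names, key=lambda n: scores[n], reverse=True)


def smallest_sufficient_prefix(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the shortest top-k prefix of `ranked` whose score is within
    `tolerance` of the score obtained with every feature.

    Generic top-k search for illustration only; NOT the RGF grouping procedure.
    """
    baseline = evaluate(list(ranked))
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        if evaluate(subset) >= baseline - tolerance:
            return subset
    return list(ranked)  # fallback in case evaluate is non-deterministic


# Hypothetical toy data: three features measured on three accounts.
names = ["f_a", "f_b", "f_c"]
rows = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.0],
    [0.3, 9.0, 1.0],
]


def toy_accuracy(subset: List[str]) -> float:
    """Hypothetical stand-in for training and scoring a classifier."""
    return 0.95 if "f_b" in subset else 0.5


ranking = variance_ranking(rows, names)
print(ranking)                                            # ['f_b', 'f_a', 'f_c']
print(smallest_sufficient_prefix(ranking, toy_accuracy))  # ['f_b']
```

Because raw variance grows with a feature's scale, features would normally be normalized (for example to [0, 1]) before variance ranking; otherwise a feature measured in large counts would rank first simply because of its units. A constant feature such as f_c scores zero and falls to the bottom of the ranking.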


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8


Data Availability Statement

Not applicable.

Code Availability

Not applicable.


Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to S. Chanti.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

The authors declare no consent to participate through Virtual mode.

Consent for publication

The authors declare no consent for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Internet Research and Engineering 2023” guest edited by Sudarsan S D, Mohit Sethi and Balaji Rajendran.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chanti, S., Chithralekha, T. RGF-Bot: A Novel Feature Selection Method to Identify Malicious Bot Accounts on Social Networking Sites Using Machine Learning. SN COMPUT. SCI. 4, 843 (2023). https://doi.org/10.1007/s42979-023-02263-5



  • DOI: https://doi.org/10.1007/s42979-023-02263-5

Keywords