Classification of Cybercrime Indicators in Open Social Data

  • Conference paper
Information Management and Big Data (SIMBig 2020)

Abstract

Posting on social media platforms is a popular activity through which personal and confidential information can leak into the public domain. Consequently, social media can contain indications that an organization has been compromised or has suffered a data breach. This paper describes a technique for inferring, from information posted on social media, whether an organization has been compromised. The proposed strategy forms the basis of an alarm system that generates alerts for possible unreported cybercrime incidents. It uses two cybercrime-related social media datasets collected from the Twitter accounts of financial organizations in the Irish and New York regions. The Tweets are first labelled as either containing cybercrime indicators or not, and the cybercrime Tweets are then labelled further into crime categories. A deep dense pyramidal neural network model is used to classify the Tweets. This approach achieves an AUC of \(0.85 \pm 0.03\), which outperforms the baseline of deep convolutional neural networks.
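As a reading aid for the reported AUC figure, the following is a minimal sketch of how an AUC and a mean-and-spread summary can be computed. It is not the authors' evaluation code. It assumes the \(\pm\) term denotes the standard deviation over repeated folds or runs, which the abstract does not state, and the labels, scores, and function names are hypothetical.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-fold data: (true labels, classifier scores).
# 1 = Tweet contains a cybercrime indicator, 0 = it does not.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.4, 0.3, 0.1]),
    ([0, 1, 0, 1], [0.5, 0.5, 0.2, 0.9]),
]

fold_aucs = [roc_auc(y, s) for y, s in folds]
auc_mean = mean(fold_aucs)
auc_spread = stdev(fold_aucs)  # sample standard deviation across folds
print(f"AUC = {auc_mean:.2f} +/- {auc_spread:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, so a spread of a few hundredths indicates that the ranking quality is fairly stable across the evaluated splits.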

This project is funded by IBM and Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.



Author information

Corresponding author

Correspondence to Michael G. Madden.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ullah, I. et al. (2021). Classification of Cybercrime Indicators in Open Social Data. In: Lossio-Ventura, J.A., Valverde-Rebaza, J.C., Díaz, E., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2020. Communications in Computer and Information Science, vol 1410. Springer, Cham. https://doi.org/10.1007/978-3-030-76228-5_23

  • DOI: https://doi.org/10.1007/978-3-030-76228-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-76227-8

  • Online ISBN: 978-3-030-76228-5

  • eBook Packages: Computer Science, Computer Science (R0)
