Abstract
Posting on social media platforms is a popular activity through which personal and confidential information can leak into the public domain. Consequently, social media can contain information indicating that an organization has been compromised or has suffered a data breach. This paper describes a technique for inferring, from information posted on social media, whether an organization has been compromised. The proposed strategy forms the basis of an alarm system that generates alerts for possible unreported cybercrime incidents. It uses two cybercrime-related social media datasets collected in the Irish and New York regions from the Twitter accounts of financial organizations. The Tweets are first labelled as either containing cybercrime indicators or not, and the cybercrime Tweets are then labelled further by crime category. A deep dense pyramidal neural network model is used to classify the Tweets. This approach achieves an AUC of \(\sim 0.85 \pm 0.03\), which outperforms the deep convolutional neural network baseline.
This project is funded by IBM and Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.
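As an illustration of how a figure of the form "AUC of 0.85 ± 0.03" can be produced, the following minimal sketch computes the area under the ROC curve from binary labels and classifier scores, then summarizes several evaluation runs as a mean and a sample standard deviation. It assumes that the ± term denotes a standard deviation over repeated runs or folds, which the abstract does not state. The function names `roc_auc` and `summarize` and the example values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = cybercrime indicator).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample standard deviation of per-run AUC values.

    Assumption: the reported +/- is a standard deviation over runs or folds.
    """
    return mean(aucs), stdev(aucs)


if __name__ == "__main__":
    # Hypothetical labels and scores for one evaluation run (not the paper's data).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.3, 0.5, 0.1]
    print(f"AUC for this run: {roc_auc(labels, scores):.2f}")

    # Hypothetical per-run AUC values, for illustration only.
    avg, spread = summarize([0.80, 0.90])
    print(f"AUC over runs: {avg:.2f} +/- {spread:.2f}")
```

The pairwise formulation is quadratic in the number of Tweets, which is fine for a sketch; a rank-based implementation would be preferable for large datasets.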
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ullah, I. et al. (2021). Classification of Cybercrime Indicators in Open Social Data. In: Lossio-Ventura, J.A., Valverde-Rebaza, J.C., Díaz, E., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2020. Communications in Computer and Information Science, vol 1410. Springer, Cham. https://doi.org/10.1007/978-3-030-76228-5_23
DOI: https://doi.org/10.1007/978-3-030-76228-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76227-8
Online ISBN: 978-3-030-76228-5
eBook Packages: Computer Science (R0)