Abstract
Since the mid-2010s, the growing impact of social media on politics has been felt worldwide. This impact became even more decisive in instances such as the 2020 US Presidential Election, where a bitter division between the two main US political parties resulted in an increasingly hostile and combative discourse on social media platforms such as Twitter (now known as X). In this environment, however, partisan discourse obscures the actual ideological content of the debates. In this paper, we present a novel dataset compiled from publicly available news websites with a known political affiliation, as well as a deep learning classification pipeline trained on this dataset. Our best model, based on BERT, estimates the political affiliation of a given text on a seven-point left-right political ideology scale, obtaining an F1 score of 90.33% on our validation dataset. We apply this classifier to a sample of 1.5M tweets retrieved from the #Election2020 dataset and perform an n-gram analysis of the classification results, using s-BERT embeddings and DBSCAN clustering to reveal topics of interest to communities of different political persuasions. We highlight the value of this approach in identifying which subjects are of most interest to each political camp and warn about the presence of polarized political discourse propagated through social media. We also make the novel dataset available to other researchers.
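The n-gram analysis of the classification results mentioned above can be illustrated with a minimal sketch. It assumes each text has already been assigned a label on the seven-point scale; the function names, the toy texts, and the integer labels (1 = far left, 7 = far right) are hypothetical and are not taken from the paper. The classification step with BERT and the clustering step with s-BERT embeddings and DBSCAN are not shown.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_label(texts, labels, n=2, k=3):
    """Count n-grams separately for each predicted label.

    Returns a dict mapping each label to its k most frequent n-grams.
    Tokenization is a plain lowercase whitespace split, which is an
    assumption made for brevity, not the paper's preprocessing.
    """
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        counts[label].update(ngrams(tokens, n))
    return {label: counter.most_common(k) for label, counter in counts.items()}


# Toy inputs: hypothetical texts with hypothetical predicted labels
# on an assumed 1..7 left-right scale.
texts = [
    "mail in ballots count",
    "count mail in ballots now",
    "secure the border now",
    "secure the border today",
]
labels = [2, 2, 6, 6]

result = top_ngrams_by_label(texts, labels, n=3, k=1)
for label, top in sorted(result.items()):
    print(label, top)
```

Grouping the counts by predicted label is what lets the most frequent phrases of each political camp be compared side by side; a real analysis would likely also remove stop words and URLs first.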
Acknowledgments
This study was co-financed by The Bucharest University of Economic Studies during the PhD program.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kovacs, ER., Cotfas, LA., Delcea, C. (2024). A Deep Learning Approach to Fine-Grained Political Ideology Classification on Social Media Texts. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2024. Lecture Notes in Computer Science, vol 14811. Springer, Cham. https://doi.org/10.1007/978-3-031-70819-0_1
DOI: https://doi.org/10.1007/978-3-031-70819-0_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70818-3
Online ISBN: 978-3-031-70819-0
eBook Packages: Computer Science, Computer Science (R0)