Abstract
Recent developments in online communication and their use in everyday life have caused an explosion in the amount of a new genre of text data: short text. Classifying this type of text by its content therefore has significant implications in many areas. Online debates are no exception, since they provide access to information about their users' opinions, positions and preferences. This paper uses data obtained from online social conversations in Portuguese schools (short text) to observe behavioural trends and to determine whether students remain engaged in the discussion when stimulated. The project applied state-of-the-art (SoA) Machine Learning (ML) algorithms and methods, using BERT-based models to classify whether utterances are within or outside the debate subject. Using SBERT embeddings as features for supervised learning, the proposed model achieved an average accuracy above 0.95 in classifying online messages. Such results can help social scientists better understand human communication, behaviour, discussion and persuasion.
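The pipeline summarised in the abstract (fixed-length sentence embeddings fed to a supervised classifier and scored by average accuracy) can be illustrated with a minimal sketch. The abstract does not say which classifier was trained on the SBERT features or how the average accuracy was computed, so the code assumes a nearest-centroid classifier based on cosine similarity and 5-fold cross-validation. Both are illustrative choices, not the authors' method. Synthetic two-cluster vectors stand in for real SBERT embeddings, and the labels 0 (off-topic) and 1 (on-topic), as well as all function names, are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, Vector]:
    """Average the embeddings of each class (hypothetical labels: 0 = off-topic, 1 = on-topic)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for vec, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids: Dict[int, Vector], vec: Vector) -> int:
    """Return the label whose centroid is most cosine-similar to the embedding."""
    return max(centroids, key=lambda lab: cosine(centroids[lab], vec))


def kfold_mean_accuracy(X: Sequence[Vector], y: Sequence[int], k: int = 5, seed: int = 0) -> float:
    """Shuffle once, split into k folds, and average the held-out accuracy of each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def synthetic_embeddings(n_per_class: int = 50, dim: int = 16, seed: int = 1) -> Tuple[List[Vector], List[int]]:
    """Stand-in for SBERT vectors: two noisy clusters, one per class."""
    rng = random.Random(seed)
    X: List[Vector] = []
    y: List[int] = []
    for label in (0, 1):
        centre = [1.0 if d % 2 == label else 0.0 for d in range(dim)]
        for _ in range(n_per_class):
            X.append([c + rng.gauss(0.0, 0.3) for c in centre])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = synthetic_embeddings()
    print(f"mean 5-fold accuracy: {kfold_mean_accuracy(X, y):.3f}")
```

Replacing `synthetic_embeddings()` with real utterance embeddings and annotated on/off-topic labels would leave the cross-validation loop unchanged; a different supervised classifier could be swapped in for the centroid rule without touching the fold logic.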
Notes
1. SBERT model: paraphrase-multilingual-mpnet-base-v2, a multilingual version of paraphrase-mpnet-base-v2 extended to 50+ languages.
https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2
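The note names the checkpoint but not how it is loaded. A minimal sketch, assuming the `sentence-transformers` Python package (whose Hugging Face organisation hosts the checkpoint), of loading the model and turning utterances into feature vectors follows. The helper names are hypothetical, and the unit-length normalisation is an illustrative choice that the note does not mention.

```python
import math
from typing import Iterable, List, Sequence

# Checkpoint named in note 1, written as its full Hugging Face id.
MODEL_ID = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def load_model(model_id: str = MODEL_ID):
    """Load the multilingual SBERT checkpoint (weights are downloaded on first use)."""
    # Imported lazily so the helpers below work without the package installed.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)


def l2_normalize(vec: Iterable[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    values = [float(v) for v in vec]
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else values


def embed_utterances(model, utterances: Sequence[str]) -> List[List[float]]:
    """Turn chat utterances into one fixed-length feature vector each."""
    # SentenceTransformer.encode takes a list of strings and returns one row per string.
    rows = model.encode(list(utterances))
    return [l2_normalize(row) for row in rows]
```

With network access, `embed_utterances(load_model(), ["Concordo com o teu argumento.", "Alguém viu o jogo ontem?"])` would return two vectors; the model card lists 768 dimensions for this checkpoint. Because the import is deferred, `embed_utterances` accepts any object exposing an `encode` method, which keeps it testable offline.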
Acknowledgment
This research was partially funded by Fundação para a Ciência e a Tecnologia under the project “Factors for promoting dialogue and healthy behaviours in online school communities” (reference DSAIPA/DS/0102/2019), and was developed at the R&D Unit CICANT - Research Center for Applied Communication, Culture and New Technologies, UIDB/04111/2020, UIDB/50008/2020, as well as at the Instituto Lusófono de Investigação e Desenvolvimento (ILIND) under Project COFAC/ILIND/COPELABS/1/2022.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ferreira-Saraiva, B.D., Marques-Pita, M., Matos-Carvalho, J.P., Pirola, Z. (2023). QiBERT - Classifying Online Conversations. In: Camarinha-Matos, L.M., Ferrada, F. (eds) Technological Innovation for Connected Cyber Physical Spaces. DoCEIS 2023. IFIP Advances in Information and Communication Technology, vol 678. Springer, Cham. https://doi.org/10.1007/978-3-031-36007-7_16
DOI: https://doi.org/10.1007/978-3-031-36007-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36006-0
Online ISBN: 978-3-031-36007-7
eBook Packages: Computer Science, Computer Science (R0)