Abstract
Text subjectivity is an important research topic due to its applications in domains such as sentiment analysis, opinion mining, social media monitoring, clinical research, and patient feedback analysis. While rule-based approaches dominated the field at the beginning of the 21st century, contemporary work relies on transformers, a neural network architecture designed for language modeling. This paper explores the performance of various BERT (Bidirectional Encoder Representations from Transformers)-based models, including our own fine-tuned BERT model, and compares them with pre-built models. To assess their generalization abilities, we evaluated the models on benchmark datasets as well as on two synthetic datasets created using large language models. To ensure reproducibility, our implementation is publicly available at https://github.com/margitantal68/TextSubjectivity.
7 Appendix
The input prompt used for the text classification task with the GPT-3.5-turbo model is as follows:
prompt = f"""You are an expert linguist. You should decide whether a sentence is subjective or objective. A sentence is subjective if its content is based on or influenced by personal feelings, tastes, or opinions; otherwise, the sentence is objective. More precisely, a sentence is subjective if one or more of the following conditions apply: 1. it expresses an explicit personal opinion of the author (e.g., speculation to draw conclusions); 2. it includes sarcastic or ironic expressions; 3. it gives exhortations of personal auspices; 4. it contains discriminating or downgrading expressions; 5. it contains rhetorical figures that convey the author's opinion. Please classify the following text into one of the two classes: (SUBJECTIVE, OBJECTIVE). Please answer with a single word: OBJECTIVE or SUBJECTIVE."""
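As an illustration, the classification prompt above could be wired into the OpenAI chat API roughly as sketched below. The prompt text is abridged, and the helper names (`build_messages`, `parse_label`) and the label-parsing heuristic are our own assumptions, not part of the paper.

```python
# Sketch only: SYSTEM_PROMPT is abridged, and build_messages/parse_label
# are hypothetical helpers, not taken from the paper's implementation.
SYSTEM_PROMPT = (
    "You are an expert linguist. You should decide whether a sentence is "
    "subjective or objective. ... Please answer with a single word: "
    "OBJECTIVE or SUBJECTIVE."
)

def build_messages(sentence: str) -> list:
    """Assemble the chat messages for classifying one sentence."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": sentence},
    ]

def parse_label(reply: str) -> str:
    """Map the model's free-form reply onto one of the two labels."""
    return "SUBJECTIVE" if "SUBJECTIVE" in reply.strip().upper() else "OBJECTIVE"

# The actual API call (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=build_messages("The Earth orbits the Sun."),
# )
# label = parse_label(resp.choices[0].message.content)
```

Constraining the model to answer with a single word keeps the parsing step trivial; the fallback in `parse_label` merely guards against the model adding punctuation or extra words around the label.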
The instruction used for generating sentences is as follows:
prompt = f"""You are an expert linguist. Please generate 10 objective sentences. A sentence is considered objective when it presents information in a factual and unbiased manner, without expressing personal opinions, emotions, or interpretations. Otherwise, the sentence is subjective."""
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Antal, M., Buza, K., Nemes, S. (2024). Advancements in Text Subjectivity Analysis: From Simple Approaches to BERT-Based Models and Generalization Assessments. In: Nguyen, NT., et al. Advances in Computational Collective Intelligence. ICCCI 2024. Communications in Computer and Information Science, vol 2165. Springer, Cham. https://doi.org/10.1007/978-3-031-70248-8_19
Print ISBN: 978-3-031-70247-1
Online ISBN: 978-3-031-70248-8