Abstract
We introduce three resources to support research on political texts in Scandinavia. The encoder-decoder transformer models sp-t5 and sp-t5-keyword were trained on political texts. The nor-pvi data set (available at https://tinyurl.com/nor-pvi) comprises political viewpoints, stances, and summaries for Norwegian. Experiments on four distinct tasks show that large-scale models such as nort5 perform slightly better. Still, sp-t5 and sp-t5-keyword perform almost on par while requiring much less data and computation.
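As an illustration only (not taken from the paper), here is a minimal sketch of how a text-to-text checkpoint like sp-t5 might be queried for summarization with the Hugging Face transformers library. Both the model ID and the Norwegian task prefix are assumptions; the published checkpoint name and prompt format may differ.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical hub ID for the paper's sp-t5 checkpoint; replace with
# the actual published name if/when available.
model_id = "sp-t5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# "oppsummer:" (Norwegian for "summarize:") is an assumed task prefix,
# not documented in the abstract.
text = "oppsummer: <norsk parlamentstale her>"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```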
Notes
- 1.
https://www.stortinget.no/ (Accessed on 28 January 2024).
- 2.
For a more comprehensive treatment, see [4].
- 4.
For evaluation purposes, these texts are excluded from the training data.
- 5.
We use both the GPT-3.5 and GPT-4 versions at https://chat.openai.com/.
- 9.
Details at https://doi.org/10.7910/DVN/L4OAKN.
- 12.
The model was trained from an mT5 checkpoint for 500K steps, mainly on the NCC dataset. See https://huggingface.co/north/t5_base_NCC; a minimal loading sketch follows these notes.
- 13.
TPUs (Tensor Processing Units) are specialized computing accelerators offered through Google Cloud.
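As a companion to note 12, here is a minimal sketch of loading that public checkpoint with the Hugging Face transformers library. The checkpoint ID is taken from the URL in the note; everything else is illustrative.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# north/t5_base_NCC: an mT5 checkpoint continued for 500K steps,
# mainly on the Norwegian Colossal Corpus (NCC); see note 12.
tokenizer = AutoTokenizer.from_pretrained("north/t5_base_NCC")
model = AutoModelForSeq2SeqLM.from_pretrained("north/t5_base_NCC")
```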
References
Borovikova, M., Ferré, A., Bossy, R., Roche, M., Nédellec, C.: Could Keyword Masking Strategy Improve Language Model? In: Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds.) NLDB 2023. LNCS, vol. 13913, pp. 271–284. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35320-8_19
Lin, C.Y.: Looking for a Few Good Metrics: ROUGE and its Evaluation. In: Proceedings of the 4th NTCIR Workshops (2004)
Djemili, S., Longhi, J., Marinica, C., Kotzinos, D., Sarfati, G.E.: What does Twitter have to say about Ideology? In: NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication/Social Media-Pre-conference Workshop at Konvens 2014. vol. 1. Universitätsverlag Hildesheim (2014)
Doan, T.M., Gulla, J.A.: A survey on political viewpoints identification. Online Soc. Networks Media 30 (2022). https://doi.org/10.1016/j.osnem.2022.100208
Doan, T.M., Kille, B., Gulla, J.A.: Using language models for classifying the party affiliation of political texts. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds.) NLDB. LNCS, pp. 382–393. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-08473-7_35
Doan, T.M., Kille, B., Gulla, J.A.: SP-BERT: a language model for political text in Scandinavian languages. In: Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds.) NLDB 2023. LNCS, vol. 13913, pp. 467–477. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35320-8_34
Golchin, S., Surdeanu, M., Tavabi, N., Kiapour, A.: Do not mask randomly: effective domain-adaptive pre-training by masking in-domain keywords. In: Can, B., et al. (eds.) RepL4NLP. ACL (2023). https://doi.org/10.18653/v1/2023.repl4nlp-1.2
Hardalov, M., Arora, A., Nakov, P., Augenstein, I.: Cross-domain label-adaptive stance detection. In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.T. (eds.) EMNLP. ACL (2021). https://doi.org/10.18653/v1/2021.emnlp-main.710
Hardalov, M., Arora, A., Nakov, P., Augenstein, I.: Few-shot cross-lingual stance detection with sentiment-based pre-training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36 (2022)
Hu, Y., et al.: ConfliBERT: a pre-trained language model for political conflict and violence. In: NAACL. ACL (2022). https://doi.org/10.18653/v1/2022.naacl-main.400
Hvingelby, R., Pauli, A.B., Barrett, M., Rosted, C., Lidegaard, L.M., Søgaard, A.: DaNE: a named entity resource for Danish. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 4597–4604 (2020)
Iyyer, M., Enns, P., Boyd-Graber, J., Resnik, P.: Political ideology detection using recursive neural networks. In: ACL (2014). https://doi.org/10.3115/v1/P14-1105
Kannangara, S.: Mining Twitter for fine-grained political opinion polarity classification, ideology detection and sarcasm detection. In: WSDM. ACM (2018). https://doi.org/10.1145/3159652.3170461
Kummervold, P.E., Wetjen, F., De la Rosa, J.: The Norwegian Colossal Corpus: a text corpus for training large Norwegian language models. In: LREC. European Language Resources Association (2022)
Kummervold, P.E., De la Rosa, J., Wetjen, F., Brygfjeld, S.A.: Operationalizing a national digital library: the case for a Norwegian transformer model. In: NoDaLiDa (2021)
Kutuzov, A., Barnes, J., Velldal, E., Øvrelid, L., Oepen, S.: Large-scale contextualised language modelling for Norwegian. In: NoDaLiDa. Linköping University Electronic Press, Sweden (2021)
Lapponi, E., Søyland, M.G., Velldal, E., Oepen, S.: The Talk of Norway: a richly annotated corpus of the Norwegian parliament, 1998–2016. Lang. Resour. Eval. (2018). https://doi.org/10.1007/s10579-018-9411-5
Lin, W.H., Wilson, T., Wiebe, J., Hauptmann, A.: Which side are you on? Identifying perspectives at the document and sentence levels. In: CoNLL-X. ACL (2006)
Liu, Y., et al.: Multilingual denoising pre-training for neural machine translation. Trans. Assoc. Comput. Linguistics 8, 726–742 (2020)
Liu, Y., Zhang, X.F., Wegsman, D., Beauchamp, N., Wang, L.: POLITICS: pretraining with same-story article comparison for ideology prediction and stance detection. In: Findings of the Association for Computational Linguistics: NAACL 2022. ACL (2022). https://doi.org/10.18653/v1/2022.findings-naacl.101
Maagerø, E., Simonsen, B.: Norway: Society and Culture. Cappelen Damm Akademisk, 3rd edn. (2022)
Malmsten, M., Börjeson, L., Haffenden, C.: Playing with Words at the National Library of Sweden - Making a Swedish BERT. CoRR abs/2007.01658 (2020). https://arxiv.org/abs/2007.01658
Menini, S., Tonelli, S.: Agreement and disagreement: comparison of points of view in the political domain. In: COLING 2016, the 26th International Conference on Computational Linguistics, pp. 2461–2470 (2016)
M’rabet, Y., Demner-Fushman, D.: HOLMS: alternative summary evaluation with large language models. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 5679–5688 (2020)
Paul, M., Girju, R.: A two-dimensional topic-aspect model for discovering multi-faceted topics. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pp. 545–550. AAAI 2010, AAAI Press (2010)
Post, M.: A call for clarity in reporting BLEU scores. In: Proceedings of the Third Conference on Machine Translation: Research Papers, pp. 186–191. ACL (2018). https://www.aclweb.org/anthology/W18-6319
Rauh, C., Schwalbach, J.: The ParlSpeech V2 data set: full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies (2020). https://doi.org/10.7910/DVN/L4OAKN
Samuel, D., et al.: NorBench – a benchmark for Norwegian language models. In: NoDaLiDa. University of Tartu Library (2023)
Shazeer, N., Stern, M.: Adafactor: adaptive learning rates with sublinear memory cost. In: ICML, pp. 4596–4604. PMLR (2018)
Snæbjarnarson, V., et al.: A warm start and a clean crawled corpus - a recipe for good language models. In: LREC, pp. 4356–4366. ELRA, Marseille, France (2022)
Solberg, P.E., Ortiz, P.: The Norwegian Parliamentary Speech Corpus. arXiv preprint arXiv:2201.10881 (2022)
Steingrímsson, S., Barkarson, S., Örnólfsson, G.T.: IGC-parl: Icelandic corpus of parliamentary proceedings. In: Proceedings of the Second ParlaCLARIN Workshop, pp. 11–17. ELRA, Marseille, France (2020)
Thonet, T., Cabanac, G., Boughanem, M., Pinel-Sauvagnat, K.: VODUM: a topic model unifying viewpoint, topic and opinion discovery. In: Ferro, N., et al. (eds.) ECIR 2016. LNCS, vol. 9626, pp. 533–545. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30671-1_39
Tiedemann, J.: Parallel data, tools and interfaces in OPUS. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). ELRA (2012)
Vamvas, J., Sennrich, R.: X-Stance: A Multilingual Multi-Target Dataset for Stance Detection. CoRR abs/2003.08385 (2020). https://arxiv.org/abs/2003.08385
Virtanen, A., et al.: Multilingual is not enough: BERT for Finnish. arXiv preprint arXiv:1912.07076 (2019)
Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. In: NAACL. ACL (2021). https://doi.org/10.18653/v1/2021.naacl-main.41
Yang, D., Zhang, Z., Zhao, H.: Learning better masking for better language model pre-training. arXiv preprint arXiv:2208.10806 (2022)
Acknowledgements
This work was done as part of the Trondheim Analytica project and funded under the Digital Transformation program at the Norwegian University of Science and Technology (NTNU), 7034 Trondheim, Norway. It has also been partly funded by SFI NorwAI (Center for Research-based Innovation, 309834). Model training was supported by Cloud TPUs from Google’s TPU Research Cloud program.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Doan, T.M., Baumgartner, D., Kille, B., Gulla, J.A. (2024). Automatically Detecting Political Viewpoints in Norwegian Text. In: Miliou, I., Piatkowski, N., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XXII. IDA 2024. Lecture Notes in Computer Science, vol 14641. Springer, Cham. https://doi.org/10.1007/978-3-031-58547-0_20
DOI: https://doi.org/10.1007/978-3-031-58547-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58546-3
Online ISBN: 978-3-031-58547-0
eBook Packages: Computer Science, Computer Science (R0)