
Automatically Detecting Political Viewpoints in Norwegian Text

  • Conference paper
Advances in Intelligent Data Analysis XXII (IDA 2024)

Abstract

We introduce three resources to support research on political texts in Scandinavia. The encoder-decoder transformer models sp-t5 and sp-t5-keyword were trained on political texts. The nor-pvi data set (available at https://tinyurl.com/nor-pvi) comprises political viewpoints, stances, and summaries for Norwegian. Experiments on four distinct tasks show that large-scale models such as nort5 perform slightly better, while sp-t5 and sp-t5-keyword perform almost on par and require much less data and computation.
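As a rough illustration of how such encoder-decoder checkpoints are typically used for the paper's text-to-text tasks (for example, viewpoint summarisation), the sketch below loads the north/t5_base_NCC baseline mentioned in the notes with the Hugging Face transformers library and runs generation on a short Norwegian input. The task prefix, generation settings, and example sentence are illustrative assumptions; identifiers for the fine-tuned sp-t5 and sp-t5-keyword checkpoints are not listed on this page, and a fine-tuned checkpoint would be needed to obtain meaningful summaries.

    # Minimal sketch, not the authors' released code: load a baseline Norwegian
    # T5 checkpoint and generate text for a short speech excerpt (invented example).
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_id = "north/t5_base_NCC"  # baseline from the notes; sp-t5 IDs are not given here
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    speech = (
        "Regjeringen foreslår å øke bevilgningene til kollektivtransport i "
        "distriktene for å redusere utslipp og styrke bosettingen."
    )
    # The "oppsummer:" prefix is a hypothetical prompt, not the paper's exact format.
    inputs = tokenizer("oppsummer: " + speech, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))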



Notes

  1. https://www.stortinget.no/ (accessed on 28 January 2024).

  2. For a more comprehensive treatment, see [4].

  3. https://huggingface.co/north/t5_base_NCC.

  4. For evaluation purposes, these texts are excluded from the training data.

  5. We use both the GPT-3.5 and GPT-4 versions at https://chat.openai.com/.

  6. https://huggingface.co/datasets/NbAiLab/NCC.

  7. https://huggingface.co/datasets/NbAiLab/norwegian_parliament.

  8. https://data.stortinget.no/om-datatjenesten/bruksvilkar/.

  9. Details at https://doi.org/10.7910/DVN/L4OAKN.

  10. https://data.riksdagen.se/data/anforanden/.

  11. https://repository.clarin.is/repository/xmlui/handle/20.500.12537/14.

  12. The model was trained from the mT5 checkpoint for 500K steps, mainly on the NCC dataset. See https://huggingface.co/north/t5_base_NCC.

  13. TPUs are special computing nodes operated by Google Cloud.

  14. https://ordbokene.no.

  15. https://huggingface.co/ltg/nort5-base.

  16. https://huggingface.co/north/t5_base_NCC.

  17. https://huggingface.co/google/mt5-base.

References

  1. Borovikova, M., Ferré, A., Bossy, R., Roche, M., Nédellec, C.: Could Keyword Masking Strategy Improve Language Model? In: Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds.) NLDB 2023. LNCS, vol. 13913, pp. 271–284. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35320-8_19

  2. Lin, C.Y.: Looking for a Few Good Metrics: ROUGE and its Evaluation. In: Proceedings of the 4th NTCIR Workshops (2004)

  3. Djemili, S., Longhi, J., Marinica, C., Kotzinos, D., Sarfati, G.E.: What does Twitter have to say about Ideology? In: NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication/Social Media-Pre-conference Workshop at Konvens 2014. vol. 1. Universitätsverlag Hildesheim (2014)

  4. Doan, T.M., Gulla, J.A.: A survey on political viewpoints identification. Online Soc. Networks Media 30 (2022). https://doi.org/10.1016/j.osnem.2022.100208

  5. Doan, T.M., Kille, B., Gulla, J.A.: Using language models for classifying the party affiliation of political texts. In: Rosso, P., Basile, V., Martínez, R., Mètais, E., Meziane, F. (eds.) NLDB. LNCS, pp. 382–393. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-08473-7_35

  6. Doan, T.M., Kille, B., Gulla, J.A.: SP-BERT: a language model for political text in scandinavian languages. In: Metais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds.) NLDB 2023. LNCS, vol. 13913, pp. 467–477. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35320-8_34

  7. Golchin, S., Surdeanu, M., Tavabi, N., Kiapour, A.: Do not mask randomly: effective domain-adaptive pre-training by masking in-domain keywords. In: Can, B., et al. (eds.) RepL4NLP. ACL (2023). https://doi.org/10.18653/v1/2023.repl4nlp-1.2

  8. Hardalov, M., Arora, A., Nakov, P., Augenstein, I.: Cross-domain label-adaptive stance detection. In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.T. (eds.) EMNLP 2021. ACL (2021). https://doi.org/10.18653/v1/2021.emnlp-main.710

  9. Hardalov, M., Arora, A., Nakov, P., Augenstein, I.: Few-shot cross-lingual stance detection with sentiment-based pre-training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36 (2022)

  10. Hu, Y., et al.: ConfliBERT: a pre-trained language model for political conflict and violence. In: NAACL. ACL (2022). https://doi.org/10.18653/v1/2022.naacl-main.400

  11. Hvingelby, R., Pauli, A.B., Barrett, M., Rosted, C., Lidegaard, L.M., Søgaard, A.: DaNE: a named entity resource for Danish. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 4597–4604 (2020)

  12. Iyyer, M., Enns, P., Boyd-Graber, J., Resnik, P.: Political ideology detection using recursive neural networks. ACL 1 (2014). https://doi.org/10.3115/v1/P14-1105

  13. Kannangara, S.: Mining Twitter for fine-grained political opinion polarity classification, ideology detection and sarcasm detection. In: WSDM. ACM (2018). https://doi.org/10.1145/3159652.3170461

  14. Kummervold, P.E., Wetjen, F., De la Rosa, J.: The Norwegian Colossal Corpus: a text corpus for training large Norwegian language models. In: LREC. European Language Resources Association (2022)

  15. Kummervold, P.E., De la Rosa, J., Wetjen, F., Brygfjeld, S.A.: Operationalizing a national digital library: the case for a Norwegian transformer model. In: NoDaLiDa (2021)

  16. Kutuzov, A., Barnes, J., Velldal, E., Øvrelid, L., Oepen, S.: Large-scale contextualised language modelling for Norwegian. In: NoDaLiDa. Linköping University Electronic Press, Sweden (2021)

  17. Lapponi, E., Søyland, M.G., Velldal, E., Oepen, S.: The Talk of Norway: a richly annotated corpus of the Norwegian parliament, 1998–2016. Lang. Resour. Eval. (2018). https://doi.org/10.1007/s10579-018-9411-5

  18. Lin, W.H., Wilson, T., Wiebe, J., Hauptmann, A.: Which side are you on? Identifying perspectives at the document and sentence levels. In: CoNLL-X. ACL (2006)

  19. Liu, Y., et al.: Multilingual denoising pre-training for neural machine translation. Trans. Assoc. Comput. Linguistics 8, 726–742 (2020)

  20. Liu, Y., Zhang, X.F., Wegsman, D., Beauchamp, N., Wang, L.: POLITICS: pretraining with same-story article comparison for ideology prediction and stance detection. In: Findings of the Association for Computational Linguistics: NAACL 2022. ACL (2022). https://doi.org/10.18653/v1/2022.findings-naacl.101

  21. Maagerø, E., Simonsen, B.: Norway: Society and Culture, 3rd edn. Cappelen Damm Akademisk (2022)

  22. Malmsten, M., Börjeson, L., Haffenden, C.: Playing with Words at the National Library of Sweden - Making a Swedish BERT. CoRR abs/2007.01658 (2020). https://arxiv.org/abs/2007.01658

  23. Menini, S., Tonelli, S.: Agreement and disagreement: comparison of points of view in the political domain. In: COLING 2016, the 26th International Conference on Computational Linguistics, pp. 2461–2470 (2016)

  24. M’rabet, Y., Demner-Fushman, D.: HOLMS: alternative summary evaluation with large language models. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 5679–5688 (2020)

  25. Paul, M., Girju, R.: A two-dimensional topic-aspect model for discovering multi-faceted topics. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pp. 545–550. AAAI 2010, AAAI Press (2010)

  26. Post, M.: A call for clarity in reporting BLEU scores. In: Proceedings of the Third Conference on Machine Translation: Research Papers, pp. 186–191. ACL (2018). https://www.aclweb.org/anthology/W18-6319

  27. Rauh, C., Schwalbach, J.: The ParlSpeech V2 data set: full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies (2020). https://doi.org/10.7910/DVN/L4OAKN

  28. Samuel, D., et al.: NorBench – a benchmark for Norwegian language models. In: NoDaLiDa. University of Tartu Library (2023)

  29. Shazeer, N., Stern, M.: Adafactor: adaptive learning rates with sublinear memory cost. In: ICML, pp. 4596–4604. PMLR (2018)

  30. Snæbjarnarson, V., et al.: A warm start and a clean crawled corpus - a recipe for good language models. In: LREC, pp. 4356–4366. ELRA, Marseille, France (2022)

  31. Solberg, P.E., Ortiz, P.: The Norwegian Parliamentary Speech Corpus. arXiv preprint arXiv:2201.10881 (2022)

  32. Steingrímsson, S., Barkarson, S., Örnólfsson, G.T.: IGC-parl: Icelandic corpus of parliamentary proceedings. In: Proceedings of the Second ParlaCLARIN Workshop. pp. 11–17. ELRA, Marseille, France (2020)

  33. Thonet, T., Cabanac, G., Boughanem, M., Pinel-Sauvagnat, K.: VODUM: a topic model unifying viewpoint, topic and opinion discovery. In: Ferro, N., et al. (eds.) ECIR 2016. LNCS, vol. 9626, pp. 533–545. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30671-1_39

  34. Tiedemann, J.: Parallel data, tools and interfaces in OPUS. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). ELRA (2012)

  35. Vamvas, J., Sennrich, R.: X-Stance: A Multilingual Multi-Target Dataset for Stance Detection. CoRR abs/2003.08385 (2020). https://arxiv.org/abs/2003.08385

  36. Virtanen, A., et al.: Multilingual is not enough: BERT for Finnish. arXiv preprint arXiv:1912.07076 (2019)

  37. Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. In: NAACL. ACL (2021). https://doi.org/10.18653/v1/2021.naacl-main.41

  38. Yang, D., Zhang, Z., Zhao, H.: Learning better masking for better language model pre-training. arXiv preprint arXiv:2208.10806 (2022)

Acknowledgements

This work was done as part of the Trondheim Analytica project and funded under the Digital Transformation program at the Norwegian University of Science and Technology (NTNU), 7034 Trondheim, Norway. It has also been partly funded by SFI NorwAI (Center for Research-based Innovation, 309834). Model training was supported by Cloud TPUs from Google’s TPU Research Cloud program.

Author information

Correspondence to Tu My Doan.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Doan, T.M., Baumgartner, D., Kille, B., Gulla, J.A. (2024). Automatically Detecting Political Viewpoints in Norwegian Text. In: Miliou, I., Piatkowski, N., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XXII. IDA 2024. Lecture Notes in Computer Science, vol 14641. Springer, Cham. https://doi.org/10.1007/978-3-031-58547-0_20

  • DOI: https://doi.org/10.1007/978-3-031-58547-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-58546-3

  • Online ISBN: 978-3-031-58547-0
