Abstract
Fully automated sentiment analysis of large text collections is an important task in many application scenarios. Sentiment analysis is challenging because of domain-specific language styles and the variety of sentiment indicators. Annotated datasets are the basis for learning powerful sentiment classifiers, but for many domains, and especially for non-English texts, hardly any datasets exist. To support the development of sentiment classifiers, we have created two corpora. The first corpus is built from German news articles. Although news articles should be objective, they often evoke subjective emotions. The second corpus consists of annotated messages from a German telecommunication forum. In this paper, we describe the process of creating the corpora and discuss our approach for tracing sentiment values, defining clear rules for assigning sentiment scores. Using the corpora, we train classifiers that yield good classification results and establish valuable baselines for sentiment analysis. We compare the learned classification strategies and discuss how the approaches can be transferred to new scenarios.
Notes
- 1. Keyword/phrase set: abgespeist, abzocken, ärgere, ärgerlich, arsch, beschwerde, blöd, desaster, die nase voll, dumm, dümmer, enttäuscht, ex-kunde, frechheit, frustriert, grottig, hinhalten, hohn, idiot, kann doch nicht so schwer sein, katastrophe, minderwertigste, nervt, nicht kapiert, opfer, rausnehmen, reklamiert, schämen, schnauze, scheiss, schlimmer, schuld, teufel, unfassbar, unzufrieden, verärgert, vergebens, versagt, verschlimmbesserung, verschonen, verschont, vertrösten, verzweifeln, wird mir schlecht.
Acknowledgment
This work was supported in part by the German Federal Ministry of Education and Research (BMBF) under the grant number 01IS16046.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lommatzsch, A., Bütow, F., Ploch, D., Albayrak, S. (2017). Towards the Automatic Sentiment Analysis of German News and Forum Documents. In: Eichler, G., Erfurth, C., Fahrnberger, G. (eds) Innovations for Community Services. I4CS 2017. Communications in Computer and Information Science, vol 717. Springer, Cham. https://doi.org/10.1007/978-3-319-60447-3_2
DOI: https://doi.org/10.1007/978-3-319-60447-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60446-6
Online ISBN: 978-3-319-60447-3
eBook Packages: Computer Science, Computer Science (R0)