Abstract
We present an approach to extracting arguments from social media, exemplified by a case study on a large corpus of Twitter messages collected under the #Brexit hashtag during the run-up to the referendum in 2016. Our method is based on constructing dedicated corpus queries that capture predefined argumentation patterns following standard Walton-style argumentation schemes. Query matches are transformed directly into logical patterns, i.e., formulae with placeholders in a general form of modal logic. We prioritize precision over recall: the sheer size of the corpus still delivers substantial numbers of matches for all patterns, and our eventual goal is to gain an overview of widely used arguments and argumentation schemes. We evaluate our approach in terms of recall on a manually annotated gold standard of 1000 randomly selected tweets for three selected high-frequency patterns. We also estimate precision by manual inspection of query matches in the entire corpus. Both evaluations are accompanied by an analysis of inter-annotator agreement between three independent judges.
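The evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names and the toy data are hypothetical, and the use of pairwise Cohen's kappa as the agreement statistic for each pair of judges is an assumption made for illustration.

```python
from collections import Counter


def recall(gold: set, matched: set) -> float:
    """Share of gold-standard items that the queries retrieved."""
    return len(gold & matched) / len(gold) if gold else 0.0


def precision(matched: set, correct: set) -> float:
    """Share of query matches that manual inspection judged correct."""
    return len(matched & correct) / len(matched) if matched else 0.0


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Chance-corrected agreement between two annotators (assumed statistic)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from the marginal label counts.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: tweet ids and binary judgements.
gold_ids = {1, 2, 3, 4}
matched_ids = {2, 3, 9}
judged_correct = {2, 3}
print(recall(gold_ids, matched_ids))
print(precision(matched_ids, judged_correct))
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

With three judges, one option is to report kappa for each of the three annotator pairs.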
About the authors
Natalie Dykes is working in Stefan Evert’s Computational Corpus Linguistics group. She holds a B.A. in computational linguistics and Scandinavian studies and an M.A. in linguistics. Her research interests include corpus-based discourse analysis, argumentation, and computer-mediated communication.
Stefan Evert holds the Chair of Computational Corpus Linguistics at FAU Erlangen-Nürnberg. After studying mathematics, physics and English linguistics, he received a PhD degree in computational linguistics from the University of Stuttgart. His research interests include the statistical methodology of corpus linguistics, co-occurrence phenomena and software tools for processing large text corpora.
After receiving his B.Sc. in Computer Science and Media from the TH-Nürnberg in 2015, Merlin Göttlinger continued with an M.Sc. in Computer Science at FAU Erlangen-Nürnberg, which he completed in 2018. He then started working as a PhD student at the Chair of Theoretical Computer Science (INF8) at FAU Erlangen-Nürnberg, researching logical formalisms for argumentation.
Philipp Heinrich is working in Stefan Evert’s Computational Corpus Linguistics group. Having studied mathematics, linguistics, and philosophy, his research interests include corpus-based discourse analysis and argumentation mining with a focus on the comparison of social and mass media.
Lutz Schröder holds the Chair of Theoretical Computer Science at FAU Erlangen-Nürnberg. He received a PhD in mathematics and subsequently the habilitation in computer science from the University of Bremen, and has held a senior researcher position at the German Research Center for Artificial Intelligence (DFKI). His main research area is logic in computer science.
© 2021 Walter de Gruyter GmbH, Berlin/Boston