Abstract
This paper shows how named entity extraction and network analysis can be used to examine biographies, individually and in groups, to aid historians in biographical and prosopographical research. For this purpose, a reference network of 13 100 biographies in the collections of the Biographical Centre of the Finnish Literature Society was created, based on links between the biographies as well as automatically extracted named entities found in the texts. The data was published in a SPARQL endpoint as a Linked Data knowledge graph, on top of which network analysis tools were created and analyses were carried out, showing the usefulness of the approach in Digital Humanities. The reference graph has been used to examine egocentric networks of individual persons as well as networks among groups of people in prosopography. The data and tools presented have been in use since autumn 2018 in the semantic portal BiographySampo, which has had tens of thousands of users.
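To make the notion of an egocentric network in a reference graph concrete, the following minimal Python sketch builds a directed graph from (source, target) reference pairs and extracts the one-step neighbourhood of a chosen person. It is an illustration only and not the BiographySampo implementation: the function names, the toy identifiers, and the choice to include both incoming and outgoing references are assumptions made for this example.

```python
from collections import defaultdict


def build_reference_graph(links):
    """Build a directed graph: source biography -> set of referenced biographies."""
    graph = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-references
            graph[source].add(target)
            graph.setdefault(target, set())  # keep nodes that only receive links
    return dict(graph)


def egocentric_network(graph, ego):
    """Return (nodes, edges) of the one-step ego network around `ego`.

    Assumption: neighbours are persons that the ego references or that
    reference the ego; edges among the neighbours are kept as well.
    """
    outgoing = graph.get(ego, set())
    incoming = {node for node, targets in graph.items() if ego in targets}
    nodes = {ego} | outgoing | incoming
    edges = {(s, t) for s in nodes for t in graph.get(s, set()) if t in nodes}
    return nodes, edges


# Hypothetical toy data: placeholder identifiers, not real biographies.
toy_links = [
    ("person:A", "person:B"),
    ("person:B", "person:C"),
    ("person:C", "person:A"),
    ("person:D", "person:C"),
]

toy_graph = build_reference_graph(toy_links)
ego_nodes, ego_edges = egocentric_network(toy_graph, "person:A")
```

In this toy graph, the ego network of `person:A` contains A, B, and C together with the references among them; `person:D` is left out because it is two steps away from the ego.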
Notes
- 1.
Online at www.biografiasampo.fi; see project homepage https://seco.cs.aalto.fi/projects/biografiasampo/en/ for further info and publications.
- 2.
Prosopography is a method that is used to study groups of people through their biographical data. The goal of prosopography is to find connections, trends, and patterns from these groups.
- 3.
Actually, the biographies in our case study come from several separate databases, including the general National Biography of Finland as a core, supplemented with four other thematic dictionaries [16].
- 10.
Denoted with prefix nbf.
- 12.
The view currently lists only sentences that contain manually added HTML links.
References
Aylett, R.S., Bental, D.S., Stewart, R., Forth, J., Wiggins, G.: Supporting serendipitous discovery. In: Digital Futures (Third Annual Digital Economy Conference), Aberdeen, UK, 23–25 October 2012 (2012)
Borin, L., Forsberg, M., Roxendal, J.: Korp – the corpus infrastructure of Språkbanken. In: Proceedings of LREC 2012, Istanbul: ELRA, pp. 474–478 (2012)
Brouwer, J., Nijboer, H.: Golden agents. A web of linked biographical data for the Dutch Golden Age. In: BD2017 Biographical Data in a Digital World 2017, Proceedings, vol. 2119, pp. 33–38. CEUR Workshop Proceedings (2018). https://ceur-ws.org/Vol-2119/paper6.pdf
Bunescu, R.C., Pasca, M.: Using encyclopedic knowledge for named entity disambiguation. In: EACL 2006, 11th Conference of the European Chapter of the Association for Computational Linguistics, vol. 6, pp. 9–16 (2006)
Elson, D.K., Dames, N., McKeown, K.R.: Extracting social networks from literary fiction. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 138–147. Association for Computational Linguistics (2010)
Ferragina, P., Scaiella, U.: TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1625–1628. ACM (2010)
Hachey, B., Radford, W., Nothman, J., Honnibal, M., Curran, J.R.: Evaluating entity linking with Wikipedia. Artif. Intell. 194, 130–150 (2013)
Hakosalo, H., Jalagin, S., Junila, M., Kurvinen, H.: Historiallinen elämä - Biografia ja historiantutkimus. Suomalaisen Kirjallisuuden Seura (SKS), Helsinki (2014)
Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool (2011)
Heino, E., et al.: Named entity linking in a complex domain: case second world war history. In: Gracia, J., Bond, F., McCrae, J.P., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) LDK 2017. LNCS (LNAI), vol. 10318, pp. 120–133. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59888-8_10
Hellmann, S., Lehmann, J., Auer, S., Brümmer, M.: Integrating NLP using linked data. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8219, pp. 98–113. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41338-4_7
Hyvönen, E.: Publishing and Using Cultural Heritage Linked Data on the Semantic Web. Morgan & Claypool, Palo Alto (2012)
Hyvönen, E., et al.: WarSampo data service and semantic portal for publishing linked open data about the second world war history. In: Sack, H., Blomqvist, E., d’Aquin, M., Ghidini, C., Ponzetto, S.P., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9678, pp. 758–773. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34129-3_46
Hyvönen, E., Ikkala, E., Tuominen, J.: Linked data brokering service for historical places and maps. In: Proceedings of the 1st Workshop on Humanities in the Semantic Web (WHiSe), vol. 1608, pp. 39–52. CEUR Workshop Proceedings (2016). https://ceur-ws.org/Vol-1608/paper-06.pdf
Hyvönen, E., et al.: BiographySampo – publishing and enriching biographies on the semantic web for digital humanities research. In: Hitzler, P., et al. (eds.) ESWC 2019. LNCS, vol. 11503, pp. 574–589. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21348-0_37
Hyvönen, E., Leskinen, P., Tamper, M., Tuominen, J., Keravuori, K.: Semantic national biography of Finland. In: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), vol. 2084, pp. 372–385. CEUR Workshop Proceedings (2018). https://ceur-ws.org/Vol-2084/short12.pdf
Ikkala, E., Tuominen, J., Hyvönen, E.: Contextualizing historical places in a gazetteer by using historical maps and linked data. In: Proceedings of Digital Humanities 2016 (DH 2016), Krakow, Poland, pp. 573–577 (2016). https://dh2016.adho.org/abstracts/39
Kettunen, K., Mäkelä, E., Ruokolainen, T., Kuokkala, J., Löfberg, L.: Old content and modern tools-searching named entities in a Finnish OCRed historical newspaper collection 1771–1910. arXiv preprint arXiv:1611.02839 (2016)
Langmead, A., Otis, J., Warren, C., Weingart, S., Zilinski, L.: Towards interoperable network ontologies for the digital humanities. Int. J. Hum. Arts Comput. 10, 22–35 (2016)
Leskinen, P., Hyvönen, E.: Extracting genealogical networks of linked data from biographical texts. In: Hitzler, P., et al. (eds.) ESWC 2019. LNCS, vol. 11762, pp. 121–125. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32327-1_24
Leskinen, P., Hyvönen, E., Tuominen, J.: Analyzing and visualizing prosopographical linked data based on biographies. In: BD2017 Proceedings of the Second Conference on Biographical Data in a Digital World 2017, vol. 2119, pp. 39–44. CEUR Workshop Proceedings (2018). https://ceur-ws.org/Vol-2119/paper7.pdf
Lindquist, T., Long, H.: How can educational technology facilitate student engagement with online primary sources? A user needs assessment. Libr. Hi Tech 29(2), 224–241 (2011)
Mäkelä, E.: Combining a REST lexical analysis web service with SPARQL for mashup semantic annotation from text. In: Presutti, V., Blomqvist, E., Troncy, R., Sack, H., Papadakis, I., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8798, pp. 424–428. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11955-7_60
Mäkelä, E., Lindquist, T., Hyvönen, E.: CORE - a contextual reader based on linked data. In: Proceedings of Digital Humanities 2016, Krakow, Poland, pp. 267–269 (2016). https://dh2016.adho.org/abstracts/4
Maynard, D., Roberts, I., Greenwood, M.A., Rout, D., Bontcheva, K.: A framework for real-time semantic social media analysis. J. Web Semant. 44, 75–88 (2017)
Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8. ACM (2011)
Newman, M.: Networks. Oxford University Press, Oxford (2018)
Nguyen, D.B., Hoffart, J., Theobald, M., Weikum, G.: AIDA-light: high-throughput named-entity disambiguation. In: Proceedings of LDOW, Linked Data on the Web, vol. 1184. CEUR Workshop Proceedings (2014). https://ceur-ws.org/Vol-1184/ldow2014_paper_03.pdf
Oksanen, A., Tuominen, J., Mäkelä, E., Tamper, M., Hietanen, A., Hyvönen, E.: Semantic Finlex: transforming, publishing, and using Finnish legislation and case law as linked open data on the web. In: Knowledge of the Law in the Big Data Age. Frontiers in Artificial Intelligence and Applications, vol. 317, pp. 212–228. IOS Press (2019)
Otte, E., Rousseau, R.: Social network analysis: a powerful strategy, also for the information sciences. J. Inf. Sci. 28(6), 441–453 (2002)
Pattuelli, M.C., Miller, M., Lange, L., Thorsen, H.K.: Linked Jazz 52nd street: a LOD crowdsourcing tool to reveal connections among Jazz artists. In: Proceedings of Digital Humanities 2013, pp. 337–339 (2013)
Piccinno, F., Ferragina, P.: From TagME to WAT: a new entity annotator. In: Proceedings of the First International Workshop on Entity Recognition & Disambiguation, pp. 55–62. ACM (2014)
Roberts, B.: Biographical Research. Understanding Social Research. Open University Press (2002)
Small, H.: Co-citation in the scientific literature: a new measure of the relationship between two documents. J. Am. Soc. Inf. Sci. 24(4), 265–269 (1973)
Tamper, M., Leskinen, P., Apajalahti, K., Hyvönen, E.: Using biographical texts as linked data for prosopographical research and applications. In: Ioannides, M., et al. (eds.) EuroMed 2018. LNCS, vol. 11196, pp. 125–137. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01762-0_11
Tuominen, J., Hyvönen, E., Leskinen, P.: Bio CRM: a data model for representing biographical data for prosopographical research. In: Biographical Data in a Digital World 2017, Proceedings, vol. 2119. CEUR Workshop Proceedings (2018). https://ceur-ws.org/Vol-2119/paper7.pdf
Verboven, K., Carlier, M., Dumolyn, J.: A short manual to the art of prosopography. In: Prosopography Approaches and Applications. A Handbook, pp. 35–70. Unit for Prosopographical Research (Linacre College) (2007)
Warren, C.N., Shore, D., Otis, J., Wang, L., Finegold, M., Shalizi, C.: Six degrees of Francis Bacon: a statistical method for reconstructing large historical social networks. DHQ: Digit. Hum. Q. 10(3) (2016)
Acknowledgments
Our research was part of the Severi project (http://seco.cs.aalto.fi/projects/severi), funded mainly by Business Finland. Thanks to Mikko Kivelä for inspirational discussions and CSC - IT Center for Science for computational resources.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Tamper, M., Leskinen, P., Hyvönen, E. (2023). Visualizing and Analyzing Networks of Named Entities in Biographical Dictionaries for Digital Humanities Research. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_15
DOI: https://doi.org/10.1007/978-3-031-24337-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science (R0)