Computational authorship attribution in medieval Latin corpora: the case of the Monk of Lido (ca. 1101–08) and Gallus Anonymous (ca. 1113–17)

  • Original Paper
Language Resources and Evaluation

Abstract

This paper applies computational methods of authorship attribution to shed light on a still open question concerning two Latin works of the twelfth century: are the anonymous authors of the Translatio s. Nicolai (ca. 1101–1108) and the Gesta principum polonorum (ca. 1113–1117) one and the same person? The Translatio was written by the so-called Monk of Lido and describes Venice’s role in the First Crusade. The Gesta were written by the so-called Gallus Anonymous and contain a panegyric of the contemporary Polish ruler, Bolesław III the Wry-Mouthed (r. 1102–1138). This study attributes authorship to these works within four corpora of Latin texts composed between the tenth and twelfth centuries, each with between 39 and 116 texts written by between 15 and 22 different authors. The goal of including four corpora is to see how robust the similarity between the target texts is to changes in text length, genre, and class balance in the corpora. In each corpus, nine different distance metrics and one machine-learning algorithm are used to classify the authors of the Translatio and Gesta. I conclude that it is highly likely that Gallus and the Monk were indeed one and the same anonymous author, and highlight the effectiveness of the Bray–Curtis distance and logistic regression as methods of attribution.
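As a rough illustration of distance-based attribution with the Bray–Curtis metric, the following Python sketch compares relative word-frequency vectors and assigns a target text to its nearest candidate. It is not the study's code (the notes indicate that the experiments were run in Wolfram Mathematica); the function names, the four-word vocabulary, and the toy strings are hypothetical and serve only to show the mechanics.

```python
from collections import Counter


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[word] / total for word in vocabulary]


def bray_curtis(x, y):
    """Bray-Curtis distance: sum(|x_i - y_i|) / sum(x_i + y_i), for non-negative vectors."""
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        return 0.0  # two all-zero vectors are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


def nearest_text(target, candidates, vocabulary):
    """Return the label of the candidate text closest to the target."""
    target_vec = relative_frequencies(target, vocabulary)
    return min(
        candidates,
        key=lambda label: bray_curtis(
            target_vec, relative_frequencies(candidates[label], vocabulary)
        ),
    )


# Hypothetical toy data: a tiny function-word vocabulary and two candidate texts.
VOCABULARY = ["et", "in", "de", "ad"]
CANDIDATES = {"A": "et et in de", "B": "ad ad ad in"}
closest = nearest_text("et in et de", CANDIDATES, VOCABULARY)
```

In a realistic setting the vocabulary would be a list of the most frequent words in the corpus, and each candidate would be a full text or text segment rather than a four-word string.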


Notes

  1. Monk of Lido (1895), henceforth referred to as Translatio. The edition of an abridged twelfth-century copy of the Translatio, discovered well after this critical edition was published, may be consulted in Riedmann (1986), pp. 359–364. A German translation of the eleven miracle stories concluding the Translatio can be consulted in Seeger (2005), pp. 254–287.

  2. Gallus Anonymous (1952), henceforth referred to as Gesta. English translation in Gallus Anonymous (2003).

  3. The term was popularized by Charles Homer Haskins in his The Renaissance of the Twelfth Century (1927).

  4. For a study of these historical implications see Kabala (forthcoming).

  5. According to Kromer, “A Frenchman [Gallus] wrote this history—some monk, I imagine, who lived during the time of Bolesław III, as can be gathered from the prologue.” See Knoll, Schaer and Polak, “Editor’s Introduction,” xxv, in Gallus Anonymous (2003), for a photographic reproduction of Kromer’s note: “Gallus hanc historiam scripsit, monachus, opinor, aliquis, ut ex proemiis coniicere licet qui Boleslai tertii tempore vixit.”

  6. For an overview of theories, see Mühle (2009), pp. 464–475 and Bagi (2008), pp. 15–23. Bagi organized the work on Gallus’s identity into three broad approaches emphasizing a Provençal-Hungarian career, northern French/Walloon origins, and Italian origins. To these can be added a fourth and more recent group of scholars, who have sought Gallus’s origins in Germany (Fried 2009; Wenta 2011).

  7. Jasiński designed programs to detect high-frequency occurrences of the kinds of rare cursus cadences preferred by Gallus in consecutive chunks of Latin sources drawn from his personal database of medieval Latin texts.

  8. Eder used two reference corpora of Latin texts—one small (9 texts), the other large (159 texts).

  9. Jasiński has not published or documented his personal database of Latin texts, which he describes as containing 900,000 pages of Latin (Jasiński 2014, p. 256), or “the majority of all of Latin writing to the end of the twelfth century” (Jasiński 2016, p. 157; my translation). While he does describe the functionality of the computer programs used to search this database, these too remain unavailable for verification. Similarly, Eder described his larger reference corpus without listing all its members or identifying the metrics used to determine similarities between them (Eder 2015). Elsewhere, he provides a general description of the algorithms used as well as a link to his software package (Eder 2017).

  10. Part of Fried’s traditional philological critique of Jasiński’s (2008) book was: “Eine Gegenprobe, wieweit nämlich die angeführten Parallelen auch bei anderen Autoren anzutreffen sind, wäre geboten.”

  11. The one text treated with OCR software is the Monk of Lido’s Translatio.

  12. e.g. in Geoffrey of Malaterra’s De rebus gestis Rogerii et Roberti Guiscardi, Liutprand of Cremona’s Antapodosis, and Suger’s Vita Ludovici. I have not removed chapter or book titles or numbers in the text itself, which introduces some noise into the data: as certain editions list chapter titles within the text while others give only chapter numbers, the count of ‘de’ (‘about’), a frequent word in chapter titles, will be higher in the former than in the latter.

  13. e.g. in Otloh of St. Emmeram’s Liber de temptationibus, Abelard’s Dialogus, and Anselm’s Monologion. Additional traces of editorial noise remain, as certain works contain brief editorial notes before every section of text (e.g. all six works of Guibert of Nogent).

  14. e.g. ‘Ans.’ and ‘Bos.’ in Anselm’s Cur Deus Homo; ‘Homo’ and ‘Anima’ in Hugh of St. Victor’s Soliloquium; ‘Philosophus,’ ‘Christian.,’ ‘Judaeus,’ and ‘Judex’ in Abelard’s Dialogus inter philosophum, Judaeum et christianum; and ‘Judaeus’ and ‘Christianus’ in Rupert of Deutz’s Annulus sive dialogus inter Christianum et Judaeum.

  15. e.g. chapters 19–22 of Otloh of St. Emmeram’s Liber visionum, which were written by Boniface and Bede centuries earlier. I have retained only Books 1–2 of Otto of Freising’s Gesta Friderici imperatoris, i.e. only the portion written by Otto, and not by Rahewin.

  16. e.g. throughout the three books of Gallus’s Gesta or the opening to John of Salisbury’s Polycraticus.

  17. A previous computational study of the Gesta and Translatio relied on a small set of 9 Latin texts composed close in time to Gallus and Monk’s works, and a larger set of 159 Latin texts spanning 16 centuries from Antiquity to the Reformation (Eder 2015).

  18. Experiments with various segment length targets above and below the length of the Translatio showed that a target length of 10,500 words yielded the least variance in segment length.

  19. See Cha (2007) and Wolfram Mathematica documentation (http://reference.wolfram.com/language/) for definitions of the first eight distances. See Koppel and Winter (2014) for the definition of the min–max distance.

  20. All files are available at www.jakubkabala.com/gallus-monk/. As many of the experiments in this study rely on random subsamplings of words in the corpora, subsequent runs of this code will produce results slightly different from the ones reported here.

  21. As expected, the min–max and Bray–Curtis metrics placed the first half of the Gesta closest to the second half of the Gesta in 10 out of 10 subsample-based VSMs of corpus D, and in at least 9 out of the 10 subsample-based VSMs of both corpora B and C (Fig. 5); the results are even more consistent for Gesta2 (Fig. 6).

  22. NB: Wolfram’s implementation of logistic regression reports second-most likely attributions only when the probability of such an attribution is sufficiently high. See Wolfram documentation for details.

  23. Wincenty Lutosławski formulated his “law of stylistic affinity” in 1897. He set out to demonstrate the order in which Plato wrote his dialogues through a meticulous counting and study of the frequencies of the Greek particles in each one. His “law” anticipated the results of modern psychological research cited earlier in this article: “Of two works of the same author and of the same size, that is nearer in time to a third, which shares with it the greater number of stylistic peculiarities, provided that their different importance is taken into account, and that the number of observed peculiarities is sufficient to determine the stylistic character of all the three works… If we now ask how the law of stylistic affinity can be verified, the first and nearest answer lies in the psychological property of style as a mark of identity, entirely depending on the totality of familiar expressions at any time in the writer’s consciousness. Every writer could find easily in his own experience sufficient evidence in favour of this psychological law” (emphasis in original). See Lutosławski (1897), pp. 152–153.
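Notes 18 and 19 mention segmenting texts to a target length and the min–max distance of Koppel and Winter (2014). A minimal Python sketch of both ideas follows. The segmentation rule used here (contiguous chunks of near-equal size, with the number of chunks chosen by rounding) is an assumption, since the notes do not specify how segments were cut, and all function names are hypothetical.

```python
def min_max_distance(x, y):
    """Min-max distance: 1 - sum(min(x_i, y_i)) / sum(max(x_i, y_i)), for non-negative vectors."""
    total_max = sum(max(a, b) for a, b in zip(x, y))
    if total_max == 0:
        return 0.0  # two all-zero vectors are treated as identical
    return 1.0 - sum(min(a, b) for a, b in zip(x, y)) / total_max


def split_into_segments(words, target_length):
    """Split a word list into contiguous, near-equal segments close to target_length.

    Assumption: the number of segments is len(words) / target_length rounded to
    the nearest integer (at least 1), and leftover words are spread over the
    first segments so that sizes differ by at most one word.
    """
    n = len(words)
    k = max(1, round(n / target_length))
    base, extra = divmod(n, k)
    segments, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        segments.append(words[start:start + size])
        start += size
    return segments
```

For instance, `split_into_segments(words, 10500)` would divide a long work into pieces of roughly 10,500 words each, while a text shorter than the target stays whole as a single segment.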

References

  • Abbasi, A., & Chen, H. (2005). Applying authorship analysis to extremist-group web forum messages. IEEE Intelligent Systems, 20, 67–75.

  • Adamska, A. (2000). ‘From memory to written record’ in the periphery of medieval Latinitas: The case of Poland in the eleventh and twelfth centuries. In K. J. Heidecker (Ed.), Charters and the use of the written word in medieval society (pp. 83–100). Turnhout: Brepols.

  • Angold, M., & Balard, M. (2007). Venice: A Bibliography. In M. Whitby (Ed.), Byzantines and crusaders in non-Greek sources, 1025–1204 (pp. 86–94). Oxford: Oxford University Press.

  • Argamon, S. (2007). Interpreting Burrows’s delta: Geometric and probabilistic foundations. Literary and Linguistic Computing, 23, 131–147.

  • Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the author of an anonymous text. Communications of the ACM, 52, 119–123.

  • Argamon, S., & Levitan, S. (2005). Measuring the usefulness of function words for authorship attribution. In Proceedings of the 2005 ACH/ALLC conference. Victoria, BC, Canada, June 2005.

  • Baayen, H., Van Halteren, H., & Tweedie, F. (1996). Outside the cave of shadows: Using syntactic annotation to enhance authorship attribution. Literary and Linguistic Computing, 11, 121–132.

  • Bagi, D. (2008). Królowie węgierscy w Kronice Galla Anonima. Cracow: Polska Akademia Umiejętności.

  • Benedetto, D., Degli Esposti, M., & Maspero, G. (2013). The puzzle of Basil’s Epistula 38: A mathematical approach to a philological problem. Journal of Quantitative Linguistics, 20, 267–287.

  • Binongo, J. N. G. (2003). Who wrote the 15th book of Oz? An application of multivariate analysis to authorship attribution. Chance, 16, 9–17.

  • Binongo, J. N. G., & Smith, M. W. (1999). The application of principal component analysis to stylometry. Literary and Linguistic Computing, 14, 445–466.

  • Borawska, D. (1965). Gallus Anonim czy Italus Anonim. Przegląd Historyczny, 56(1), 111–119.

  • Boyle, L. E. (1992). Diplomatics. In J. M. Powell (Ed.), Medieval Studies: An Introduction (pp. 82–113). Syracuse, N.Y.: Syracuse University Press.

  • Burrows, J. F. (1987). Word-patterns and story-shapes: The statistical analysis of narrative style. Literary and Linguistic Computing, 2, 61–70.

  • Burrows, J. F. (1989). “An ocean where each kind…”: Statistical analysis and some major determinants of literary style. Computers and the Humanities, 23, 309–321.

  • Burrows, J. F. (1992a). Computers and the study of literature. In C. Butler (Ed.), Computers and written texts: An applied perspective (pp. 167–204). Oxford: Blackwell.

  • Burrows, J. F. (1992b). Not unless you ask nicely: The interpretative nexus between analysis and information. Literary and Linguistic Computing, 7, 91–109.

  • Burrows, J. F. (2002). “Delta”: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17, 267–287.

  • Burrows, J. F., & Hassall, A. J. (1988). Anna Boleyn and the authenticity of Fielding’s feminine narratives. Eighteenth-Century Studies, 21, 427–453.

  • Cha, S.-H. (2007). Comprehensive survey on distance/similarity measures between probability density functions. International Journal of Mathematical Models and Methods in Applied Sciences, 1(4), 300–307.

  • Coleman, C. B. (1922). The treatise of Lorenzo Valla on the Donation of Constantine: Text and translation into English. New Haven: Yale University Press.

  • Diederich, J., Kindermann, J., Leopold, E., & Paass, G. (2003). Authorship attribution with support vector machines. Applied Intelligence, 19, 109–123.

  • Dumais, S., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Proceedings of the seventh international conference on information and knowledge management (pp. 148–155). Bethesda, MD, November 1998.

  • Eder, M. (2013). Does size matter? Authorship attribution, small samples, big problem. Digital Scholarship in the Humanities, 30, 167–182.

  • Eder, M. (2015). In search of the author of Chronica Polonorum ascribed to Gallus Anonymus: A stylometric reconnaissance. Acta Poloniae Historica, 112, 5–23.

  • Eder, M. (2017). Visualization in stylometry: Cluster analysis using networks. Digital Scholarship in the Humanities, 32(1), 50–64.

  • Evert, S., Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C., et al. (2017). Understanding and explaining delta measures for authorship attribution. Digital Scholarship in the Humanities, 32(Supplement 2), 4–16.

  • Fried, J. (2009). Kam der Gallus Anonymus aus Bamberg? Archiv für Erforschung des Mittelalters, 65, 497–546.

  • Gallus Anonymous. (1952). Galli Anonymi cronicae et gesta ducum sive principum polonorum: Anonima tzw. Galla kronika czyli dzieje książąt i władców polskich. Edited by K. Maleczyński. Cracow: Polska Akademia Umiejętności.

  • Gallus Anonymous. (2003). Gesta principum Polonorum: The deeds of the princes of the Poles. Translated by P.W. Knoll & F. Schaer. Budapest: Central European University Press.

  • Genkin, A., Lewis, D. D., & Madigan, D. (2006). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3), 291–304.

  • Grieve, J. (2007). Quantitative authorship attribution: An evaluation of techniques. Literary and Linguistic Computing, 22, 251–270.

  • Haskins, C. H. (1927). The renaissance of the twelfth century. Cambridge, MA: Harvard University Press.

  • Holmes, D. I. (1994). Authorship attribution. Computers and the Humanities, 28, 87–106.

  • Holmes, D. I. (1998). The evolution of stylometry in humanities scholarship. Literary and Linguistic Computing, 13, 111–117.

  • Holmes, D. I., & Forsyth, R. S. (1995). The Federalist revisited: New directions in authorship attribution. Literary and Linguistic Computing, 10, 111–127.

  • Holmes, D. I., Gordon, L. J., & Wilson, C. (2001a). A widow and her soldier: Stylometry and the American civil war. Literary and Linguistic Computing, 16, 403–420.

  • Holmes, D. I., Robertson, M., & Paez, R. (2001b). Stephen Crane and the ‘New-York Tribune’: A case study in traditional and non-traditional authorship attribution. Computers and the Humanities, 35, 315–331.

  • Hoover, D. (2004). Testing Burrows’s delta. Literary and Linguistic Computing, 19, 453–475.

  • Hoover, D. (2006). Stylometry, chronology and the styles of Henry James. In Proceedings of digital humanities 2006 (pp. 78–80). Paris, July 2006.

  • Houvardas, J., & Stamatatos, E. (2006). N-gram feature selection for authorship identification. In Proceedings of the 12th international conference on artificial intelligence: Methodology, systems, and applications (pp. 77–86). Varna, Bulgaria, September 2006.

  • Janson, T. (1975). Prose rhythm in medieval Latin from the 9th to the 13th century. Stockholm: Almqvist & Wiksell International.

  • Jasiński, T. (2008). O pochodzeniu Galla Anonima. Cracow: Avalon.

  • Jasiński, T. (2011). Kronika Polska Galla Anonima w świetle unikatowej analizy komputerowej nowej generacji. Poznań: Instytut Historii UAM.

  • Jasiński, T. (2014). Informatyka w służbie filologii łacińskiej i historii. Rocznik Biblioteki Narodowej, 45, 243–265.

  • Jasiński, T. (2016). Gall Anonim—poeta i mistrz prozy. Cracow: Avalon.

  • Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European conference on machine learning (pp. 137–142). Chemnitz, Germany, April 1998.

  • Jockers, M. L., & Witten, D. M. (2010). A comparative study of machine learning methods for authorship attribution. Literary and Linguistic Computing, 25, 215–223.

  • John, J. J. (1992). Latin palaeography. In J. M. Powell (Ed.), Medieval Studies: An Introduction (pp. 3–81). Syracuse, N.Y.: Syracuse University Press.

  • Juola, P. (2007). Becoming Jack London. Journal of Quantitative Linguistics, 14, 145–147.

  • Juola, P. (2008). Authorship attribution. Foundations and Trends in Information Retrieval, 1(3), 233–334.

  • Juola, P. (2015). The Rowling case: A proposed standard analytic protocol for authorship questions. Digital Scholarship in the Humanities, 30(Supplement 1), 100–113.

  • Kabala, J. (forthcoming). Gallus Anonymous vel Monk of Lido (fl. ca. 1101–1117): Writer for hire in twelfth-century Europe. The Medieval Globe, 5.

  • Kešelj, V., Peng, F., Cercone, N., & Thomas, C. (2003). N-gram-based author profiles for authorship attribution. In Proceedings of the conference of the Pacific association for computational linguistics 2003 (pp. 255–264). Halifax, Canada, August 2003.

  • Kestemont, M., Luyckx, K., Daelemans, W., & Crombez, T. (2012). Cross-genre authorship verification using unmasking. English Studies, 93, 340–356.

  • Kestemont, M., Moens, S., & Deploige, J. (2015). Collaborative authorship in the twelfth century: A stylometric study of Hildegard of Bingen and Guibert of Gembloux. Digital Scholarship in the Humanities, 30, 199–224.

  • Kestemont, M., Stover, J., Koppel, M., Karsdorp, F., & Daelemans, W. (2016). Authenticating the writings of Julius Caesar. Expert Systems with Applications, 63, 86–96.

  • Knowles, D. (1964). Great historical enterprises: Problems in monastic history. London: Thomas Nelson and Sons.

  • Koppel, M., & Schler, J. (2004). Authorship verification as a one-class classification problem. In Proceedings of the twenty-first international conference on machine learning. Banff, Canada, 2004.

  • Koppel, M., Schler, J., & Argamon, S. (2009). Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60, 9–26.

  • Koppel, M., Schler, J., & Argamon, S. (2011). Authorship attribution in the wild. Language Resources and Evaluation, 45, 83–94.

  • Koppel, M., Schler, J., & Bonchek-Dokow, E. (2007). Measuring differentiability: Unmasking pseudonymous authors. Journal of Machine Learning Research, 8, 1261–1276.

  • Koppel, M., & Winter, Y. (2014). Determining if two documents are written by the same author. Journal of the Association for Information Science and Technology, 65, 178–187.

  • Labuda, G. (2006). Zamiana Galla-Anonima, autora pierwszej Kroniki dziejów Polski, na Anonima-Wenecjanina. Studia Źródłoznawcze, 44, 117–125.

  • Love, H. (2002). Attributing authorship: An introduction. Cambridge: Cambridge University Press.

  • Lutosławski, W. (1897). The origin and growth of Plato’s logic: With an account of Plato’s style and of the chronology of his writings. London: Longmans, Green & Co.

  • Luyckx, K., & Daelemans, W. (2011). The effect of author set size and data size in authorship attribution. Literary and Linguistic Computing, 26, 35–55.

  • Madigan, D., Genkin, A., Lewis, D. D., & Fradkin, D. (2005). Bayesian multinomial logistic regression for author identification. In Proceedings of the American institute of physics conference 2005 (pp. 509–516). San Jose, August 2005.

  • Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.

  • Mendenhall, T. C. (1887). The characteristic curves of composition. Science, 9, 237–249.

  • Monk of Lido. (1895). Monachi anonymi littorensis Historia de translatione sanctorum Magni Nicolai, terra marique miraculis gloriosi, eiusdem avunculi alterius Nicolai, Theodorique martyris pretiosi, de civitate Mirea in monasterium S. Nicolai de littore Venetiarum. In Recueil des historiens des croisades: historiens occidentaux (Vol. 5, pp. 253–292). Paris: Imprimerie Royale.

  • Mosteller, F., & Wallace, D. L. (1964). Inference and disputed authorship: The Federalist. Reading, MA: Addison-Wesley.

  • Mühle, E. (2009). ‘Cronicae et gesta ducum sive principum Polonorum’: Neue Forschungen zum so genannten Gallus Anonymus. Archiv für Erforschung des Mittelalters, 65, 459–496.

  • Nicol, D. M. (1988). Byzantium and Venice: A study in diplomatic and cultural relations. Cambridge: Cambridge University Press.

  • Norberg, D. L. (1980). Manuel pratique de latin médiéval. Paris: A. et J. Picard.

  • Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. New York: Bloomsbury.

  • Pennebaker, J. W., & Stone, L. D. (2003). Words of wisdom: Language use over the life span. Journal of Personality and Social Psychology, 85, 291–301.

  • Plezia, M. (1984). Nowe studia nad Gallem-Anonimem. In H. Chłopocka & B. Kürbis (Eds.), Mente et litteris: o kulturze i społeczeństwie wieków średnich (pp. 111–120). Poznań: Uniwersytet im. Adama Mickiewicza.

  • Pranckevičius, T., & Marcinkevičius, V. (2016). Application of logistic regression with part-of-the-speech tagging for multi-class text classification. In Proceedings of the IEEE 4th workshop on advances in information, electronic and electrical engineering (pp. 1–5). Vilnius, Lithuania, November 2016.

  • Riedmann, J. (1986). Eine Überlieferung der ‘Translatio sancti Nicolai’ aus dem 12. Jahrhundert im Tiroler Landesarchiv Innsbruck. In K. Ebert (Ed.), Festschrift Nikolaus Grass: Zum 70. Geburtstag dargebracht von Fachkollegen und Freunden (pp. 349–364). Innsbruck: Universitätsverlag Wagner.

  • Rudman, J. (1998). The state of authorship attribution studies: Some problems and solutions. Computers and the Humanities, 31, 351–365.

  • Sanderson, C., & Guenter, S. (2006). Short text authorship attribution via sequence kernels, Markov chains and author unmasking: An investigation. In Proceedings of the conference on empirical methods in natural language processing (pp. 482–491). Sydney, Australia, July 2006.

  • Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34, 1–47.

  • Seeger, S. (2005). Der heilige Nikolaus von Myra und die Historia de translatione sanctorum magni Nicolai, alterius Nicolai Theodorique martyris (nach 1116). In K. Herbers, L. Jiroušková, & B. Vogel (Eds.), Mirakelberichte des frühen und hohen Mittelalters (pp. 254–287). Darmstadt: Wissenschaftliche Buchgesellschaft.

  • Stamatatos, E. (2008). Author identification: Using text sampling to handle the class imbalance problem. Information Processing and Management, 44(2), 790–799.

  • Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology, 60, 538–556.

  • Stover, J. A., Winter, Y., Koppel, M., & Kestemont, M. (2016). Computational authorship verification method attributes a new work to a major 2nd century African author. Journal of the Association for Information Science and Technology, 67, 239–242.

  • Wenta, J. (2011). Kronika tzw. Galla Anonima: Historyczne (monastyczne i genealogiczne) oraz geograficzne konteksty powstania. Toruń: Uniwersytet Mikołaja Kopernika.

  • Wieczorek, S. (2010). ‘Omnibus omnia factus sum’. Na marginesie książki Tomasza Jasińskiego O pochodzeniu Galla Anonima. Kwartalnik Historyczny, 117, 87–106.

  • Zhang, T., & Oles, F. J. (2001). Text categorization based on regularized linear classification methods. Information Retrieval, 4(1), 5–31.

  • Zhang, J., & Yang, Y. (2003). Robustness of regularized linear classification methods in text categorization. In Proceedings of the 26th ACM SIGIR conference on research and development in information retrieval (pp. 190–197). Toronto, July–August 2003.

  • Zhao, Y., & Zobel, J. (2005). Effective and scalable authorship attribution using function words. In Proceedings of the 2nd Asia information retrieval symposium (pp. 174–189). Jeju Island, Korea, October 2005.

  • Zheng, R., Li, J., Chen, H., & Huang, Z. (2006). A framework for authorship identification of online messages: Writing-style features and classification techniques. Journal of the American Society for Information Science and Technology, 57(3), 378–393.

Acknowledgements

I would like to thank Michael McCormick, in whose Graduate Research Seminar on medieval computational philology this article was conceived, and Stuart Shieber, for training and feedback in the Initiative for the Science of the Human Past at Harvard. I also thank Zbigniew Kabala for introducing me to Mathematica, sharing literature and offering feedback, as well as Daniel Lichtblau and others at Wolfram Research for their questions and suggestions on an earlier version of this paper. Finally, I would like to acknowledge the two anonymous reviewers for their insightful comments and suggestions, which have significantly improved this study.

Author information

Correspondence to Jakub Kabala.

Appendices

Appendix 1: Composition of corpora A, B, and C

The following list gives the composition of corpus A. As explained in the main article, all works in this list provide the chunked text segments for corpus B. Starred items (*) in the list below mark the subset of historiographical works that provide text segments for the genre-specific corpus C. Please refer to Sect. 5.1 of the main article for further details about text preparation.

Parenthetical dates in the list refer to the lifetime of the author, when known; otherwise, they refer to the date of composition of the work. Dates are drawn from Wolfgang Buchwald et al., Dictionnaire des auteurs grecs et latins de l’antiquité et du moyen age (Turnhout: Brepols, 1991) and Robert Auty et al., eds. Lexikon des Mittelalters, 9 vols. (Munich: Artemis, 1977–1999). Titles mostly follow those given by Buchwald et al. The following abbreviations are used:

LL: The Latin Library (www.thelatinlibrary.com)

PL: Patrologiae cursus completus, series Latina. Ed. J.P. Migne. 221 vols. Paris, 1844–64

RHC-Hist. Occ.: Recueil des historiens des croisades: historiens occidentaux. 5 vols. Paris, 1844–1895

* Peter Abelard (1079–1142), Historia calamitatum (LL)

————, Dialogus inter philosophum, Judaeum et christianum (PL 178, cols. 1611–1684)

————, Ethica (PL 178, cols. 633–678)

Anselm of Canterbury (1033/34–1109), Cur Deus homo (PL 158, cols. 360–432)

————, Monologion (PL 158, cols. 142–224)

* Cosmas of Prague (c. 1045–1125), Chronicae Bohemorum libri III (PL 166, cols. 55–243)

* Anonymous (after 1187), De expugnatione Terrae Sanctae per Saladinum (LL)

* Galbert of Bruges (1127/28), Passio Caroli, comitis Flandriae (PL 166, cols. 943–1046)

* Gallus Anonymous (c. 1113–c. 1117), Gesta principum polonorum (PL 160, cols. 839–936)

* Anonymous (c. 1101), Gesta Francorum (LL)

Guibert of Nogent (1053–1124), De incarnatione contra Judaeos (PL 156, cols. 489–528)

————, De laude s. Mariae (PL 156, cols. 537–578)

————, De pignoribus sanctorum (PL 156, cols. 607–680)

————, De virginitate (PL 156, cols. 579–608)

* ————, De vita sua sive monodiarum libri III (PL 156, cols. 837–962)

* ————, Gesta Dei per Francos (PL 156, cols. 679–838)

Hugh of St. Victor (c. 1096–1141), Didascalicon (LL)

————, Soliloquium de arrha animae (LL)

John of Salisbury (c. 1115–1180), Metalogicus (PL 199, cols. 823–946)

————, Polycraticus (PL 199, cols. 379–822)

* Liutprand of Cremona (c. 920–c. 972), Antapodosis (PL 136, cols. 789–898)

* ————, Relatio de legatione Constantinopolitana (PL 136, cols. 909–938)

* Geoffrey of Malaterra (by 1101), De rebus gestis Rogerii et Roberti Guiscardi (PL 149, cols. 1099–1210)

* William of Malmesbury (c. 1080–c. 1142), De gestis pontificum Anglorum libri V (PL 179, cols. 1441–1680)

* ————, De gestis regum Anglorum libri V (PL 179, cols. 957–1392)

* ————, Historiae novellae libri III (PL 179, cols. 1391–1440)

* Monk of Lido (after 1101), Translatio s. Nicolai (RHC-Hist. Occ. V, pp. 253–292)

* Otloh of St. Emmeram (c. 1010–c. 1070), Liber de temptationibus (PL 146, cols. 29–58)

* ————, Liber visionum (PL 146, cols. 341–388)

* Otto of Freising (1111/15–1158), Gesta Friderici (LL)

* William of Poitiers (c. 1020–after 1087), Gesta Guilelmi (PL 149, cols. 1217–1270)

* Raoul of Caen (after 1107), Gesta Tancredi (LL)

Rather of Verona (c. 890–974), Praeloquia (PL 136, cols. 145–344)

* Rupert of Deutz (c. 1070–1129/30), De incendio Tuitiensi (PL 170, cols. 333–358)

————, Annulus sive dialogus inter Christianum et Judaeum (PL 170, cols. 559–610)

————, Super quaedam capitula regulae divi Benedicti abbatis (PL 170, cols. 477–538)

* Suger of St. Denis (c. 1081–1151), De rebus in administratione sua gestis (PL 186, cols. 1211–1240)

* ————, Vita Ludovici VI Grossi (PL 186, cols. 1253–1340)

* Thietmar of Merseburg (975–1018), Chronicon (PL 139, cols. 1183–1422)

Appendix 2: Composition of corpus D

Corpus D is a subset of text segments selected from the text segments in corpus C in order to ensure balance among the author classes. With the exception of the Monk of Lido, only those authors with at least 2 text segments in corpus C were retained. Whenever possible, 3 segments were chosen for each of those authors; when the length of the original work had created only 2 segments, those 2 were chosen. If an author had more than one original work, segments were chosen from at least 2 of those works.

The following table summarizes the composition of corpus D:

Author and text                                Segment #s
Cosmas, Chronicae                              1, 2, 3
Galbert, Passio Caroli                         1, 2, 3
Gallus, Gesta principum polonorum              1, 2
Anonymous, Gesta Francorum                     1, 2
Guibert, De vita sua                           3, 4
Guibert, Gesta Dei per Francos                 1
Liutprand, Antapodosis                         2, 3
Liutprand, Relatio de legatione                1
Geoffrey, De rebus gestis                      1, 2, 3
William of Malmesbury, De gestis pontificum    7, 8
William of Malmesbury, De gestis regum         5
Monk of Lido, Translatio                       1
Otloh, Liber de temptationibus                 1
Otloh, Liber visionum                          1
Otto, Gesta Friderici                          1, 2, 3
William of Poitiers, Gesta Guilelmi            1, 2
Raoul, Gesta Tancredi                          1, 2, 3
Suger, De rebus in administratione sua gestis  1
Suger, Vita Ludovici VI Grossi                 1, 2
Thietmar, Chronicon                            1, 2, 5
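The selection rules stated for corpus D (keep authors with at least 2 segments, always keep the target author, take at most 3 segments per author, and draw on at least 2 works when an author has several) can be sketched in Python as follows. This is only an illustration under stated assumptions: the appendix does not say how particular segments were picked, so the round-robin choice of the earliest segments, the function name, and the toy input are all hypothetical and will not reproduce the exact segment numbers in the table.

```python
def balance_corpus(segments_by_author, always_keep=(), max_per_author=3):
    """Select a class-balanced subset of segments.

    segments_by_author maps author -> {work title: [segment numbers]}.
    Authors with fewer than 2 segments are dropped unless listed in always_keep.
    Segments are picked round-robin across works (an assumption), so an author
    with several works contributes segments from at least two of them.
    """
    selected = {}
    for author, works in segments_by_author.items():
        total = sum(len(segments) for segments in works.values())
        if total < 2 and author not in always_keep:
            continue
        queues = [(work, list(segments)) for work, segments in works.items() if segments]
        picks = []
        while queues and len(picks) < max_per_author:
            work, segments = queues.pop(0)
            picks.append((work, segments.pop(0)))
            if segments:  # this work still has segments left, so requeue it
                queues.append((work, segments))
        selected[author] = picks
    return selected


# Hypothetical toy input, not the real corpus C.
TOY_CORPUS = {
    "Author A": {"Work 1": [1, 2, 3, 4]},
    "Author B": {"Work 1": [1], "Work 2": [1, 2]},
    "Author C": {"Work 1": [1]},
    "Target": {"Work 1": [1]},
}
balanced = balance_corpus(TOY_CORPUS, always_keep={"Target"})
```

Here "Author C" is dropped for having a single segment, while "Target" is retained despite having only one, mirroring the exception made for the Monk of Lido.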

Cite this article

Kabala, J. Computational authorship attribution in medieval Latin corpora: the case of the Monk of Lido (ca. 1101–08) and Gallus Anonymous (ca. 1113–17). Lang Resources & Evaluation 54, 25–56 (2020). https://doi.org/10.1007/s10579-018-9424-0

  • DOI: https://doi.org/10.1007/s10579-018-9424-0
