Abstract
This paper applies computational methods of authorship attribution to shed light on a still open question concerning two Latin works of the twelfth century: are the anonymous authors of the Translatio s. Nicolai (ca. 1101–1108) and the Gesta principum polonorum (ca. 1113–1117) one and the same person? The Translatio was written by the so-called Monk of Lido and describes Venice’s role in the First Crusade. The Gesta were written by the so-called Gallus Anonymous and contain a panegyric of the contemporary Polish ruler, Bolesław III the Wry-Mouthed (r. 1102–1138). This study attributes authorship to these works within four corpora of Latin texts composed between the tenth and twelfth centuries, each with between 39 and 116 texts written by between 15 and 22 different authors. The goal of including four corpora is to see how robust the similarity between the target texts is to changes in text length, genre, and class balance in the corpora. In each corpus, nine different distance metrics and one machine-learning algorithm are used to classify the authors of the Translatio and Gesta. I conclude that it is highly likely that Gallus and Monk were indeed one and the same anonymous author, and highlight the effectiveness of the Bray–Curtis distance and logistic regression as methods of attribution.
Notes
Monk of Lido (1895), henceforth referred to as Translatio. The edition of an abridged twelfth-century copy of the Translatio, discovered well after this critical edition was published, may be consulted in Riedmann (1986), pp. 359–364. A German translation of the eleven miracle stories concluding the Translatio can be consulted in Seeger (2005), pp. 254–287.
The term was popularized by Charles Homer Haskins in his The Renaissance of the Twelfth Century (1927).
For a study of these historical implications see Kabala (forthcoming).
According to Kromer, “A Frenchman [Gallus] wrote this history—some monk, I imagine, who lived during the time of Bolesław III, as can be gathered from the prologue.” See Knoll, Schaer and Polak, “Editor’s Introduction,” xxv, in Gallus Anonymous (2003), for a photographic reproduction of Kromer’s note: “Gallus hanc historiam scripsit, monachus, opinor, aliquis, ut ex proemiis coniicere licet qui Boleslai tertii tempore vixit.”
For an overview of theories, see Mühle (2009), pp. 464–475 and Bagi (2008), pp. 15–23. Bagi organized the work on Gallus’s identity into three broad approaches emphasizing a Provençal-Hungarian career, northern French/Walloon origins, and Italian origins. To these can be added a fourth and more recent group of scholars, who have sought Gallus’s origins in Germany (Fried 2009; Wenta 2011).
Jasiński designed programs to detect high-frequency occurrences of the kinds of rare cursus cadences preferred by Gallus in consecutive chunks of Latin sources drawn from his personal database of medieval Latin texts.
Eder used two reference corpora of Latin texts—one small (9 texts), the other large (159 texts).
Jasiński has not published or documented his personal database of Latin texts, which he describes as containing 900,000 pages of Latin (Jasiński 2014, p. 256), or “the majority of all of Latin writing to the end of the twelfth century” (Jasiński 2016, p. 157; my translation). While he does describe the functionality of the computer programs used to search this database, these too remain unavailable for verification. Similarly, Eder described his larger reference corpus without listing all its members or identifying the metrics used to determine similarities between them (Eder 2015). Elsewhere, he provides a general description of the algorithms used as well as a link to his software package (Eder 2017).
Part of Fried’s traditional philological critique of Jasiński’s (2008) book was: “Eine Gegenprobe, wieweit nämlich die angeführten Parallelen auch bei anderen Autoren anzutreffen sind, wäre geboten.”
The one text treated with OCR software is the Monk of Lido’s Translatio.
e.g. in Geoffrey of Malaterra’s De rebus gestis Rogerii et Roberti Guiscardi, Liutprand of Cremona’s Antapodosis, and Suger’s Vita Ludovici. I have not removed chapter or book titles or numbers in the text itself, which introduces some noise into the data: as certain editions list chapter titles within the text while others give only chapter numbers, the count of ‘de’ (‘about’), a frequent word in chapter titles, will be higher in the former than in the latter.
e.g. in Otloh of St. Emmeram’s Liber de temptationibus, Abelard’s Dialogus, and Anselm’s Monologion. Additional traces of editorial noise remain, as certain works contain brief editorial notes before every section of text (e.g. all six works of Guibert of Nogent).
e.g. ‘Ans.’ and ‘Bos.’ in Anselm’s Cur Deus Homo; ‘Homo’ and ‘Anima’ in Hugh of St. Victor’s Soliloquium; ‘Philosophus,’ ‘Christian.,’ ‘Judaeus,’ and ‘Judex’ in Abelard’s Dialogus inter philosophum, Judaeum et christianum; and ‘Judaeus’ and ‘Christianus’ in Rupert of Deutz’s Annulus sive dialogus inter Christianum et Judaeum.
e.g. chapters 19–22 of Otloh of St. Emmeram’s Liber visionum, which were written by Boniface and Bede centuries earlier. I have retained only Books 1–2 of Otto of Freising’s Gesta Friderici imperatoris, i.e. only the portion written by Otto, and not by Rahewin.
e.g. throughout the three books of Gallus’s Gesta or the opening to John of Salisbury’s Polycraticus.
A previous computational study of the Gesta and Translatio relied on a small set of 9 Latin texts composed close in time to Gallus and Monk’s works, and a larger set of 159 Latin texts spanning 16 centuries from Antiquity to the Reformation (Eder 2015).
Experiments with various segment length targets above and below the length of the Translatio showed that a target length of 10,500 words yielded the least variance in segment length.
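The segmentation described in this note can be pictured with a minimal Python sketch. It is not the study's own code, which was written in Mathematica. The function name `split_into_segments` and the rule of dividing a text into near-equal parts are assumptions made for illustration; only the target length of 10,500 words comes from the note.

```python
def split_into_segments(words, target_len=10_500):
    """Split a list of word tokens into near-equal segments close to target_len.

    Assumption (not from the source): the number of segments is the text
    length divided by the target, rounded to the nearest integer (at least 1),
    and the words are spread as evenly as possible across the segments.
    """
    n = len(words)
    k = max(1, round(n / target_len))
    base, extra = divmod(n, k)
    segments, start = [], 0
    for i in range(k):
        # The first `extra` segments each take one additional word.
        size = base + (1 if i < extra else 0)
        segments.append(words[start:start + size])
        start += size
    return segments
```

Under this rule a text shorter than the target stays whole, and longer texts yield segments whose lengths scatter around the target rather than leaving a short remainder at the end.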
See Cha (2007) and Wolfram Mathematica documentation (http://reference.wolfram.com/language/) for definitions of the first eight distances. See Koppel and Winter (2014) for the definition of the min–max distance.
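For orientation, the two distances highlighted in this study can be written out in a few lines of Python. This is a sketch under the standard definitions for non-negative frequency vectors (Bray–Curtis as the sum of absolute differences over the sum of absolute sums, min–max as one minus the ratio of summed minima to summed maxima). The helper `closest` and the toy vectors in the usage note are hypothetical and do not come from the study.

```python
def bray_curtis(x, y):
    """Bray-Curtis distance: sum|x_i - y_i| / sum|x_i + y_i|."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(abs(a + b) for a, b in zip(x, y))
    return num / den if den else 0.0


def min_max(x, y):
    """Min-max distance: 1 - sum(min(x_i, y_i)) / sum(max(x_i, y_i))."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return 1.0 - num / den if den else 0.0


def closest(target, candidates, distance):
    """Return the label of the candidate vector nearest to the target.

    `candidates` maps a label (e.g. an author name) to a frequency vector.
    This nearest-neighbour helper is an illustrative assumption.
    """
    return min(candidates, key=lambda label: distance(target, candidates[label]))
```

With hypothetical word-frequency vectors such as `{"A": [0.5, 0.3, 0.2], "B": [0.1, 0.1, 0.8]}` and a target of `[0.45, 0.35, 0.2]`, calling `closest(target, candidates, bray_curtis)` selects the label whose vector lies nearest to the target.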
All files are available at www.jakubkabala.com/gallus-monk/. As many of the experiments in this study rely on random subsamplings of words in the corpora, subsequent runs of this code will produce results slightly different from the ones reported here.
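The reason repeated runs differ can be illustrated with a short Python sketch of random word subsampling. It is not the published code, and the function name `subsample_counts`, the `sample_size` parameter, and the optional `seed` are all assumptions for illustration. The source does not say whether seeds were fixed; the sketch only shows that an unseeded draw varies from run to run, while a seeded one repeats.

```python
import random
from collections import Counter


def subsample_counts(words, sample_size, seed=None):
    """Draw a random subsample of word tokens and count each word.

    With seed=None every call may draw a different subsample, so the
    resulting frequency vector (and any distance built on it) shifts slightly.
    Passing an integer seed makes the draw repeatable.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(words))
    # Sampling without replacement from the token list.
    sample = rng.sample(words, k)
    return Counter(sample)
```

Averaging results over several such subsamples, as the study does with its 10 subsample-based vector space models per corpus, reduces the influence of any single draw.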
As expected, the min–max and Bray–Curtis metrics placed the first half of the Gesta closest to the second half of the Gesta in 10 out of 10 subsample-based VSMs of corpus D, and in at least 9 out of the 10 subsample-based VSMs of both corpora B and C (Fig. 5); the results are even more consistent for Gesta2 (Fig. 6).
NB: Wolfram’s implementation of logistic regression reports second-most likely attributions only when the probability of such an attribution is sufficiently high. See Wolfram documentation for details.
Wincenty Lutosławski formulated his “law of stylistic affinity” in 1897. He set out to demonstrate the order in which Plato wrote his dialogues through a meticulous counting and study of the frequencies of the Greek particles in each one. His “law” anticipated the results of modern psychological research cited earlier in this article: “Of two works of the same author and of the same size, that is nearer in time to a third, which shares with it the greater number of stylistic peculiarities, provided that their different importance is taken into account, and that the number of observed peculiarities is sufficient to determine the stylistic character of all the three works… If we now ask how the law of stylistic affinity can be verified, the first and nearest answer lies in the psychological property of style as a mark of identity, entirely depending on the totality of familiar expressions at any time in the writer’s consciousness. Every writer could find easily in his own experience sufficient evidence in favour of this psychological law” (emphasis in original). See Lutosławski (1897), pp. 152–153.
References
Abbasi, A., & Chen, H. (2005). Applying authorship analysis to extremist-group web forum messages. IEEE Intelligent Systems, 20, 67–75.
Adamska, A. (2000). ‘From memory to written record’ in the periphery of medieval Latinitas: The case of Poland in the eleventh and twelfth centuries. In K. J. Heidecker (Ed.), Charters and the use of the written word in medieval society (pp. 83–100). Turnhout: Brepols.
Angold, M., & Balard, M. (2007). Venice: A Bibliography. In M. Whitby (Ed.), Byzantines and crusaders in non-Greek sources, 1025–1204 (pp. 86–94). Oxford: Oxford University Press.
Argamon, S. (2007). Interpreting Burrows’s delta: Geometric and probabilistic foundations. Literary and Linguistic Computing, 23, 131–147.
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the author of an anonymous text. Communications of the ACM, 52, 119–123.
Argamon, S., & Levitan, S. (2005). Measuring the usefulness of function words for authorship attribution. In Proceedings of the 2005 ACH/ALLC conference. Victoria, BC, Canada, June 2005.
Baayen, H., Van Halteren, H., & Tweedie, F. (1996). Outside the cave of shadows: Using syntactic annotation to enhance authorship attribution. Literary and Linguistic Computing, 11, 121–132.
Bagi, D. (2008). Królowie węgierscy w Kronice Galla Anonima. Cracow: Polska Akademia Umiejętności.
Benedetto, D., Degli Esposti, M., & Maspero, G. (2013). The puzzle of Basil’s Epistula 38: A mathematical approach to a philological problem. Journal of Quantitative Linguistics, 20, 267–287.
Binongo, J. N. G. (2003). Who wrote the 15th book of Oz? An application of multivariate analysis to authorship attribution. Chance, 16, 9–17.
Binongo, J. N. G., & Smith, M. W. (1999). The application of principal component analysis to stylometry. Literary and Linguistic Computing, 14, 445–466.
Borawska, D. (1965). Gallus Anonim czy Italus Anonim. Przegląd Historyczny, 56(1), 111–119.
Boyle, L. E. (1992). Diplomatics. In J. M. Powell (Ed.), Medieval Studies: An Introduction (pp. 82–113). Syracuse, N.Y.: Syracuse University Press.
Burrows, J. F. (1987). Word-patterns and story-shapes: The statistical analysis of narrative style. Literary and Linguistic Computing, 2, 61–70.
Burrows, J. F. (1989). “An ocean where each kind…”: Statistical analysis and some major determinants of literary style. Computers and the Humanities, 23, 309–321.
Burrows, J. F. (1992a). Computers and the study of literature. In C. Butler (Ed.), Computers and written texts: An applied perspective (pp. 167–204). Oxford: Blackwell.
Burrows, J. F. (1992b). Not unless you ask nicely: The interpretative nexus between analysis and information. Literary and Linguistic Computing, 7, 91–109.
Burrows, J. F. (2002). “Delta”: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17, 267–287.
Burrows, J. F., & Hassall, A. J. (1988). Anna Boleyn and the authenticity of Fielding’s feminine narratives. Eighteenth-Century Studies, 21, 427–453.
Cha, S.-H. (2007). Comprehensive survey on distance/similarity measures between probability density functions. International Journal of Mathematical Models and Methods in Applied Sciences, 1(4), 300–307.
Coleman, C. B. (1922). The treatise of Lorenzo Valla on the Donation of Constantine: Text and translation into English. New Haven: Yale University Press.
Diederich, J., Kindermann, J., Leopold, E., & Paass, G. (2003). Authorship attribution with support vector machines. Applied Intelligence, 19, 109–123.
Dumais, S., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Proceedings of the seventh international conference on information and knowledge management (pp. 148–155). Bethesda, MD, November 1998.
Eder, M. (2013). Does size matter? Authorship attribution, small samples, big problem. Digital Scholarship in the Humanities, 30, 167–182.
Eder, M. (2015). In search of the author of Chronica Polonorum ascribed to Gallus Anonymus: A stylometric reconnaissance. Acta Poloniae Historica, 112, 5–23.
Eder, M. (2017). Visualization in stylometry: Cluster analysis using networks. Digital Scholarship in the Humanities, 32(1), 50–64.
Evert, S., Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C., et al. (2017). Understanding and explaining delta measures for authorship attribution. Digital Scholarship in the Humanities, 32(Supplement 2), 4–16.
Fried, J. (2009). Kam der Gallus Anonymus aus Bamberg? Archiv für Erforschung des Mittelalters, 65, 497–546.
Gallus Anonymous. (1952). Galli Anonymi cronicae et gesta ducum sive principum polonorum: Anonima tzw. Galla kronika czyli dzieje książąt i władców polskich. Edited by K. Maleczyński. Cracow: Polska Akademia Umiejętności.
Gallus Anonymous. (2003). Gesta principum Polonorum: The deeds of the princes of the Poles. Translated by P.W. Knoll & F. Schaer. Budapest: Central European University Press.
Genkin, A., Lewis, D. D., & Madigan, D. (2006). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3), 291–304.
Grieve, J. (2007). Quantitative authorship attribution: An evaluation of techniques. Literary and Linguistic Computing, 22, 251–270.
Haskins, C. H. (1927). The renaissance of the twelfth century. Cambridge, MA: Harvard University Press.
Holmes, D. I. (1994). Authorship attribution. Computers and the Humanities, 28, 87–106.
Holmes, D. I. (1998). The evolution of stylometry in humanities scholarship. Literary and Linguistic Computing, 13, 111–117.
Holmes, D. I., & Forsyth, R. S. (1995). The Federalist revisited: New directions in authorship attribution. Literary and Linguistic Computing, 10, 111–127.
Holmes, D. I., Gordon, L. J., & Wilson, C. (2001a). A widow and her soldier: Stylometry and the American civil war. Literary and Linguistic Computing, 16, 403–420.
Holmes, D. I., Robertson, M., & Paez, R. (2001b). Stephen Crane and the ‘New-York Tribune’: A case study in traditional and non-traditional authorship attribution. Computers and the Humanities, 35, 315–331.
Hoover, D. (2004). Testing Burrows’s delta. Literary and Linguistic Computing, 19, 453–475.
Hoover, D. (2006). Stylometry, chronology and the styles of Henry James. In Proceedings of digital humanities 2006 (pp. 78–80). Paris, July 2006.
Houvardas, J., & Stamatatos, E. (2006). N-gram feature selection for authorship identification. In Proceedings of the 12th international conference on artificial intelligence: Methodology, systems, and applications (pp. 77–86). Varna, Bulgaria, September 2006.
Janson, T. (1975). Prose rhythm in medieval Latin from the 9th to the 13th century. Stockholm: Almqvist & Wiksell International.
Jasiński, T. (2008). O pochodzeniu Galla Anonima. Cracow: Avalon.
Jasiński, T. (2011). Kronika Polska Galla Anonima w świetle unikatowej analizy komputerowej nowej generacji. Poznań: Instytut Historii UAM.
Jasiński, T. (2014). Informatyka w służbie filologii łacińskiej i historii. Rocznik Biblioteki Narodowej, 45, 243–265.
Jasiński, T. (2016). Gall Anonim—poeta i mistrz prozy. Cracow: Avalon.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European conference on machine learning (pp. 137–142). Chemnitz, Germany, April 1998.
Jockers, M. L., & Witten, D. M. (2010). A comparative study of machine learning methods for authorship attribution. Literary and Linguistic Computing, 25, 215–223.
John, J. J. (1992). Latin palaeography. In J. M. Powell (Ed.), Medieval Studies: An Introduction (pp. 3–81). Syracuse, N.Y.: Syracuse University Press.
Juola, P. (2007). Becoming Jack London. Journal of Quantitative Linguistics, 14, 145–147.
Juola, P. (2008). Authorship attribution. Foundations and Trends in Information Retrieval, 1(3), 233–334.
Juola, P. (2015). The Rowling case: A proposed standard analytic protocol for authorship questions. Digital Scholarship in the Humanities, 30(Supplement 1), 100–113.
Kabala, J. (forthcoming). Gallus Anonymous vel Monk of Lido (fl. ca. 1101–1117): Writer for hire in twelfth-century Europe. The Medieval Globe, 5.
Kešelj, V., Peng, F., Cercone, N., & Thomas, C. (2003). N-gram-based author profiles for authorship attribution. In Proceedings of the conference of the Pacific association for computational linguistics 2003 (pp. 255–264). Halifax, Canada, August 2003.
Kestemont, M., Luyckx, K., Daelemans, W., & Crombez, T. (2012). Cross-genre authorship verification using unmasking. English Studies, 93, 340–356.
Kestemont, M., Moens, S., & Deploige, J. (2015). Collaborative authorship in the twelfth century: A stylometric study of Hildegard of Bingen and Guibert of Gembloux. Digital Scholarship in the Humanities, 30, 199–224.
Kestemont, M., Stover, J., Koppel, M., Karsdorp, F., & Daelemans, W. (2016). Authenticating the writings of Julius Caesar. Expert Systems with Applications, 63, 86–96.
Knowles, D. (1964). Great historical enterprises: Problems in monastic history. London: Thomas Nelson and Sons.
Koppel, M., & Schler, J. (2004). Authorship verification as a one-class classification problem. In Proceedings of the twenty-first international conference on machine learning. Banff, Canada, 2004.
Koppel, M., Schler, J., & Argamon, S. (2009). Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60, 9–26.
Koppel, M., Schler, J., & Argamon, S. (2011). Authorship attribution in the wild. Language Resources and Evaluation, 45, 83–94.
Koppel, M., Schler, J., & Bonchek-Dokow, E. (2007). Measuring differentiability: Unmasking pseudonymous authors. Journal of Machine Learning Research, 8, 1261–1276.
Koppel, M., & Winter, Y. (2014). Determining if two documents are written by the same author. Journal of the Association for Information Science and Technology, 65, 178–187.
Labuda, G. (2006). Zamiana Galla-Anonima, autora pierwszej Kroniki dziejów Polski, na Anonima-Wenecjanina. Studia Źródłoznawcze, 44, 117–125.
Love, H. (2002). Attributing authorship: An introduction. Cambridge: Cambridge University Press.
Lutosławski, W. (1897). The origin and growth of Plato’s logic: With an account of Plato’s style and of the chronology of his writings. London: Longmans, Green & Co.
Luyckx, K., & Daelemans, W. (2011). The effect of author set size and data size in authorship attribution. Literary and Linguistic Computing, 26, 35–55.
Madigan, D., Genkin, A., Lewis, D. D., & Fradkin, D. (2005). Bayesian multinomial logistic regression for author identification. In Proceedings of the American institute of physics conference 2005 (pp. 509–516). San Jose, August 2005.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
Mendenhall, T. C. (1887). The characteristic curves of composition. Science, 9, 237–249.
Monk of Lido. (1895). Monachi anonymi littorensis Historia de translatione sanctorum Magni Nicolai, terra marique miraculis gloriosi, eiusdem avunculi alterius Nicolai, Theodorique martyris pretiosi, de civitate Mirea in monasterium S. Nicolai de littore Venetiarum. In Recueil des historiens des croisades: historiens occidentaux (Vol. 5, pp. 253–292). Paris: Imprimerie Royale.
Mosteller, F., & Wallace, D. L. (1964). Inference and disputed authorship: The Federalist. Reading, MA: Addison-Wesley.
Mühle, E. (2009). ‘Cronicae et gesta ducum sive principum Polonorum’: Neue Forschungen zum so genannten Gallus Anonymus. Archiv für Erforschung des Mittelalters, 65, 459–496.
Nicol, D. M. (1988). Byzantium and Venice: A study in diplomatic and cultural relations. Cambridge: Cambridge University Press.
Norberg, D. L. (1980). Manuel pratique de latin médiéval. Paris: A. et J. Picard.
Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. New York: Bloomsbury.
Pennebaker, J. W., & Stone, L. D. (2003). Words of wisdom: Language use over the life span. Journal of Personality and Social Psychology, 85, 291–301.
Plezia, M. (1984). Nowe studia nad Gallem-Anonimem. In H. Chłopocka & B. Kürbis (Eds.), Mente et litteris: o kulturze i społeczeństwie wieków średnich (pp. 111–120). Poznań: Uniwersytet im. Adama Mickiewicza.
Pranckevičius, T., & Marcinkevičius, V. (2016). Application of logistic regression with part-of-the-speech tagging for multi-class text classification. In Proceedings of the IEEE 4th workshop on advances in information, electronic and electrical engineering (pp. 1–5). Vilnius, Lithuania, November 2016.
Riedmann, J. (1986). Eine Überlieferung der ‘Translatio sancti Nicolai’ aus dem 12. Jahrhundert im Tiroler Landesarchiv Innsbruck. In K. Ebert (Ed.), Festschrift Nikolaus Grass: Zum 70. Geburtstag dargebracht von Fachkollegen und Freunden (pp. 349–364). Innsbruck: Universitätsverlag Wagner.
Rudman, J. (1998). The state of authorship attribution studies: Some problems and solutions. Computers and the Humanities, 31, 351–365.
Sanderson, C., & Guenter, S. (2006). Short text authorship attribution via sequence kernels, Markov chains and author unmasking: An investigation. In Proceedings of the conference on empirical methods in natural language processing (pp. 482–491). Sydney, Australia, July 2006.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34, 1–47.
Seeger, S. (2005). Der heilige Nikolaus von Myra und die Historia de translatione sanctorum magni Nicolai, alterius Nicolai Theodorique martyris (nach 1116). In K. Herbers, L. Jiroušková, & B. Vogel (Eds.), Mirakelberichte des frühen und hohen Mittelalters (pp. 254–287). Darmstadt: Wissenschaftliche Buchgesellschaft.
Stamatatos, E. (2008). Author identification: Using text sampling to handle the class imbalance problem. Information Processing and Management, 44(2), 790–799.
Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology, 60, 538–556.
Stover, J. A., Winter, Y., Koppel, M., & Kestemont, M. (2016). Computational authorship verification method attributes a new work to a major 2nd century African author. Journal of the Association for Information Science and Technology, 67, 239–242.
Wenta, J. (2011). Kronika tzw. Galla Anonima: Historyczne (monastyczne i genealogiczne) oraz geograficzne konteksty powstania. Toruń: Uniwersytet Mikołaja Kopernika.
Wieczorek, S. (2010). ‘Omnibus omnia factus sum’. Na marginesie książki Tomasza Jasińskiego O pochodzeniu Galla Anonima. Kwartalnik Historyczny, 117, 87–106.
Zhang, T., & Oles, F. J. (2001). Text categorization based on regularized linear classification methods. Information Retrieval, 4(1), 5–31.
Zhang, J., & Yang, Y. (2003). Robustness of regularized linear classification methods in text categorization. In Proceedings of the 26th ACM SIGIR conference on research and development in information retrieval (pp. 190–197). Toronto, July–August 2003.
Zhao, Y., & Zobel, J. (2005). Effective and scalable authorship attribution using function words. In Proceedings of the 2nd Asia information retrieval symposium (pp. 174–189). Jeju Island, Korea, October 2005.
Zheng, R., Li, J., Chen, H., & Huang, Z. (2006). A framework for authorship identification of online messages: Writing-style features and classification techniques. Journal of the American Society for Information Science and Technology, 57(3), 378–393.
Acknowledgements
I would like to thank Michael McCormick, in whose Graduate Research Seminar on medieval computational philology this article was conceived, and Stuart Shieber, for training and feedback in the Initiative for the Science of the Human Past at Harvard. I also thank Zbigniew Kabala for introducing me to Mathematica, sharing literature and offering feedback, as well as Daniel Lichtblau and others at Wolfram Research for their questions and suggestions on an earlier version of this paper. Finally, I would like to acknowledge the two anonymous reviewers for their insightful comments and suggestions, which have significantly improved this study.
Appendices
Appendix 1: Composition of corpora A, B, and C
The following list gives the composition of corpus A. As explained in the main article, all works in this list provide the chunked text segments for corpus B. Starred items (*) in the list below mark the subset of historiographical works that provide text segments for the genre-specific corpus C. Please refer to Sect. 5.1 of the main article for further details about text preparation.
Parenthetical dates in the list refer to the lifetime of the author, when known; otherwise, they refer to the date of composition of the work. Dates are drawn from Wolfgang Buchwald et al., Dictionnaire des auteurs grecs et latins de l’antiquité et du moyen age (Turnhout: Brepols, 1991) and Robert Auty et al., eds. Lexikon des Mittelalters, 9 vols. (Munich: Artemis, 1977–1999). Titles mostly follow those given by Buchwald et al. The following abbreviations are used:
- LL: The Latin Library (www.thelatinlibrary.com)
- PL: Patrologiae cursus completus, series Latina. Ed. J.P. Migne. 221 vols. Paris, 1844–64
- RHC-Hist. Occ.: Recueil des historiens des croisades: historiens occidentaux. 5 vols. Paris, 1844–1895
* Peter Abelard (1079–1142), Historia calamitatum (LL)
————, Dialogus inter philosophum, Judaeum et christianum (PL 178, cols. 1611–1684)
————, Ethica (PL 178, cols. 633–678)
Anselm of Canterbury (1033/34–1109), Cur Deus homo (PL 158, cols. 360–432)
————, Monologion (PL 158, cols. 142–224)
* Cosmas of Prague (c. 1045–1125), Chronicae Bohemorum libri III (PL 166, cols. 55–243)
* Anonymous (after 1187), De expugnatione Terrae Sanctae per Saladinum (LL)
* Galbert of Bruges (1127/28), Passio Caroli, comitis Flandriae (PL 166, cols. 943–1046)
* Gallus Anonymous (c. 1113–c. 1117), Gesta principum polonorum (PL 160, cols. 839–936)
* Anonymous (c. 1101), Gesta Francorum (LL)
Guibert of Nogent (1053–1124), De incarnatione contra Judaeos (PL 156, cols. 489–528)
————, De laude s. Mariae (PL 156, cols. 537–578)
————, De pignoribus sanctorum (PL 156, cols. 607–680)
————, De virginitate (PL 156, cols. 579–608)
* ————, De vita sua sive monodiarum libri III (PL 156, cols. 837–962)
* ————, Gesta Dei per Francos (PL 156, cols. 679–838)
Hugh of St. Victor (c. 1096–1141), Didascalicon (LL)
————, Soliloquium de arrha animae (LL)
John of Salisbury (c. 1115–1180), Metalogicus (PL 199, cols. 823–946)
————, Polycraticus (PL 199, cols. 379–822)
* Liutprand of Cremona (c. 920–c. 972), Antapodosis (PL 136, cols. 789–898)
* ————, Relatio de legatione Constantinopolitana (PL 136, cols. 909–938)
* Geoffrey of Malaterra (by 1101), De rebus gestis Rogerii et Roberti Guiscardi (PL 149, cols. 1099–1210)
* William of Malmesbury (c. 1080–c. 1142), De gestis pontificum Anglorum libri V (PL 179, cols. 1441–1680)
* ————, De gestis regum Anglorum libri V (PL 179, cols. 957–1392)
* ————, Historiae novellae libri III (PL 179, cols. 1391–1440)
* Monk of Lido (after 1101), Translatio s. Nicolai (RHC-Hist. Occ. V, pp. 253–292)
* Otloh of St. Emmeram (c. 1010–c. 1070), Liber de temptationibus (PL 146, cols. 29–58)
* ————, Liber visionum (PL 146, cols. 341–388)
* Otto of Freising (1111/15–1158), Gesta Friderici (LL)
* William of Poitiers (c. 1020–after 1087), Gesta Guilelmi (PL 149, cols. 1217–1270)
* Raoul of Caen (after 1107), Gesta Tancredi (LL)
Rather of Verona (c. 890–974), Praeloquia (PL 136, cols. 145–344)
* Rupert of Deutz (c. 1070–1129/30), De incendio Tuitiensi (PL 170, cols. 333–358)
————, Annulus sive dialogus inter Christianum et Judaeum (PL 170, cols. 559–610)
————, Super quaedam capitula regulae divi Benedicti abbatis (PL 170, cols. 477–538)
* Suger of St. Denis (c. 1081–1151), De rebus in administratione sua gestis (PL 186, cols. 1211–1240)
* ————, Vita Ludovici VI Grossi (PL 186, cols. 1253–1340)
* Thietmar of Merseburg (975–1018), Chronicon (PL 139, cols. 1183–1422)
Appendix 2: Composition of corpus D
Corpus D is a subset of text segments selected from the text segments in corpus C in order to ensure balance among the author classes. With the exception of the Monk of Lido, only those authors with at least 2 text segments in corpus C were retained. Whenever possible, 3 segments were chosen for each of those authors; when the length of the original work had created only 2 segments, those 2 were chosen. If an author had more than one original work, segments were chosen from at least 2 of those works.
The following table summarizes the composition of corpus D:
Author and text | Segment #s |
---|---|
Cosmas, Chronicae | 1, 2, 3 |
Galbert, Passio Caroli | 1, 2, 3 |
Gallus, Gesta principum polonorum | 1, 2 |
Anonymous, Gesta Francorum | 1, 2 |
Guibert, De vita sua | 3, 4 |
Guibert, Gesta Dei per Francos | 1 |
Liutprand, Antapodosis | 2, 3 |
Liutprand, Relatio de legatione | 1 |
Geoffrey, De rebus gestis | 1, 2, 3 |
William of Malmesbury, De gestis pontificum | 7, 8 |
William of Malmesbury, De gestis regum | 5 |
Monk of Lido, Translatio | 1 |
Otloh, Liber de temptationibus | 1 |
Otloh, Liber visionum | 1 |
Otto, Gesta Friderici | 1, 2, 3 |
William of Poitiers, Gesta Guilelmi | 1, 2 |
Raoul, Gesta Tancredi | 1, 2, 3 |
Suger, De rebus in administratione sua gestis | 1 |
Suger, Vita Ludovici VI Grossi | 1, 2 |
Thietmar, Chronicon | 1, 2, 5 |
Cite this article
Kabala, J. Computational authorship attribution in medieval Latin corpora: the case of the Monk of Lido (ca. 1101–08) and Gallus Anonymous (ca. 1113–17). Lang Resources & Evaluation 54, 25–56 (2020). https://doi.org/10.1007/s10579-018-9424-0