Products for grant HAA-271822-20

Computational tools for diachronic and cross-cultural study of literature: multilingual stylometry and phylogenetic profiling
Pramit Chaudhuri, University of Texas, Austin

Profiling of Intertextuality in Latin Literature Using Word Embeddings (Article)
Title: Profiling of Intertextuality in Latin Literature Using Word Embeddings
Author: Patrick Burns
Author: James Brofos
Author: Kyle Li
Author: Pramit Chaudhuri
Author: Joseph Dexter
Abstract: Identifying intertextual relationships between authors is of central importance to the study of literature. We report an empirical analysis of intertextuality in classical Latin literature using word embedding models. To enable quantitative evaluation of intertextual search methods, we curate a new dataset of 945 known parallels drawn from traditional scholarship on Latin epic poetry. We train an optimized word2vec model on a large corpus of lemmatized Latin, which achieves state-of-the-art performance for synonym detection and outperforms a widely used lexical method for intertextual search. We then demonstrate that training embeddings on very small corpora can capture salient aspects of literary style and apply this approach to replicate a previous intertextual study of the Roman historian Livy, which relied on hand-crafted stylometric features. Our results advance the development of core computational resources for a major premodern language and highlight a productive avenue for cross-disciplinary collaboration between the study of literature and NLP.
Year: 2021
Primary URL:
Access Model: Open Access
Format: Journal
Periodical Title: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Publisher: Association for Computational Linguistics

Semantic Intertextual Search with Latin Word-Embedding Models (Public Lecture or Presentation)
Title: Semantic Intertextual Search with Latin Word-Embedding Models
Abstract: This paper describes optimization of a computational method for representing semantic information in Latin texts and application of the method to identifying intertextual relationships of literary significance. The distributional hypothesis in linguistics holds that the meaning of a word can be inferred from the contexts in which it is used (Firth); the development of effective methods for computing distributional representations known as word embeddings has revolutionized natural language processing research over the past decade (Mikolov et al., Devlin et al.). We optimize a word embedding model for Latin and use that model to improve existing methods for intertextual search through incorporation of semantic matching...
Author: Joseph Dexter
Author: Pramit Chaudhuri
Date: 01/10/2021
Location: 152nd Annual Meeting of the Society for Classical Studies
Primary URL:
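
Illustrative sketch (not part of the grant record): the abstract above describes improving intertextual search through semantic matching with word embeddings. The Python below is a minimal sketch of that general idea, not the authors' implementation. The embedding values are made-up toy numbers rather than output from any trained model, and the names TOY_EMBEDDINGS, best_match_score, and phrase_similarity are hypothetical. It scores a candidate phrase against a query phrase by averaging, over the query lemmas, the best cosine similarity found in the candidate.

```python
import math

# Hypothetical toy embeddings: made-up numbers for illustration only,
# NOT taken from any trained Latin model.
TOY_EMBEDDINGS = {
    "ensis": [0.9, 0.1, 0.0],    # "sword"
    "gladius": [0.8, 0.2, 0.1],  # "sword"
    "amor": [0.0, 0.9, 0.4],     # "love"
    "mare": [0.1, 0.2, 0.9],     # "sea"
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_score(lemma, phrase, embeddings):
    """Highest cosine similarity between `lemma` and any lemma in `phrase`.

    Lemmas missing from `embeddings` are skipped; the score is 0.0 if
    nothing can be compared.
    """
    if lemma not in embeddings:
        return 0.0
    scores = [
        cosine(embeddings[lemma], embeddings[other])
        for other in phrase
        if other in embeddings
    ]
    return max(scores, default=0.0)


def phrase_similarity(query, candidate, embeddings):
    """Mean, over the query lemmas, of each lemma's best match in `candidate`."""
    if not query:
        return 0.0
    total = sum(best_match_score(lemma, candidate, embeddings) for lemma in query)
    return total / len(query)


if __name__ == "__main__":
    query = ["ensis", "amor"]
    for candidate in (["gladius", "amor"], ["mare"]):
        score = phrase_similarity(query, candidate, TOY_EMBEDDINGS)
        print(candidate, round(score, 3))
```

Under these toy vectors, a candidate that shares no exact lemma with the query but contains a near-synonym (gladius for ensis) still scores higher than an unrelated one. That is the advantage semantic matching is meant to offer over purely lexical matching. A real system would load embeddings trained on a lemmatized corpus instead of the hand-written dictionary used here.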