Products for grant HAA-261271-18

A Linked Digital Environment for Coptic Studies
Amir Zeldes, Georgetown University

Building Linguistically and Intertextually-Tagged Coptic Corpora with Open Source Tools (Conference Paper/Presentation)
Title: Building Linguistically and Intertextually-Tagged Coptic Corpora with Open Source Tools
Author: Miyagawa, So; Zeldes, Amir; Büchler, Marco; Behlmer, Heike and Griffitts, Troy
Abstract: Coptic is the last stage of the Egyptian language. Before Coptic, Ancient Egyptian was written in Hieroglyphs, Hieratic, and Demotic scripts. Starting in the third century CE (excluding “Old Coptic”), Coptic used an alphabet based on the Greek alphabet, with several added Demotic letters. A large but understudied corpus of literary texts exists in Coptic, including important Gnostic, monastic and Manichaean texts, as well as early Biblical translations. Efforts to build a digital Coptic corpus are still in their initial phases. In this paper, we present the most recent work in a partnership of Digital Humanities projects. Coptic SCRIPTORIUM (Schroeder and Zeldes, 2016) is a major initiative endeavoring to put corpora online which are linguistically and philologically annotated (i.e. supporting grammatical, paleographical and literary annotations), while projects in Göttingen are producing digital editions of Coptic texts focusing on philological standards and critical editions: A project at the Göttingen Academy of Sciences and Humanities is preparing a complete digital edition of the Coptic Old Testament (Behlmer and Feder, 2017), and in a project of Collaborative Research Centre 1136 “Education and Religion” digital diplomatic editions of selected works of Shenoute and Besa, 4th-5th century abbots of the White Monastery in Upper Egypt, are being prepared for text reuse research. Based on our experiences, we have schematized workflows for building Coptic corpora with linguistic and literary information by using open source programs, merging data from OCR (Optical Character Recognition) and transcription sources, Natural Language Processing (NLP) tools, and manual annotation interfaces allowing for the correction of automatic tool output.
Date: 9/11/2018
Conference Name: Proceedings of JADH2018

Understanding Space and Place through Digital Text Analysis (Conference Paper/Presentation)
Title: Understanding Space and Place through Digital Text Analysis
Author: Schroeder, Caroline T.
Date: 2/25/2019
Conference Name: Third PAThs International Conference: Coptic Literature in Context. The Contexts of Coptic Literature: Late Antique Egypt in a dialogue between literature, archaeology and digital humanities. Sapienza University, Rome

A Characterwise Windowed Approach to Hebrew Morphological Segmentation (Conference Paper/Presentation)
Title: A Characterwise Windowed Approach to Hebrew Morphological Segmentation
Author: Zeldes, Amir
Abstract: This paper presents a novel approach to the segmentation of orthographic word forms in contemporary Hebrew, focusing purely on splitting without carrying out morphological analysis or disambiguation. Casting the analysis task as character-wise binary classification and using adjacent character and word-based lexicon-lookup features, this approach achieves over 98% accuracy on the benchmark SPMRL shared task data for Hebrew, and 97% accuracy on a new out-of-domain Wikipedia dataset, an improvement of ~4% and 5% over previous state of the art performance.
Date: 10/31/2018
Conference Name: 15th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology at EMNLP 2018, Brussels, Belgium

The Coptic Universal Dependency Treebank (Conference Paper/Presentation)
Title: The Coptic Universal Dependency Treebank
Author: Zeldes, Amir and Abrams, Mitchell
Abstract: This paper presents the Coptic Universal Dependency Treebank, the first dependency treebank within the Egyptian subfamily of the Afro-Asiatic languages. We discuss the composition of the corpus, challenges in adapting the UD annotation scheme to existing conventions for annotating Coptic, and evaluate inter-annotator agreement on UD annotation for the language. Some specific constructions are taken as a starting point for discussing several more general UD annotation guidelines, in particular for appositions, ambiguous passivization, incorporation and object-doubling.
Date: 11/1/2018
Conference Name: Proceedings of the Universal Dependencies Workshop 2018