National Endowment for the Humanities (NEH)

Funded Projects Query Form
One match

Grant number like: HAA-277203-21

University of Maryland, College Park (College Park, MD 20742-5141)
Matthew Thomas Miller (Project Director: June 2020 to present)
David Smith (Co-Project Director: November 2020 to present)
Automatic Collation for Diversifying Corpora: Improving Handwritten Text Recognition (HTR) for Arabic-script Manuscripts

Refinement of machine learning methods to improve automatic handwritten text recognition of Persian and Arabic manuscripts and make these sources more accessible for humanities research and teaching.

The Automatic Collation for Diversifying Corpora (ACDC) project will significantly improve the accuracy of handwritten text recognition (HTR) for Arabic-script manuscripts by developing a collation tool to automatically create large amounts of training data from existing digital texts and manuscript images without time-consuming human annotation of individual manuscripts. The ACDC project will accomplish this task by extending the capabilities of the text alignment tool passim and the HTR engine Kraken to align very poor initial HTR transcriptions of diverse manuscript exemplars with existing digital texts in order to automatically produce training data in a “distantly supervised” manner. The ACDC tool’s acceleration of the training data production process will enable, for the first time, the creation of generalizable Arabic and Persian HTR models required for the digital transcription of large-scale Persian and Arabic manuscript collections.
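The "distantly supervised" idea described above can be illustrated with a toy sketch: take a noisy HTR line, locate the span of an existing digital text it most plausibly corresponds to, and keep that clean span as the training label for the line image. The sketch below is an illustration only, using Python's standard-library difflib. It is not the ACDC tool and does not use or reflect the passim or Kraken APIs. The function names (align_line, make_training_pairs), the min_ratio threshold, and the sample strings are all hypothetical. On long reference texts a stray early or late character match can stretch the projected span, so a real system would need a more robust alignment step.

```python
from difflib import SequenceMatcher


def align_line(noisy_line, reference, min_ratio=0.6):
    """Locate the span of `reference` that a noisy transcription corresponds to.

    Returns (start, end, ratio) as indices into `reference`, or None when
    the best candidate span is too dissimilar. Hypothetical helper, not
    part of passim or Kraken.
    """
    matcher = SequenceMatcher(None, noisy_line, reference, autojunk=False)
    # Drop the zero-length sentinel block that difflib appends at the end.
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return None
    first, last = blocks[0], blocks[-1]
    # Project the unmatched head and tail of the noisy line onto the reference.
    start = max(0, first.b - first.a)
    tail = len(noisy_line) - (last.a + last.size)
    end = min(len(reference), last.b + last.size + tail)
    candidate = reference[start:end]
    ratio = SequenceMatcher(None, noisy_line, candidate, autojunk=False).ratio()
    if ratio < min_ratio:
        return None
    return start, end, ratio


def make_training_pairs(noisy_lines, reference, min_ratio=0.6):
    """Pair each line id with the clean reference span it aligns to.

    `noisy_lines` is an iterable of (line_id, noisy_text). Lines that do
    not align well enough are skipped rather than given a doubtful label.
    """
    pairs = []
    for line_id, noisy_text in noisy_lines:
        hit = align_line(noisy_text, reference, min_ratio)
        if hit is not None:
            start, end, _ = hit
            pairs.append((line_id, reference[start:end]))
    return pairs


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    noisy = [("line-001", "qu1ck brovvn fox"), ("line-002", "#####")]
    for line_id, label in make_training_pairs(noisy, reference):
        print(line_id, "->", label)
```

Skipping low-similarity lines trades coverage for label quality: a smaller set of confidently aligned pairs is usually a safer starting point for retraining than a larger set with wrong labels.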

Project fields:
Arabic Language; Interdisciplinary Studies, Other; Languages, Other

Digital Humanities Advancement Grants

Digital Humanities

$324,571 (approved)
$282,905 (awarded)

Grant period:
1/1/2021 – 6/30/2022