
Funded Projects Query Form

Grant number: HD-50099-07

Drexel University (Philadelphia, PA 19104-2875)
Robert B. Allen
HD-50099-07
Automatic Extraction of Article Metadata from Digitized Historical Newspapers

The development of a programming tool for automatically identifying, categorizing, and describing newspaper articles from digital files produced by the National Digital Newspaper Program (NDNP).

In the next few years, several hundred thousand newspaper pages will be digitized and made available online through the National Digital Newspaper Program. While the digitization process typically includes identification of the words in the text using basic optical character recognition (OCR), project awardees are not required to identify or index individual articles. Articles are the natural unit for interacting with the news. Identifying them can improve search accuracy and support user-friendly interaction, and it should increase the value of the material for historians, teachers of history, and members of the public who are interested in history. We will develop automated methods for such article-level processing. Specifically, we will build a set of Java programs that take the image files and the OCR files as input and identify, categorize, and extract descriptions from articles.

Project fields: Library Science
Program: Digital Humanities Start-Up Grants
Division: Digital Humanities
Total amount awarded: $30,000
Grant period: 4/1/2007 – 4/30/2009
