Developing the Data Set of Nineteenth-Century Knowledge
Peter Logan, Temple University

Grant details:

Nineteenth-Century Knowledge Project (Conference Paper/Presentation)
Author: Peter M. Logan
Abstract: This talk outlines the progress and problems of a three-year-old project to build an extensive, open, digital collection for studying the structure of nineteenth-century knowledge, based on historic editions of the Encyclopaedia Britannica. Today, we readily recognize a pervasive Eurocentrism in these entries, among other flaws. But at the time, the Britannica editions were the most authoritative comprehensive representation of knowledge as a whole in the English-speaking world. Knowledge has changed since then, and it also changed during the period in which this material was published, from 1790 to 1911. This data set documents those changes. The goal of the project is to identify patterns in the transformation of knowledge by mining the final data set. All of these editions are available on the web, but their textual data is too inaccurate for valid text mining. This project therefore creates the first accurate TEI edition of this valuable resource. The full corpus consists of 100,000 articles derived from 80,000 print pages. The TEI will be supplemented with metadata generated through Named Entity Recognition. The Metadata Research Center at Drexel University will further enrich the data by adding subject metadata from both current and historical vocabularies, using an automated recognition program it developed. When complete, all individual entries will be made freely available through the Oxford Text Archive, and the full data set will be uploaded in bulk form for other researchers to the CORE Repository of the Humanities Commons.
Date: 9/10/18
Conference Name: TEI2018 (Text Encoding Initiative Consortium)