Products for Grant HD-51787-13

HD-51787-13
Understanding Genre in a Collection of a Million Volumes
William Underwood, University of Illinois, Urbana-Champaign

Grant details: https://securegrants.neh.gov/publicquery/main.aspx?f=1&gn=HD-51787-13

"Mapping Mutable Genres in Structurally Complex Volumes" (Article)
Title: "Mapping Mutable Genres in Structurally Complex Volumes"
Author: Underwood, Ted
Author: Black, Michael L.
Author: Auvil, Loretta
Author: Capitanu, Boris
Abstract: To mine large digital libraries in humanistically meaningful ways, we need to divide them by genre. This is a task that classification algorithms are well suited to assist, but they need adjustment to address the specific challenges of this domain. Digital libraries pose two problems of scale not usually found in the article datasets used to test these algorithms. 1) Because libraries span several centuries, the genres being identified may change gradually across the time axis. 2) Because volumes are much longer than articles, they tend to be internally heterogeneous, and the classification task also requires segmentation. We describe a multilayered solution that trains hidden Markov models to segment volumes, and uses ensembles of overlapping classifiers to address historical change. We demonstrate this on a collection of 469,200 volumes drawn from HathiTrust Digital Library.
Year: 2013
Primary URL: http://arxiv.org/abs/1309.3323
Access Model: Open access.
Format: Journal
Periodical Title: Proceedings of the IEEE

Page-Level Genre Metadata for English-Language Volumes in HathiTrust, 1700-1922 (Database/Archive/Digital Edition)
Title: Page-Level Genre Metadata for English-Language Volumes in HathiTrust, 1700-1922
Author: Underwood, Ted
Abstract: Page-by-page genre predictions for 854,476 English-language volumes printed between 1700 and 1922, keyed to the texts in HathiTrust Digital Library. This research was supported by the National Endowment for the Humanities and the American Council of Learned Societies. The genre predictions were produced by an ensemble of regularized logistic classifiers, and are intended to support research that explores broad trends in literary history. Since volumes usually contain multiple genres, page-level metadata is necessary to create machine-readable collections in a particular genre.
Year: 2014
Primary URL: https://figshare.com/articles/Page_Level_Genre_Metadata_for_English_Language_Volumes_in_HathiTrust_1700_1922/1279201
Primary URL Description: The Figshare repository holds multiple data files that list volumes by literary genre and characterize those volumes at the page level.
Access Model: Open access.

A Dataset for Distant-Reading Literature in English. (Blog Post)
Title: A Dataset for Distant-Reading Literature in English.
Author: Underwood, Ted
Abstract: In collaboration with HathiTrust Research Center, the author presents a collection of page-level word counts for English-language volumes in poetry, drama, and fiction.
Date: 03/01/2015
Primary URL: https://tedunderwood.com/2015/08/07/a-dataset-for-distant-reading-literature-in-english-1700-1922/
Website: The Stone and the Shell

Word Frequencies in English-Language Literature (Database/Archive/Digital Edition)
Title: Word Frequencies in English-Language Literature
Author: Underwood, Ted
Author: Capitanu, Boris
Author: Organisciak, Peter
Author: Auvil, Loretta
Author: Bhattacharyya, Sayan
Author: Fallaw, Colleen
Author: Downie, J. Stephen
Abstract: Word frequencies for volumes of English-language literature, based on the metadata generated by Ted Underwood's NEH grant.
Year: 2015
Primary URL: https://analytics.hathitrust.org/genre
Access Model: Open access.

The Longue Durée of Literary Prestige (Article)
Title: The Longue Durée of Literary Prestige
Author: Underwood, Ted
Author: Sellers, Jordan
Abstract: The data used in this article was generated by my NEH grant. A history of literary prestige needs to study both works that achieved distinction, and the mass of volumes from which they were distinguished. To understand how those patterns of preference changed across a century, we gathered two samples of English-language poetry, 1820-1919: one drawn from volumes reviewed in prominent periodicals, and one selected at random from a large digital library (where the majority of authors are relatively obscure). The stylistic differences associated with literary prominence turn out to be quite stable: a statistical model trained to distinguish reviewed from random volumes in any quarter of this century can make predictions almost as accurate about the rest of the period. The “poetic revolutions” described by many histories are not visible in this model — instead we see a steady tendency for new volumes of poetry to change by slightly exaggerating certain features that defined prestige in the recent past.
Year: 2016
Access Model: Subscription plus green open access after publication.
Format: Journal
Publisher: Modern Language Quarterly


Permalink: https://securegrants.neh.gov/publicquery/products.aspx?gn=HD-51787-13