corpora   740


Where are the Novels?
"The lion’s share of the private/for-profit scans are from the Corvey Collection which the publisher Gale appears to control." "Based on the sample, we may guess that about 58%—somewhere between 47% and 68%—of the 2,903 novels have publicly accessible scans. For any given novel, however, the chance of finding a scan seems to depend on two things: (1) the novel’s year of publication and (2) the novel having subsequent editions or printings (see Figure 2)." "Two results stand out. First, the 19th century British novel is a phenomenally well-preserved part of cultural history. Copies of nearly every novel published during the period survive. Second, the proportion of novels scanned between 1800 and 1820 is low—likely around 33% based on this sample. This raises concerns about any claim of representativeness made on behalf of existing corpora covering those years. As libraries and private collections continue to be digitized and, I hope, be made publicly accessible, such concerns should diminish."
digitization  openaccess  publicdomain  corpora 
20 days ago by jschneider
The Mechanic Muse - The Jargon of the Novel, Computed - NYTimes.com
"Corpus of Contemporary American English, or COCA, which brings together 425 million words of text from the past two decades, with equally large samples drawn from fiction, popular magazines, newspapers, academic texts and transcripts of spoken English."
nytimes  corpora  fiction  David  Bamman  computational-linguistics 
20 days ago by jschneider
