corpora 740
Ideas Illustrated » Blog Archive » Visualizing English Word Ori
visualization nlp corpora ling
yesterday by mt3_666
Where are the Novels?
20 days ago by jschneider
"The lion’s share of the private/for-profit scans are from the Corvey Collection which the publisher Gale appears to control." "Based on the sample, we may guess that about 58% (somewhere between 47% and 68%) of the 2,903 novels have publicly accessible scans. For any given novel, however, the chance of finding a scan seems to depend on two things: (1) the novel’s year of publication and (2) the novel having subsequent editions or printings (see Figure 2)." "Two results stand out. First, the 19th century British novel is a phenomenally well-preserved part of cultural history. Copies of nearly every novel published during the period survive. Second, the proportion of novels scanned between 1800 and 1820 is low, likely around 33% based on this sample. This raises concerns about any claim of representativeness made on behalf of existing corpora covering those years. As libraries and private collections continue to be digitized and, I hope, be made publicly accessible, such concerns should diminish."
digitization
openaccess
publicdomain
corpora
The Mechanic Muse - The Jargon of the Novel, Computed - NYTimes.com
20 days ago by jschneider
"Corpus of Contemporary American English, or COCA, which brings together 425 million words of text from the past two decades, with equally large samples drawn from fiction, popular magazines, newspapers, academic texts and transcripts of spoken English."
nytimes
corpora
fiction
David
Bamman
computational-linguistics
Full Text Search Engines, Part I Mathias Hasselmann - Taschenorakel.de
sqlite evaluation search lucene solr mysql database c++ corpora xapian optimization webserver imdb
5 weeks ago by mt3_666
RavenDB & FreeDB: An optimization opportunity - Ayende @ Rahien
wikipedia corpora ravendb database optimization
5 weeks ago by mt3_666
optimization - How do you count cardinality of very large datas
python optimization datamining corpora
5 weeks ago by mt3_666