National Archeological Database, Reports module
5 weeks ago by rybesh
The National Archeological Database, Reports module, is an expanded bibliographic inventory of over 350,000 reports on archeological investigation and planning, mostly of limited circulation. This "gray literature" represents a large portion of the primary information available on archeological sites in the U.S.
archaeology
corpus
timeperiods
bibliography
N-grams: corpus based (COCA, COHA, Spanish, Portuguese)
february 2012 by rybesh
These n-grams are based on the largest publicly available, genre-balanced corpus of English -- the 425 million word Corpus of Contemporary American English (COCA). With this n-gram data (2-, 3-, 4-, and 5-word sequences, with their frequencies), you can carry out powerful queries offline, without needing to access the corpus via the web interface.
english
corpus
linguistics
nlp
ngrams
Martha Palmer | Projects | Verb Net
august 2011 by rybesh
VerbNet (VN) (Kipper-Schuler 2006) is the largest online verb lexicon currently available for English. It is a hierarchical, domain-independent, broad-coverage verb lexicon with mappings to other lexical resources such as WordNet (Miller, 1990; Fellbaum, 1998), Xtag (XTAG Research Group, 2001), and FrameNet (Baker et al., 1998). VerbNet is organized into verb classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class. Each verb class in VN is completely described by thematic roles, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates with a temporal function, in a manner similar to the event decomposition of Moens and Steedman (1988).
corpus
linguistics
nlp
language
data
frame
semantics
ec2 – loc-ndnp
july 2011 by rybesh
To quickly set up your own instance of Chronicling America we've created a public Amazon Machine Image (AMI). This AMI has the Chronicling America software stack already installed, so you can immediately start loading batches. We also have put some real NDNP batches on Elastic Block Storage (EBS) which you can mount and load from.
aws
cloud
newspaper
digitization
corpus
Google Books: American English (155 billion words)
may 2011 by rybesh
This interface allows you to search the Google Books data in many ways that are much more advanced than what is possible with the simple Google Books interface. You can search by word, phrase, substring, lemma, part of speech, synonyms, and collocates (nearby words). You can copy the data to other applications for further analysis, which you can't do with the regular Google Books interface. And you can quickly and easily compare the data in two different sections of the corpus (for example, adjectives describing women or art or music in the 1960s-2000s vs the 1870s-1910s).
american
books
corpus
data
statistics
language
The Stormont Papers - Home
april 2011 by rybesh
This website offers access to the Parliamentary Debates of the devolved government of Northern Ireland from June 7, 1921 to the dissolution of Parliament on March 28, 1972.
These papers cast a unique and valuable light on the development of the Province. The 92,000 printed pages of Parliamentary Debates are held by few institutions and they have no comprehensive subject index. Hence they have been inaccessible and difficult to use. This project, with the support of academics, archivists and politicians, has taken the Papers and fully digitised them. The resource has been available online since October 2006.
britain
ireland
history
corpus
18thConnect
april 2011 by rybesh
Digitized 18th-century texts.
digitization
18thcentury
sources
corpus
Modernist Journals Project
april 2011 by rybesh
The Modernist Journals Project is a major resource for the study of modernism in the English-speaking world, with periodical literature as its central concern. Our primary mission is to produce digital editions of culturally significant magazines from around the early 20th century and make them freely available to the public on our website.
digitization
periodicals
modernism
sources
corpus