rybesh + corpus   8

National Archeological Database, Reports module
The National Archeological Database, Reports module, is an expanded bibliographic inventory of over 350,000 reports on archeological investigation and planning, mostly of limited circulation. This "gray literature" represents a large portion of the primary information available on archeological sites in the U.S.
archaeology  corpus  timeperiods  bibliography 
5 weeks ago by rybesh
N-grams: corpus based (COCA, COHA, Spanish, Portuguese)
These n-grams are based on the largest publicly available, genre-balanced corpus of English -- the 425 million word Corpus of Contemporary American English (COCA). With this n-gram data (2-, 3-, 4-, and 5-word sequences, with their frequencies), you can carry out powerful queries offline -- without needing to access the corpus via the web interface.
english  corpus  linguistics  nlp  ngrams 
february 2012 by rybesh
Martha Palmer | Projects | Verb Net
VerbNet (VN) (Kipper-Schuler 2006) is the largest on-line verb lexicon currently available for English. It is a hierarchical, domain-independent, broad-coverage verb lexicon with mappings to other lexical resources such as WordNet (Miller, 1990; Fellbaum, 1998), Xtag (XTAG Research Group, 2001), and FrameNet (Baker et al., 1998). VerbNet is organized into verb classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class. Each verb class in VN is completely described by thematic roles, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates with a temporal function, in a manner similar to the event decomposition of Moens and Steedman (1988).
corpus  linguistics  nlp  language  data  frame  semantics 
august 2011 by rybesh
ec2 – loc-ndnp
To quickly set up your own instance of Chronicling America we've created a public Amazon Machine Image (AMI). This AMI has the Chronicling America software stack already installed, so you can immediately start loading batches. We also have put some real NDNP batches on Elastic Block Storage (EBS) which you can mount and load from.
aws  cloud  newspaper  digitization  corpus 
july 2011 by rybesh
Google Books: American English (155 billion words)
This interface allows you to search the Google Books data in many ways that are much more advanced than what is possible with the simple Google Books interface. You can search by word, phrase, substring, lemma, part of speech, synonyms, and collocates (nearby words). You can copy the data to other applications for further analysis, which you can't do with the regular Google Books interface. And you can quickly and easily compare the data in two different sections of the corpus (for example, adjectives describing women or art or music in the 1960s-2000s vs the 1870s-1910s).
american  books  corpus  data  statistics  language 
may 2011 by rybesh
The Stormont Papers - Home
This website offers access to the Parliamentary Debates of the devolved government of Northern Ireland from June 7 1921 to the dissolution of Parliament on March 28 1972.

These papers cast a unique and valuable light on the development of the Province. The 92,000 printed pages of Parliamentary Debates are held by few institutions and they have no comprehensive subject index. Hence they have been inaccessible and difficult to use. This project, with the support of academics, archivists and politicians, has taken the Papers and fully digitised them. The resource has been available online since October 2006.
britain  ireland  history  corpus 
april 2011 by rybesh
18thConnect
Digitized 18th-century texts.
digitization  18thcentury  sources  corpus 
april 2011 by rybesh
Modernist Journals Project
The Modernist Journals Project is a major resource for the study of modernism in the English-speaking world, with periodical literature as its central concern. Our primary mission is to produce digital editions of culturally significant magazines from around the early 20th century and make them freely available to the public on our website.
digitization  periodicals  modernism  sources  corpus 
april 2011 by rybesh
