N-grams: corpus based (COCA, COHA, Spanish, Portuguese)
february 2012 by rybesh
These n-grams are based on the largest publicly available, genre-balanced corpus of English: the 425-million-word Corpus of Contemporary American English (COCA). With these n-gram data (2-, 3-, 4-, and 5-word sequences, each with its frequency), you can run powerful queries offline, without needing to access the corpus through the web interface.
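As a rough illustration of what an offline query might look like, here is a minimal Python sketch. It assumes each line of a downloaded file is tab-separated, with the frequency first and the words after it; that layout, the function names `load_ngrams` and `following`, and the toy sample rows are all assumptions made for illustration, not the documented format or real COCA counts.

```python
import io


def load_ngrams(handle):
    """Read lines of the ASSUMED form: frequency<TAB>word1<TAB>...<TAB>wordN."""
    table = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # need a frequency plus at least two words
        try:
            freq = int(fields[0])
        except ValueError:
            continue  # skip header or malformed lines
        ngram = tuple(word.lower() for word in fields[1:])
        # Sum counts for n-grams that differ only in letter case.
        table[ngram] = table.get(ngram, 0) + freq
    return table


def following(table, *prefix):
    """Return (continuation, frequency) pairs for n-grams starting with prefix."""
    prefix = tuple(word.lower() for word in prefix)
    n = len(prefix)
    hits = [
        (ngram[n:], freq)
        for ngram, freq in table.items()
        if len(ngram) > n and ngram[:n] == prefix
    ]
    # Most frequent continuation first.
    return sorted(hits, key=lambda hit: -hit[1])


# Toy rows invented for this example (not real corpus frequencies).
sample = io.StringIO("120\tstrong\ttea\n45\tstrong\tcoffee\n3\tpowerful\ttea\n")
table = load_ngrams(sample)
for words, freq in following(table, "strong"):
    print(" ".join(words), freq)
```

With a real file, `open(path, encoding="utf-8")` would take the place of the `io.StringIO` sample, after adjusting the column order to match the actual download.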
english
corpus
linguistics
nlp
ngrams