rybesh + linguistics   29

Roy Harris and Integrational Linguistics
Roy Harris is Emeritus Professor of General Linguistics in the University of Oxford and Honorary Fellow of St Edmund Hall. He has also held university teaching posts in Hong Kong, Boston and Paris and visiting fellowships at universities in South Africa and Australia, and at the Indian Institute of Advanced Study.

His books on integrationism, theory of communication, semiology and the history of linguistic thought include The Language Myth, Rethinking Writing, Saussure and his Interpreters, The Necessity of Artspeak, The Semantics of Science, Mindboggling, Rationality and the Literate Mind and After Epistemology.
linguistics  semiotics 
7 days ago by rybesh
The Epilogue that Started It All; or, Integrating LIS (Harris and Hjørland)
This essay and bibliography will focus on the connections and possible overlap between, primarily, two prolific scholars, Roy Harris, emeritus professor of General Linguistics in the University of Oxford, founder of Integrationism, and Birger Hjørland, proponent of the socio-cognitive paradigm and domain analysis in Information Science (IS).
linguistics  KO 
7 days ago by rybesh
Modeling the Evolution of Science
This browseable 75-topic dynamic topic model of the Journal Science (1880-2002) is part of the on-line supplement to the submission "Modeling the Evolution of Science." This browser allows a user to visualize the dynamic topic model, and use the hidden topics that it has uncovered to guide an exploration of the original collection of documents.
linguistics  topicmodels  classification  science  libraries 
21 days ago by rybesh
Peter Ludlow, "The Myth of Human Language" (2005)
There is a core part of our linguistic competence that is fixed by biology (perhaps by low level biophysical principles), but this provides just a basic skeleton which is fleshed out in different ways on a conversation-by-conversation basis. To shift to a monetary metaphor, there are some common coins, but we also have the ability to mint new coins on the fly in collaboration with our discourse partners, to control which of those common coins are in circulation at any given time, and to coordinate and precisify the shared meanings of those common coins that are in use. As we will see, for most linguistic common coins the meaning is vastly underdetermined. I will suggest possible ways in which coins are minted and their values determined as discourse participants form dynamic communicative partnerships, resulting (if we really must deploy the term 'language') in what we might call micro-languages.
language  linguistics  meaning  semantics 
5 weeks ago by rybesh
Latent Dirichlet Allocation in C
This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, click here to see the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).

This code contains:

an implementation of variational inference for the per-document topic proportions and per-word topic assignments
a variational EM procedure for estimating the topics and exchangeable Dirichlet hyperparameter
lda  c  linguistics  machinelearning  textanalysis  textmining 
12 weeks ago by rybesh
Natural Language Software Registry
The Natural Language Software Registry (NLSR) is a concise summary of the capabilities and sources of a large amount of natural language processing (NLP) software available to the NLP community. It comprises academic, commercial and proprietary software with specifications and terms on which it can be acquired clearly indicated.
nlp  linguistics  tools 
february 2012 by rybesh
William Labov
He is noted for his seminal studies of the way ordinary people structure narrative stories of their own lives.
linguistics  narrative  discourse 
february 2012 by rybesh
N-grams: corpus based (COCA, COHA, Spanish, Portuguese)
These n-grams are based on the largest publicly-available, genre-balanced corpus of English -- the 425 million word Corpus of Contemporary American English (COCA). With this n-grams data (2, 3, 4, 5-word sequences, with their frequency), you can carry out powerful queries offline -- without needing to access the corpus via the web interface.
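As a rough illustration of what n-gram frequency data of this kind looks like, the sketch below counts word sequences in a toy sentence. The function name `ngram_counts` and the sample text are made up for illustration; they are not part of the COCA data or its tools.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-token sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy input; a real corpus would be tokenized far more carefully.
tokens = "the cat sat on the mat and the cat slept".split()

bigrams = ngram_counts(tokens, 2)
fivegrams = ngram_counts(tokens, 5)

# List the most frequent 2-word sequences with their frequency.
for gram, freq in bigrams.most_common(3):
    print(" ".join(gram), freq)
```

Once the counts are stored this way, frequency queries run locally against the `Counter`, which is the kind of offline lookup the description has in mind.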
english  corpus  linguistics  nlp  ngrams 
february 2012 by rybesh
Aravind K. Joshi - Towards Discourse Meaning
The overall goal is to discuss some issues concerning the dependencies at the discourse level and at the sentence level. However, first I will briefly describe the Penn Discourse Treebank (PDTB)*, a corpus in which we annotate the discourse connectives (explicit and implicit) and their arguments together with "attributions" of the arguments and the relations denoted by the connectives, and also the senses of the connectives. I will then focus on the complexity of dependencies in terms of (a) the elements that bear the dependency relations, (b) graph theoretic properties of these dependencies such as nested and crossed dependencies, dependencies with shared arguments, and (c) attributions and their relationship to the dependencies, among others. I will compare these dependencies with those at the sentence level and discuss some issues that relate to the transition from the sentence level to the level of "immediate discourse" and propose some conjectures.
discourse  meaning  linguistics  nlp 
november 2011 by rybesh
Model Theory (Stanford Encyclopedia of Philosophy)
Model theory began with the study of formal languages and their interpretations, and of the kinds of classification that a particular formal language can make. Mainstream model theory is now a sophisticated branch of mathematics (see the entry on first-order model theory). But in a broader sense, model theory is the study of the interpretation of any language, formal or natural, by means of set-theoretic structures, with Alfred Tarski's truth definition as a paradigm. In this broader sense, model theory meets philosophy at several points, for example in the theory of logical consequence and in the semantics of natural languages.
logic  representation  language  interpretation  modeling  models  linguistics 
november 2011 by rybesh
Mental Models Website
The mental model theory of thinking and reasoning is the focus of this Web site. Mental models are representations in the mind of real or imaginary situations. Scientists sometimes use the term "mental model" as a synonym for "mental representation", but it has a narrower referent in the case of the theory of thinking and reasoning. The idea that people rely on mental models can be traced back to Kenneth Craik’s suggestion in 1943 that the mind constructs "small-scale models" of reality that it uses to anticipate events. Mental models can be constructed from perception, imagination, or the comprehension of discourse. They underlie visual images, but they can also be abstract, representing situations that cannot be visualised. Each mental model represents a possibility. Mental models are akin to architects' models or to physicists' diagrams in that their structure is analogous to the structure of the situation that they represent, unlike, say, the structure of logical forms used in formal rule theories. In this respect they are a little like pictures in the "picture" theory of language described by Ludwig Wittgenstein in 1922.
cogsci  psychology  linguistics  representation 
november 2011 by rybesh
Martha Palmer | Projects | Verb Net
VerbNet (VN) (Kipper-Schuler 2006) is the largest on-line verb lexicon currently available for English. It is a hierarchical domain-independent, broad-coverage verb lexicon with mappings to other lexical resources such as WordNet (Miller, 1990; Fellbaum, 1998), Xtag (XTAG Research Group, 2001), and FrameNet (Baker et al., 1998). VerbNet is organized into verb classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class. Each verb class in VN is completely described by thematic roles, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates with a temporal function, in a manner similar to the event decomposition of Moens and Steedman (1988).
corpus  linguistics  nlp  language  data  frame  semantics 
august 2011 by rybesh
Corpus-Based Study of Scientific Methodology: Comparing the Historical and Experimental Sciences
This chapter studies the use of textual features based on systemic functional linguistics, for genre-based text categorization. We describe feature sets that represent different types of conjunctions and modal assessment, which together can partially indicate how different genres structure text and may prefer certain classes of attitudes towards propositions in the text. This enables analysis of large-scale rhetorical differences between genres by examining which features are important for classification. The specific domain we studied comprises scientific articles in historical and experimental sciences (paleontology and physical chemistry, respectively). We applied the SMO learning algorithm, which with our feature set achieved over 83% accuracy for classifying articles according to field, though no field-specific terms were used as features. The most highly-weighted features for each were consistent with hypothesized methodological differences between historical and experimental sciences, thus lending empirical evidence to the recent philosophical claim of multiple scientific methods.
nlp  rhetoric  science  history  language  genre  classification  linguistics 
july 2011 by rybesh
Penn Treebank P.O.S. Tags
Alphabetical list of part-of-speech tags used in the Penn Treebank Project.
linguistics  nlp  reference 
april 2011 by rybesh
The Language of Public Discourse
What does linguistics have to offer to the understanding of "public language," and vice-versa? What is the public, anyway? How does public language adapt to its material & social settings? What's the effect of new media on the language of public discourse?
linguistics  language  politics  discourse  newmedia 
july 2010 by rybesh
NLP as a study of representations
Ellen Riloff and I run an NLP reading group pretty much every semester. Last semester we covered "old school NLP." We independently came up with lists of what we consider some of the most important ideas (idea = paper) from pre-1990 (most are much earlier) and let students select which to present. There was a lot of overlap between Ellen's list and mine (not surprisingly). If people are interested, I can provide the whole list (just post a comment and I'll dig it up). The whole list of topics is posted as a comment. The topics that were actually selected are here.

I hope the students have found this exercise useful. It gets you thinking about language in a way that papers from the 2000s typically do not. It brings up a bunch of issues that we no longer think about frequently. Like language. (Joking.) (Sort of.)

One thing that's really stuck out for me is how much "old school" NLP comes across essentially as a study of representations. Perhaps this is a result of the fact that AI -- as a field -- was (and, to some degree, still is) enamored with knowledge representation problems. To be more concrete, let's look at a few examples. It's already been a while since I read these last (I had meant to write this post during the spring when things were fresh in my head), so please forgive me if I goof a few things up.

I'll start with one I know well: Mann and Thompson's rhetorical structure theory paper from 1988. This is basically "the" RST paper. I think that when many people think of RST, they think of it as a list of ways that sentences can be organized into hierarchies. E.g., this sentence provides background for that one, and together they argue in favor of yet a third. But this isn't really where RST begins. It begins by trying to understand the communicative role of text structure. That is, when I write, I am trying to communicate something. Everything that I write (if I'm writing "well") is toward that end. For instance, in this post, I'm trying to communicate that old school NLP views representation as the heart of the issue. This current paragraph is supporting that claim by providing a concrete example, which I am using to try to convince you of my claim.

As a more detailed example, take the "Evidence" relation from RST. M+T have the following characterization of "Evidence." Herein, "N" is the nucleus of the relation, "S" is the satellite (think of these as sentences), "R" is the reader and "W" is the writer:

relation name: Evidence
constraints on N: R might not believe N to a degree satisfactory to W
constraints on S: R believes S or will find it credible
constraints on N+S: R's comprehending S increases R's belief of N
the effect: R's belief of N is increased
locus of effect: N

This is a totally different way of thinking about things than I think we see nowadays. I kind of liken it to how I tell students not to program. If you're implementing something moderately complex (say, forward/backward algorithm), first write down all the math, then start implementing. Don't start implementing first. I think nowadays (and sure, I'm guilty!) we see a lot of implementing without the math. Or rather, with plenty of math, but without a representational model of what it is that we're studying.

The central claim of the RST paper is that one can think of texts as being organized into elementary discourse units, and these are connected into a tree structure by relations like the one above. (Or at least this is my reading of it.) That is, they have laid out a representation of text and claimed that this is how texts get put together.

As a second example (this will be shorter), take Wendy Lehnert's 1982 paper, "Plot units and narrative summarization." Here, the story is about how stories get put together. The most interesting thing about the plot units model to me is that it breaks from how one might naturally think about stories. That is, I would naively think of a story as a series of events. The claim that Lehnert makes is that this is not the right way to think about it. Rather, we should think about stories as sequences of affect states. Effectively, an affect state is how a character is feeling at any time. (This isn't quite right, but it's close enough.) For example, Lehnert presents the following story:

When John tried to start his car this morning, it wouldn't turn over. He asked his neighbor Paul for help. Paul did something to the carburetor and got it going. John thanked Paul and drove to work.

The representation put forward for this story is something like: (1) negative-for-John (the car won't start), which leads to (2) motivation-for-John (to get it started), which leads to (3) positive-for-John (it's started), which then links back and resolves (1). You can also analyze the story from Paul's perspective, and then add links that go between the two characters showing how things interact. The rest of the paper describes how these relations work, and how they can be put together into more complex event sequences (such as "promised request bungled"). Again, a high level representation of how stories work from the perspective of the characters.

So now I, W, hope that you, R, have an increased belief in the title of the post.

Why do I think this is interesting? Because at this point, we know a lot about how to deal with structure in language. From a machine learning perspective, if you give me a structure and some data (and some features!), I will learn something. It can even be unsupervised if it makes you feel better. So in a sense, I think we're getting to a point where we can go back, look at some really hard problems, use the deep linguistic insights from two decades (or more) ago, and start taking a crack at things that are really deep. Of course, features are a big problem; as a very wise man once said to me: "Language is hard. The fact that statistical association mining at the word level made it appear easy for the past decade doesn't alter the basic truth. :-)." We've got many of the ingredients to start making progress, but it's not going to be easy!
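
The affect-state reading of the John story can be sketched as a small graph. This is only an illustration of the representation as the post summarizes it. The class names, the helper `resolved_problems`, and the link labels "leads-to" and "resolves" are hypothetical, taken from the post's wording rather than from Lehnert's own terminology.

```python
from dataclasses import dataclass, field


@dataclass
class AffectState:
    character: str
    kind: str   # "negative", "motivation", or "positive" (labels follow the post)
    event: str


@dataclass
class PlotGraph:
    states: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (source index, label, target index)

    def add(self, character, kind, event):
        """Append an affect state and return its index."""
        self.states.append(AffectState(character, kind, event))
        return len(self.states) - 1

    def link(self, src, label, dst):
        """Record a labeled link from one state to another."""
        self.links.append((src, label, dst))


def resolved_problems(graph):
    """Return the negative states that some other state links back to and resolves."""
    targets = {dst for (_, label, dst) in graph.links if label == "resolves"}
    return [graph.states[i] for i in sorted(targets)
            if graph.states[i].kind == "negative"]


# The story from John's perspective, as summarized in the post.
story = PlotGraph()
s1 = story.add("John", "negative", "the car won't start")
s2 = story.add("John", "motivation", "get the car started")
s3 = story.add("John", "positive", "the car is started")
story.link(s1, "leads-to", s2)
story.link(s2, "leads-to", s3)
story.link(s3, "resolves", s1)

for state in resolved_problems(story):
    print(state.character, "-", state.event)
```

Paul's perspective could be added with further `add` calls and cross-character links in the same structure.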
linguistics  problems  community  discourse  structured_prediction  from google
november 2009 by rybesh
i d e a n t: Tag Literacy
Distributed classification systems function at the intersection of individual choices and the shared linguistic/semantic norms of a social group (the folks in folksonomy).
social  metadata  categorization  classification  collaboration  linguistics  semantics 
july 2007 by rybesh
Literary Encyclopedia: Langue and Parole
Langue represents the “work of a collective intelligence”, which is both internal to each individual and beyond the will of any individual to change. Parole designates individual events of language use manifesting each time a speaker’s ephemeral ind
linguistics  theory  speech  language 
april 2007 by rybesh
Chomsky: competence vs. performance
Competence is our tacit, internalised knowledge of a language. Performance is external evidence of language competence, and is usage on particular occasions when factors other than our linguistic competence may affect its form.
linguistics  ideas  speech  theory  performance  knowledge 
april 2007 by rybesh
Rhetorical Structure Theory
RST is intended to describe texts, rather than the processes of creating or reading and understanding them. It posits various sorts of "building blocks" which can be observed to occur in texts.
linguistics  theory  semiotics  reference 
july 2006 by rybesh
Philipp Cimiano
Main interests are in the field of Computational Linguistics as well as Knowledge Representation.
people  academia  kr  linguistics  SSMS2006  semweb 
july 2006 by rybesh
Mallet
MALLET is an integrated collection of Java code useful for statistical natural language processing, document classification, clustering, information extraction, and other machine learning applications to text.
ai  bayes  java  linguistics  nlp  tools 
july 2005 by rybesh
Mark Johnson, George Lakoff: Metaphors We Live By
This book disappointed me, because I expected to be able to somehow apply or utilize the information within...
books  2003  urn:asin:0226468011  wishlist  concepts  languagearts  linguistics  metaphor  philosophy  truth 
june 2005 by rybesh
George Lakoff: Women, Fire, and Dangerous Things
I'd say it's a book I'll keep and likely use as a reference but I doubt I'll ever read the whole thing...
books  1990  urn:asin:0226468046  wishlist  categorization  cognition  languagearts  linguistics  psychology  reason 
june 2005 by rybesh
Geoffrey Sampson: Writing Systems
This book is the one that got me interested in writing systems as a part of linguistics...
books  1990  urn:asin:0804717567  wishlist  alphabet  language  languagearts  linguistics  writing 
june 2005 by rybesh
James Fentress, Umberto Eco: The Search for the Perfect Language
This is an excellent short review of the European quest for a language to unite its disparate nations with each other and the rest of the world...
books  1997  urn:asin:0631205101  wishlist  europe  language  languagearts  linguistics  world 
june 2005 by rybesh
