rybesh + linguistics 29
Roy Harris and Integrational Linguistics
7 days ago by rybesh
Roy Harris is Emeritus Professor of General Linguistics in the University of Oxford and Honorary Fellow of St Edmund Hall. He has also held university teaching posts in Hong Kong, Boston and Paris and visiting fellowships at universities in South Africa and Australia, and at the Indian Institute of Advanced Study.
His books on integrationism, theory of communication, semiology and the history of linguistic thought include The Language Myth, Rethinking Writing, Saussure and his Interpreters, The Necessity of Artspeak, The Semantics of Science, Mindboggling, Rationality and the Literate Mind and After Epistemology.
linguistics
semiotics
7 days ago by rybesh
The Epilogue that Started It All; or, Integrating LIS (Harris and Hjørland)
7 days ago by rybesh
This essay and bibliography will focus on the connections and possible overlap between, primarily, two prolific scholars, Roy Harris, emeritus professor of General Linguistics in the University of Oxford, founder of Integrationism, and Birger Hjørland, proponent of the socio-cognitive paradigm and domain analysis in Information Science (IS).
linguistics
KO
7 days ago by rybesh
Modeling the Evolution of Science
21 days ago by rybesh
This browseable 75-topic dynamic topic model of the journal Science (1880-2002) is part of the on-line supplement to the submission "Modeling the Evolution of Science." This browser allows a user to visualize the dynamic topic model, and use the hidden topics that it has uncovered to guide an exploration of the original collection of documents.
linguistics
topicmodels
classification
science
libraries
21 days ago by rybesh
Peter Ludlow, "The Myth of Human Language" (2005)
5 weeks ago by rybesh
There is a core part of our linguistic competence that is fixed by biology (perhaps by low level biophysical principles), but this provides just a basic skeleton which is fleshed out in different ways on a conversation-by-conversation basis. To shift to a monetary metaphor, there are some common coins, but we also have the ability to mint new coins on the fly in collaboration with our discourse partners, to control which of those common coins are in circulation at any given time, and to coordinate and precisify the shared meanings of those common coins that are in use. As we will see, for most linguistic common coins the meaning is vastly underdetermined. I will suggest possible ways in which coins are minted and their values determined as discourse participants form dynamic communicative partnerships, resulting (if we really must deploy the term 'language') in what we might call micro-languages.
language
linguistics
meaning
semantics
5 weeks ago by rybesh
Latent Dirichlet Allocation in C
12 weeks ago by rybesh
This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, the page links to the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).
This code contains:
an implementation of variational inference for the per-document topic proportions and per-word topic assignments
a variational EM procedure for estimating the topics and exchangeable Dirichlet hyperparameter
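The bookmarked code performs inference, which is too long to reproduce here. As a rough illustration of the two quantities it infers, the sketch below runs the LDA generative story forward instead: it draws per-document topic proportions from an exchangeable Dirichlet and then a topic assignment for each word. It is not the C implementation's variational EM; the names `sample_dirichlet`, `generate_document` and `TOY_TOPICS`, and the toy vocabulary, are assumptions made up for this example.

```python
import random

# Hypothetical toy topics: each maps word -> probability (not from the bookmarked code).
TOY_TOPICS = [
    {"verb": 0.5, "noun": 0.3, "syntax": 0.2},
    {"topic": 0.4, "corpus": 0.4, "model": 0.2},
]

def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric (exchangeable) Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document; returns (theta, words, assignments)."""
    # Per-document topic proportions.
    theta = sample_dirichlet(alpha, len(topics), rng)
    words, assignments = [], []
    for _ in range(length):
        # Per-word topic assignment, drawn from the document's proportions.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        vocab, probs = zip(*topics[z].items())
        # The word itself, drawn from the assigned topic's distribution.
        words.append(rng.choices(vocab, weights=probs)[0])
        assignments.append(z)
    return theta, words, assignments

theta, words, assignments = generate_document(TOY_TOPICS, 0.5, 10, random.Random(0))
print(theta, words, assignments)
```

Inference as described in the list above works in the opposite direction: given only the words, it estimates the proportions, the assignments, the topics and the Dirichlet hyperparameter.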
lda
c
linguistics
machinelearning
textanalysis
textmining
12 weeks ago by rybesh
Natural Language Software Registry
february 2012 by rybesh
The Natural Language Software Registry (NLSR) is a concise summary of the capabilities and sources of a large amount of natural language processing (NLP) software available to the NLP community. It comprises academic, commercial and proprietary software with specifications and terms on which it can be acquired clearly indicated.
nlp
linguistics
tools
february 2012 by rybesh
William Labov
february 2012 by rybesh
He is noted for his seminal studies of the way ordinary people structure narrative stories of their own lives.
linguistics
narrative
discourse
february 2012 by rybesh
N-grams: corpus based (COCA, COHA, Spanish, Portuguese)
february 2012 by rybesh
These n-grams are based on the largest publicly-available, genre-balanced corpus of English -- the 425 million word Corpus of Contemporary American English (COCA). With this n-grams data (2, 3, 4, 5-word sequences, with their frequency), you can carry out powerful queries offline -- without needing to access the corpus via the web interface.
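For readers unfamiliar with the term, here is a minimal sketch of what an n-gram frequency table is, counting word sequences in a tiny made-up sentence. It does not read the COCA download, whose file format is not described here; the function name `ngram_counts` and the sample text are assumptions for illustration.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up sample text, not corpus data.
tokens = "the cat sat on the mat and the cat slept".split()
bigrams = ngram_counts(tokens, 2)
print(bigrams.most_common(1))  # [(('the', 'cat'), 2)]
```

Offline queries over a real n-gram file amount to lookups and filters on a table of this shape, only with frequencies precomputed from the full corpus.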
english
corpus
linguistics
nlp
ngrams
february 2012 by rybesh
Aravind K. Joshi - Towards Discourse Meaning
november 2011 by rybesh
The overall goal is to discuss some issues concerning the dependencies at the discourse level and at the sentence level. However, first I will briefly describe the Penn Discourse Treebank (PDTB), a corpus in which we annotate the discourse connectives (explicit and implicit) and their arguments together with "attributions" of the arguments and the relations denoted by the connectives, and also the senses of the connectives. I will then focus on the complexity of dependencies in terms of (a) the elements that bear the dependency relations, (b) graph theoretic properties of these dependencies such as nested and crossed dependencies, dependencies with shared arguments, and (c) attributions and their relationship to the dependencies, among others. I will compare these dependencies with those at the sentence level and discuss some issues that relate to the transition from the sentence level to the level of "immediate discourse" and propose some conjectures.
discourse
meaning
linguistics
nlp
november 2011 by rybesh
Model Theory (Stanford Encyclopedia of Philosophy)
november 2011 by rybesh
Model theory began with the study of formal languages and their interpretations, and of the kinds of classification that a particular formal language can make. Mainstream model theory is now a sophisticated branch of mathematics (see the entry on first-order model theory). But in a broader sense, model theory is the study of the interpretation of any language, formal or natural, by means of set-theoretic structures, with Alfred Tarski's truth definition as a paradigm. In this broader sense, model theory meets philosophy at several points, for example in the theory of logical consequence and in the semantics of natural languages.
logic
representation
language
interpretation
modeling
models
linguistics
november 2011 by rybesh
Mental Models Website
november 2011 by rybesh
The mental model theory of thinking and reasoning is the focus of this Web site. Mental models are representations in the mind of real or imaginary situations. Scientists sometimes use the term "mental model" as a synonym for "mental representation", but it has a narrower referent in the case of the theory of thinking and reasoning. The idea that people rely on mental models can be traced back to Kenneth Craik’s suggestion in 1943 that the mind constructs "small-scale models" of reality that it uses to anticipate events. Mental models can be constructed from perception, imagination, or the comprehension of discourse. They underlie visual images, but they can also be abstract, representing situations that cannot be visualised. Each mental model represents a possibility. Mental models are akin to architects' models or to physicists' diagrams in that their structure is analogous to the structure of the situation that they represent, unlike, say, the structure of logical forms used in formal rule theories. In this respect they are a little like pictures in the "picture" theory of language described by Ludwig Wittgenstein in 1922.
cogsci
psychology
linguistics
representation
november 2011 by rybesh
Martha Palmer | Projects | Verb Net
august 2011 by rybesh
VerbNet (VN) (Kipper-Schuler 2006) is the largest on-line verb lexicon currently available for English. It is a hierarchical domain-independent, broad-coverage verb lexicon with mappings to other lexical resources such as WordNet (Miller, 1990; Fellbaum, 1998), Xtag (XTAG Research Group, 2001), and FrameNet (Baker et al., 1998). VerbNet is organized into verb classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class. Each verb class in VN is completely described by thematic roles, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates with a temporal function, in a manner similar to the event decomposition of Moens and Steedman (1988).
corpus
linguistics
nlp
language
data
frame
semantics
august 2011 by rybesh
Corpus-Based Study of Scientific Methodology: Comparing the Historical and Experimental Sciences
july 2011 by rybesh
This chapter studies the use of textual features based on systemic functional linguistics, for genre-based text categorization. We describe feature sets that represent different types of conjunctions and modal assessment, which together can partially indicate how different genres structure text and may prefer certain classes of attitudes towards propositions in the text. This enables analysis of large-scale rhetorical differences between genres by examining which features are important for classification. The specific domain we studied comprises scientific articles in historical and experimental sciences (paleontology and physical chemistry, respectively). We applied the SMO learning algorithm, which with our feature set achieved over 83% accuracy for classifying articles according to field, though no field-specific terms were used as features. The most highly-weighted features for each were consistent with hypothesized methodological differences between historical and experimental sciences, thus lending empirical evidence to the recent philosophical claim of multiple scientific methods.
nlp
rhetoric
science
history
language
genre
classification
linguistics
july 2011 by rybesh
Penn Treebank P.O.S. Tags
april 2011 by rybesh
Alphabetical list of part-of-speech tags used in the Penn Treebank Project.
linguistics
nlp
reference
april 2011 by rybesh
The Language of Public Discourse
july 2010 by rybesh
What does linguistics have to offer to the understanding of "public language," and vice-versa? What is the public, anyway? How does public language adapt to its material & social settings? What's the effect of new media on the language of public discourse?
linguistics
language
politics
discourse
newmedia
july 2010 by rybesh
NLP as a study of representations
november 2009 by rybesh
Ellen Riloff and I run an NLP reading group pretty much every semester. Last semester we covered "old school NLP." We independently came up with lists of what we consider some of the most important ideas (idea = paper) from pre-1990 (most are much earlier) and let students select which to present. There was a lot of overlap between Ellen's list and mine (not surprisingly). If people are interested, I can provide the whole list (just post a comment and I'll dig it up). The whole list of topics is posted as a comment. The topics that were actually selected are here.

I hope the students have found this exercise useful. It gets you thinking about language in a way that papers from the 2000s typically do not. It brings up a bunch of issues that we no longer think about frequently. Like language. (Joking.) (Sort of.)

One thing that's really stuck out for me is how much "old school" NLP comes across essentially as a study of representations. Perhaps this is a result of the fact that AI -- as a field -- was (and, to some degree, still is) enamored with knowledge representation problems. To be more concrete, let's look at a few examples. It's already been a while since I read these last (I had meant to write this post during the spring when things were fresh in my head), so please forgive me if I goof a few things up.

I'll start with one I know well: Mann and Thompson's rhetorical structure theory paper from 1988. This is basically "the" RST paper. I think that when many people think of RST, they think of it as a list of ways that sentences can be organized into hierarchies. E.g., this sentence provides background for that one, and together they argue in favor of yet a third. But this isn't really where RST begins. It begins by trying to understand the communicative role of text structure. That is, when I write, I am trying to communicate something. Everything that I write (if I'm writing "well") is toward that end. For instance, in this post, I'm trying to communicate that old school NLP views representation as the heart of the issue. This current paragraph is supporting that claim by providing a concrete example, which I am using to try to convince you of my claim.

As a more detailed example, take the "Evidence" relation from RST. M+T have the following characterization of "Evidence." Herein, "N" is the nucleus of the relation, "S" is the satellite (think of these as sentences), "R" is the reader and "W" is the writer:
relation name: Evidence
constraints on N: R might not believe N to a degree satisfactory to W
constraints on S: R believes S or will find it credible
constraints on N+S: R's comprehending S increases R's belief of N
the effect: R's belief of N is increased
locus of effect: N

This is a totally different way of thinking about things than I think we see nowadays. I kind of liken it to how I tell students not to program. If you're implementing something moderately complex (say, the forward/backward algorithm), first write down all the math, then start implementing. Don't start implementing first. I think nowadays (and sure, I'm guilty!) we see a lot of implementing without the math. Or rather, with plenty of math, but without a representational model of what it is that we're studying.

The central claim of the RST paper is that one can think of texts as being organized into elementary discourse units, and these are connected into a tree structure by relations like the one above. (Or at least this is my reading of it.) That is, they have laid out a representation of text and claimed that this is how texts get put together.

As a second example (this will be shorter), take Wendy Lehnert's 1982 paper, "Plot units and narrative summarization." Here, the story is about how stories get put together. The most interesting thing about the plot units model to me is that it breaks from how one might naturally think about stories. That is, I would naively think of a story as a series of events. The claim that Lehnert makes is that this is not the right way to think about it. Rather, we should think about stories as sequences of affect states. Effectively, an affect state is how a character is feeling at any time. (This isn't quite right, but it's close enough.) For example, Lehnert presents the following story:

When John tried to start his car this morning, it wouldn't turn over. He asked his neighbor Paul for help. Paul did something to the carburetor and got it going. John thanked Paul and drove to work.

The representation put forward for this story is something like: (1) negative-for-John (the car won't start), which leads to (2) motivation-for-John (to get it started), which leads to (3) positive-for-John (it's started), which then links back and resolves (1). You can also analyze the story from Paul's perspective, and then add links that go between the two characters showing how things interact. The rest of the paper describes how these relations work, and how they can be put together into more complex event sequences (such as "promised request bungled"). Again, a high level representation of how stories work from the perspective of the characters.

So now I, W, hope that you, R, have an increased belief in the title of the post.

Why do I think this is interesting? Because at this point, we know a lot about how to deal with structure in language. From a machine learning perspective, if you give me a structure and some data (and some features!), I will learn something. It can even be unsupervised if it makes you feel better. So in a sense, I think we're getting to a point where we can go back, look at some really hard problems, use the deep linguistic insights from two decades (or more) ago, and start taking a crack at things that are really deep. Of course, features are a big problem; as a very wise man once said to me: "Language is hard. The fact that statistical association mining at the word level made it appear easy for the past decade doesn't alter the basic truth. :-)." We've got many of the ingredients to start making progress, but it's not going to be easy!
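To make the "study of representations" point concrete, the John-and-Paul story from the post can be written down as a small graph of affect states. This is only a sketch of the post's informal description: the class names `AffectState` and `PlotGraph` and the link labels `leads_to` and `resolves` are assumptions taken from the post's wording, not the notation of Lehnert's paper.

```python
from dataclasses import dataclass, field

@dataclass
class AffectState:
    character: str
    kind: str   # "negative", "motivation" or "positive", as in the post
    gloss: str  # the event that gives rise to the state

@dataclass
class PlotGraph:
    states: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (source index, label, target index)

    def add_state(self, character, kind, gloss):
        """Append a state and return its index for use in links."""
        self.states.append(AffectState(character, kind, gloss))
        return len(self.states) - 1

    def link(self, source, label, target):
        self.links.append((source, label, target))

    def resolved(self):
        """Indices of states that another state links back to and resolves."""
        return {target for (_, label, target) in self.links if label == "resolves"}

# John's side of the story, numbered (1) to (3) in the post.
story = PlotGraph()
s1 = story.add_state("John", "negative", "the car won't start")
s2 = story.add_state("John", "motivation", "to get it started")
s3 = story.add_state("John", "positive", "it's started")
story.link(s1, "leads_to", s2)
story.link(s2, "leads_to", s3)
story.link(s3, "resolves", s1)

print(story.resolved())  # {0}
```

Paul's perspective would be a second chain of states for a different character, with extra links running between the two chains.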
linguistics
problems
community
discourse
structured_prediction
from google
november 2009 by rybesh
i d e a n t: Tag Literacy
july 2007 by rybesh
Distributed classification systems function at the intersection of individual choices and the shared linguistic/semantic norms of a social group (the folks in folksonomy).
social
metadata
categorization
classification
collaboration
linguistics
semantics
july 2007 by rybesh
Literary Encyclopedia: Langue and Parole
april 2007 by rybesh
Langue represents the “work of a collective intelligence”, which is both internal to each individual and beyond the will of any individual to change. Parole designates individual events of language use manifesting each time a speaker’s ephemeral ind
linguistics
theory
speech
language
april 2007 by rybesh
Chomsky: competence vs. performance
april 2007 by rybesh
Competence is our tacit, internalised knowledge of a language. Performance is external evidence of language competence, and is usage on particular occasions when factors other than our linguistic competence may affect its form.
linguistics
ideas
speech
theory
performance
knowledge
april 2007 by rybesh
Rhetorical Structure Theory
july 2006 by rybesh
RST is intended to describe texts, rather than the processes of creating or reading and understanding them. It posits various sorts of "building blocks" which can be observed to occur in texts.
linguistics
theory
semiotics
reference
july 2006 by rybesh
Philipp Cimiano
july 2006 by rybesh
Main interests are in the field of Computational Linguistics as well as Knowledge Representation.
people
academia
kr
linguistics
SSMS2006
semweb
july 2006 by rybesh
Mark Johnson, George Lakoff: Metaphors We Live By
june 2005 by rybesh
This book disappointed me, because I expected to be able to somehow apply or utilize the information within...
books
2003
urn:asin:0226468011
wishlist
concepts
languagearts
linguistics
metaphor
philosophy
truth
june 2005 by rybesh
George Lakoff: Women, Fire, and Dangerous Things
june 2005 by rybesh
I'd say it's a book I'll keep and likely use as a reference but I doubt I'll ever read the whole thing...
books
1990
urn:asin:0226468046
wishlist
categorization
cognition
languagearts
linguistics
psychology
reason
june 2005 by rybesh
Geoffrey Sampson: Writing Systems
june 2005 by rybesh
This book is the one that got me interested in writing systems as a part of linguistics...
books
1990
urn:asin:0804717567
wishlist
alphabet
language
languagearts
linguistics
writing
june 2005 by rybesh
James Fentress, Umberto Eco: The Search for the Perfect Language
june 2005 by rybesh
This is an excellent short review of the European quest for a language to unite its disparate nations with each other and the rest of the world...
books
1997
urn:asin:0631205101
wishlist
europe
language
languagearts
linguistics
world
june 2005 by rybesh