cshalizi + linguistics 51
[1203.6360] You had me at hello: How phrasing affects memorability
8 weeks ago by cshalizi
"Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes. We find significant differences between memorable and non-memorable quotes in several key dimensions. One is lexical distinctiveness: in aggregate, memorable quotes use less common word choices, but at the same time are built upon a scaffolding of common syntactic patterns; another is that memorable quotes tend to be more general in ways that make them easy to apply in new contexts. We also show how the concept of "memorable language" can be extended across domains."
to:NB
linguistics
statistics
cultural_evolution
Votes and Vowels: A Changing Accent Shows How Language Parallels Politics | The Crux | Discover Magazine
8 weeks ago by cshalizi
As Henry says, se non è vero, è ben trovato.
linguistics
political_science
us_politics
sociolinguistics
labov.william
social_networks
via:henry_farrell
Language Log » Keith Chen, Whorfian economist
february 2012 by cshalizi
"I also worry that it is too easy to find correlations of this kind, and we don't have any idea just how easy until a concerted effort has been made to show that the spurious ones are not supportable. For example, if we took "has (vs. does not have) pharyngeal consonants", or "uses (vs. does not use) close front rounded vowels", would we find correlations there too? I have some colleagues here at the University of Edinburgh, within Simon Kirby's research group, who have run some informal experiments on the data Chen uses to see if dredging up spurious correlations of this kind is easy or hard, and so far they have found it jaw-droppingly easy. (I won't say any more, because I am in the weird position of producing unrefereed telegraphing of unrefereed and informal objections to an unrefereed and unpublished working paper, and it's all getting a bit too weird for me.)"
How many languages are there in Europe? Order of 10^2. How many variables can an economist get cross-country data on? Again, order of 10^2. How many discriminable syntactic features do languages have? Easily order of 10^3, if not more. Conclusion: this is not what I mean when I say that economists should do more data-mining.
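The arithmetic behind that worry can be sketched roughly as follows. This is only an illustration under stated assumptions: the 5% significance level and the independence of the tests are assumptions made here, not anything from Chen's paper or the Language Log post, and the function names are hypothetical.

```python
def expected_false_positives(n_outcomes, n_features, alpha=0.05):
    """Expected number of nominally 'significant' outcome/feature pairs
    when every null hypothesis is true (no real relationship anywhere).

    alpha is an assumed per-test significance level.
    """
    n_tests = n_outcomes * n_features
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests tests,
    assuming (unrealistically) that the tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Orders of magnitude from the note above: ~10^2 economic variables,
    # ~10^3 linguistic features, each pair tested once.
    n_outcomes, n_features = 10**2, 10**3
    print(expected_false_positives(n_outcomes, n_features))
    print(prob_at_least_one(n_outcomes * n_features))
```

With only about 10^2 languages as data points, real tests would be correlated and noisy, so the sketch understates how messy the problem is. The point is just that the number of candidate pairings swamps any conventional threshold.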
economics
bad_data_analysis
linguistics
pullum.geoff
Language Log » Phonemic Serial Founder Effect disconfirmed
february 2012 by cshalizi
Massively-hyped paper trying to model language history using lightly-repurposed biological models comprehensively debunked by very careful and linguistically-informed data analysis. One of the authors of the debunking shows up in the comments, and says:
"Finally, regarding press; a few news organisations were interested in the initial pitch, but lost interest when they realised that we didn't have a good story here about human origins."
linguistics
language_history
evolutionary_biology
human_genetics
evisceration
bad_science_journalism
The structure of science information (Harris, 2002)
december 2011 by cshalizi
"The organization of information within science can be investigated in a principled way through analysis of science language. The restricted use of language in science enables description of the informational structure of science and of particular subfields, with strong similarities to structures in mathematics and programming languages. This result rests on decades of research into the relation between form and content in language, based on an information-theoretic approach to the structure of information. Examples are provided from immunology and the social sciences. Practical applications include storage of science information in databases, indexing the literature, and identification and resolution of controversy."
to:NB
linguistics
text_mining
natural_language_processing
harris.zellig
information_retrieval
Cortical representation of the constituent structure of sentences — PNAS
february 2011 by cshalizi
Dehaene is a good scholar, but the abstract is very surprising to me, and the fact that it's contributed to PNAS (rather than properly refereed) is a Bad Sign.
neuropsychology
linguistics
to:NB
fmri
dehaene.stanislas
Is there a language instinct?
may 2010 by cshalizi
New-ish BBS article claiming there isn't any universal grammar, merely several stable strategies in a sort of evolutionary game.
linguistics
linguistic_universals
linguistic_evolution
cognitive_science
track_down_references
cultural_evolution
Language Log » Pictish writing?
april 2010 by cshalizi
Yet another methodological EPIC FAIL.
linguistics
entropy
information_theory
pictish
bad_data_analysis
Remembering and Forgetting: Ideologies of Language Loss in a Northern Italian Town
august 2009 by cshalizi
"Speakers' stories about language shift in Bergamo, a Northern Italian town, are reflective of larger sociocultural changes and negotiations over tradition. The article examines how Bergamasco residents conceptualize time to construct a past of poverty in contrast to a prosperous contemporary world, taking up various affective stances toward these socioeconomic and linguistic shifts. Nostalgia plays a central role in speakers' focus on certain socioeconomic shifts. Analyzing temporality in and through linguistic ideology, the article contributes to debates on language shift, cultural change, and socioeconomic transformation." --- My mother grew up in Bergamo. (Though they moved there when she was ~10.)
italy
language_history
anthropology
ethnography
ideology
bergamo
linguistics
via:languagelog
nostalgia
tradition
uses_of_the_past
escaping_the_idiocy_of_rural_life
ReadMe: Software for Automated Text Analysis
june 2009 by cshalizi
"The ReadMe software package for R takes as input a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, movie reviews, etc.), a categorization scheme chosen by the user (e.g., ordered positive to negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories), and a small subset of text documents hand classified into the given categories. If used properly, ReadMe will report, normally within sampling error of the truth, the proportion of documents within each of the given categories among those not hand coded. ReadMe computes quantities of interest to the scientific community based on the distribution within categories but does so by skipping the more error prone intermediate step of classifying individual documents. Other procedures are also included to make processing text easy."
to_teach:data-mining
text_mining
content_analysis
R
software
linguistics
statistics
via:chl
king.gary
Carl de Marcken: Unsupervised Language Acquisition
june 2009 by cshalizi
I have been meaning to read this since 1999 or so.
machine_learning
grammar_induction
language_acquisition
linguistics
via:slaniel
An improved statistical test for historical linguistics
june 2009 by cshalizi
"Historical linguistics needs procedures to evaluate the similarity between languages through the comparison of specific word lists drawn from the whole vocabulary. The main issue is to evaluate a fair threshold for the number of similar items beyond which it is sensible to reject the hypothesis of chance similarity. After a short review of papers dealing with that problem, in this paper an extension of those methods is proposed which exploits available data in a more efficient way. In particular, the exact distribution of the new test statistics is calculated and the power of the new procedure is compared with the power of the existing method."
statistics
linguistics
historical_linguistics
to:NB
to_read
Language Log » Conditional entropy and the Indus Script
april 2009 by cshalizi
Having read the papers and re-implemented the method, I now concur with Mark: this is a methodological EPIC FAIL.
information_theory
linguistics
bad_data_analysis
bad_science_journalism
harrapan_civilization
liberman.mark
blogged
why_oh_why_cant_we_have_a_better_academic_publishing_system
languagehat.com: DAVID FOSTER WALLACE DEMOLISHED.
april 2009 by cshalizi
Later: I agree with a correspondent that "demolished" is too strong, but the key observation is that Wallace completely fails to get _why_ linguists take the position they do, which is not some perverse antinomianism but that they want to understand what people actually do.
enjoyable_rants
linguistics
wallace.david_foster
50 Years of Stupid Grammar Advice - ChronicleReview.com
april 2009 by cshalizi
Geoff Pullum blasts Strunk and White. But you read Language Log already.
evisceration
linguistics
writing_advice
strunk_and_white
funny:academic
pullum.geoffrey
book_reviews
Intentional Vagueness (Blume and Board)
february 2009 by cshalizi
"This paper analyzes communication with a language that is vague in the sense that identical messages do not always result in identical interpretations. It is shown that strategic agents frequently add to this vagueness by being intentionally vague, i.e. they deliberately choose less precise messages than they have to among the ones available to them in equilibrium. Having to communicate with a vague language can be welfare enhancing because it mitigates conflict. In equilibria that satisfy a dynamic stability condition intentional vagueness increases with the degree of conflict between sender and receiver."
linguistics
pragmatics
game_theory
vagueness
blume.andreas
to:NB
to_read
Language Log » The directed graph of stereotypical incomprehensibility
january 2009 by cshalizi
The central role of Chinese in this graph is highly suggestive.
linguistics
funny:geeky
liberman.mark
networks
Language Log: Comparing communication efficiency across languages
april 2008 by cshalizi
"English texts are larger than their Chinese counterparts by a factor of between 1.37 and 2.27 before compression, or 1.19 to 1.41 after compression."
liberman.mark
linguistics
data_compression
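A comparison like the one quoted in this entry can be sketched in a few lines. This is a minimal illustration, not the procedure used in the Language Log post: the choice of zlib as the compressor, UTF-8 as the encoding, and the function name are all assumptions made here, and the resulting ratios depend heavily on both choices.

```python
import zlib


def size_ratios(text_a, text_b, encoding="utf-8"):
    """Return (raw_ratio, compressed_ratio) of text_a relative to text_b.

    Sizes are measured in bytes under the given encoding; compression
    uses zlib at its highest level. Both texts must be non-empty.
    """
    raw_a = text_a.encode(encoding)
    raw_b = text_b.encode(encoding)
    if not raw_a or not raw_b:
        raise ValueError("both texts must be non-empty")
    comp_a = zlib.compress(raw_a, 9)
    comp_b = zlib.compress(raw_b, 9)
    return len(raw_a) / len(raw_b), len(comp_a) / len(comp_b)
```

To use it, pass two parallel translations of the same document, e.g. `size_ratios(english_text, chinese_text)`. On short inputs the compressor's fixed overhead dominates, so the compressed ratio is only meaningful for texts of substantial length.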
Log -- David Chess, 27 February 2008
february 2008 by cshalizi
"Maybe it wouldn't have been so bad if we hadn't been each other's First Contacts. Virgin civilizations, groping each other in the dark."
science_fiction
linguistics
mind-games
funny:geeky
linguistic_relativity
suicide
A Thousand Years of Nonlinear History - DeLanda (@Labyrinth)
february 2008 by cshalizi
Surprisingly sane; notes at http://bactra.org/weblog/algae-2006-05.html
delanda.manuel
world_history
great_transformation
linguistics
language_history
globalization
cities
institutions
memes
complexity
materialism
philosophy
emergence
economics
economic_history
books:recommended
Workshop I: Dynamic Searches and Knowledge Building
november 2007 by cshalizi
IPAM workshop on the mathematics of search and knowledge discovery, with links to slides and/or audio for some talks
information_retrieval
machine_learning
data_mining
linguistics
natural_language_processing
via:klk
semantics_from_syntax
Relevance: Communication and Cognition - Sperber and Wilson (@Labyrinth)
november 2007 by cshalizi
Sperber & Wilson's excellent book, on sale
wilson.deirdre
relevance
pragmatics
linguistics
cognition
semiotics
books:recommended
sperber.dan
Mind Hacks: Osama Bin Language Acquisition
october 2007 by cshalizi
"And I tell you, artificial intelligence is a false god that provides correlative and not causal models of language acquisition. The infallible methodologies are the comparative study of world languages and lesion analyses of those who must be treated wit
linguistics
funny:academic
taste:bad