cshalizi + linguistics   51

[1203.6360] You had me at hello: How phrasing affects memorability
"Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes. We find significant differences between memorable and non-memorable quotes in several key dimensions. One is lexical distinctiveness: in aggregate, memorable quotes use less common word choices, but at the same time are built upon a scaffolding of common syntactic patterns; another is that memorable quotes tend to be more general in ways that make them easy to apply in new contexts. We also show how the concept of "memorable language" can be extended across domains."
to:NB  linguistics  statistics  cultural_evolution 
8 weeks ago by cshalizi
Language Log » Keith Chen, Whorfian economist
"I also worry that it is too easy to find correlations of this kind, and we don't have any idea just how easy until a concerted effort has been made to show that the spurious ones are not supportable. For example, if we took "has (vs. does not have) pharyngeal consonants", or "uses (vs. does not use) close front rounded vowels", would we find correlations there too? I have some colleagues here at the University of Edinburgh, within Simon Kirby's research group, who have run some informal experiments on the data Chen uses to see if dredging up spurious correlations of this kind is easy or hard, and so far they have found it jaw-droppingly easy. (I won't say any more, because I am in the weird position of producing unrefereed telegraphing of unrefereed and informal objections to an unrefereed and unpublished working paper, and it's all getting a bit too weird for me.)"

How many languages are there in Europe? Order of 10^2. How many variables can an economist get cross-country data on? Again, order of 10^2. How many discriminable syntactic features do languages have? Easily order of 10^3 if not more. Conclusion: this is not what I mean when I say that economists should do more data-mining.
economics  bad_data_analysis  linguistics  pullum.geoff 
february 2012 by cshalizi
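The order-of-magnitude count in the note on the Chen entry can be made concrete with a small simulation. This is only a sketch under stated assumptions: the outcome variable and every feature are pure noise, languages are treated as independent draws (they are not in reality), and the function names and the 5% cutoff are illustrative choices, not anything taken from the paper or from the Edinburgh group's experiments.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_languages=100, n_features=1000, seed=0):
    """Count pure-noise binary features that look 'significant' at roughly 5%.

    All sizes are hypothetical: ~10^2 languages, ~10^3 candidate features.
    """
    rng = random.Random(seed)
    # One made-up cross-country outcome, unrelated to any feature.
    outcome = [rng.gauss(0, 1) for _ in range(n_languages)]
    # Large-sample two-sided 5% cutoff for a correlation coefficient.
    threshold = 1.96 / math.sqrt(n_languages)
    hits = 0
    for _ in range(n_features):
        # A made-up binary feature ("has pharyngeal consonants", say).
        feature = [rng.randint(0, 1) for _ in range(n_languages)]
        if abs(pearson(feature, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious())
```

With a thousand unrelated features and a nominal 5% test, one should expect on the order of fifty "discoveries" for any single outcome variable, before multiplying by the hundred or so outcome variables available.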
Language Log » Phonemic Serial Founder Effect disconfirmed
Massively-hyped paper trying to model language history using lightly-repurposed biological models comprehensively debunked by very careful and linguistically-informed data analysis. One of the authors of the debunking shows up in the comments, and says:
"Finally, regarding press; a few news organisations were interested in the initial pitch, but lost interest when they realised that we didn't have a good story here about human origins."
linguistics  language_history  evolutionary_biology  human_genetics  evisceration  bad_science_journalism 
february 2012 by cshalizi
The structure of science information (Harris, 2002)
"The organization of information within science can be investigated in a principled way through analysis of science language. The restricted use of language in science enables description of the informational structure of science and of particular subfields, with strong similarities to structures in mathematics and programming languages. This result rests on decades of research into the relation between form and content in language, based on an information-theoretic approach to the structure of information. Examples are provided from immunology and the social sciences. Practical applications include storage of science information in databases, indexing the literature, and identification and resolution of controversy."
to:NB  linguistics  text_mining  natural_language_processing  harris.zellig  information_retrieval 
december 2011 by cshalizi
Cortical representation of the constituent structure of sentences — PNAS
Dehaene is a good scholar, but the abstract is very surprising to me, and the fact that it's contributed to PNAS (rather than properly refereed) is a Bad Sign.
neuropsychology  linguistics  to:NB  fmri  dehaene.stanislas 
february 2011 by cshalizi
Is there a language instinct?
New-ish BBS article claiming there isn't any universal grammar, merely several stable strategies in a sort of evolutionary game.
linguistics  linguistic_universals  linguistic_evolution  cognitive_science  track_down_references  cultural_evolution 
may 2010 by cshalizi
Remembering and Forgetting: Ideologies of Language Loss in a Northern Italian Town
"Speakers' stories about language shift in Bergamo, a Northern Italian town, are reflective of larger sociocultural changes and negotiations over tradition. The article examines how Bergamasco residents conceptualize time to construct a past of poverty in contrast to a prosperous contemporary world, taking up various affective stances toward these socioeconomic and linguistic shifts. Nostalgia plays a central role in speakers' focus on certain socioeconomic shifts. Analyzing temporality in and through linguistic ideology, the article contributes to debates on language shift, cultural change, and socioeconomic transformation." --- My mother grew up in Bergamo. (Though they moved there when she was ~10.)
italy  language_history  anthropology  ethnography  ideology  bergamo  linguistics  via:languagelog  nostalgia  tradition  uses_of_the_past  escaping_the_idiocy_of_rural_life 
august 2009 by cshalizi
ReadMe: Software for Automated Text Analysis
"The ReadMe software package for R takes as input a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, movie reviews, etc.), a categorization scheme chosen by the user (e.g., ordered positive to negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories), and a small subset of text documents hand classified into the given categories. If used properly, ReadMe will report, normally within sampling error of the truth, the proportion of documents within each of the given categories among those not hand coded. ReadMe computes quantities of interest to the scientific community based on the distribution within categories but does so by skipping the more error prone intermediate step of classifying individual documents. Other procedures are also included to make processing text easy."
to_teach:data-mining  text_mining  content_analysis  R  software  linguistics  statistics  via:chl  king.gary 
june 2009 by cshalizi
An improved statistical test for historical linguistics
"Historical linguistics needs procedures to evaluate the similarity between languages through the comparison of specific word lists drawn from the whole vocabulary. The main issue is to evaluate a fair threshold for the number of similar items beyond which it is sensible to reject the hypothesis of chance similarity. After a short review of papers dealing with that problem, in this paper an extension of those methods is proposed which exploits available data in a more efficient way. In particular, the exact distribution of the new test statistics is calculated and the power of the new procedure is compared with the power of the existing method."
statistics  linguistics  historical_linguistics  to:NB  to_read 
june 2009 by cshalizi
languagehat.com: DAVID FOSTER WALLACE DEMOLISHED.
Later: I agree with a correspondent that "demolished" is too strong, but the key observation is that Wallace completely fails to get _why_ linguists take the position they do, which is not some perverse antinomianism but that they want to understand what people actually do.
enjoyable_rants  linguistics  wallace.david_foster 
april 2009 by cshalizi
Intentional Vagueness (Blume and Board)
"This paper analyzes communication with a language that is vague in the sense that identical messages do not always result in identical interpretations. It is shown that strategic agents frequently add to this vagueness by being intentionally vague, i.e. they deliberately choose less precise messages than they have to among the ones available to them in equilibrium. Having to communicate with a vague language can be welfare enhancing because it mitigates conflict. In equilibria that satisfy a dynamic stability condition intentional vagueness increases with the degree of conflict between sender and receiver."
linguistics  pragmatics  game_theory  vagueness  blume.andreas  to:NB  to_read 
february 2009 by cshalizi
Language Log: Comparing communication efficiency across languages
"English texts are larger than their Chinese counterparts by a factor of between 1.37 and 2.27 before compression, or 1.19 to 1.41 after compression."
liberman.mark  linguistics  data_compression 
april 2008 by cshalizi
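The kind of comparison quoted in the communication-efficiency entry can be sketched as below. This is a rough illustration and not the procedure used in the Language Log post: the choice of zlib, the UTF-8 encoding, and the function name `size_ratios` are all assumptions, and both inputs are assumed to be non-empty parallel texts (translations of the same material).

```python
import zlib


def size_ratios(text_a: str, text_b: str, encoding: str = "utf-8"):
    """Return (raw size ratio, compressed size ratio) of text_a to text_b.

    Sizes are byte counts in the given encoding; compression is zlib at
    level 9. Both texts must be non-empty.
    """
    raw_a = text_a.encode(encoding)
    raw_b = text_b.encode(encoding)
    packed_a = zlib.compress(raw_a, 9)
    packed_b = zlib.compress(raw_b, 9)
    return len(raw_a) / len(raw_b), len(packed_a) / len(packed_b)
```

The results depend heavily on encoding and compressor, and on short inputs the compressor's fixed overhead dominates, so the ratios only mean something on long parallel texts.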
Log -- David Chess, 27 February 2008
"Maybe it wouldn't have been so bad if we hadn't been each other's First Contacts. Virgin civilizations, groping each other in the dark."
science_fiction  linguistics  mind-games  funny:geeky  linguistic_relativity  suicide 
february 2008 by cshalizi
Workshop I: Dynamic Searches and Knowledge Building
IPAM workshop on the mathematics of search and knowledge discovery, with links to slides and/or audio for some talks
information_retrieval  machine_learning  data_mining  linguistics  natural_language_processing  via:klk  semantics_from_syntax 
november 2007 by cshalizi
Mind Hacks: Osama Bin Language Acquisition
"And I tell you, artificial intelligence is a false god that provides correlative and not causal models of language acquisition. The infallible methodologies are the comparative study of world languages and lesion analyses of those who must be treated wit
linguistics  funny:academic  taste:bad 
october 2007 by cshalizi

