Language Log » Keith Chen, Whorfian economist
february 2012 by cshalizi
"I also worry that it is too easy to find correlations of this kind, and we don't have any idea just how easy until a concerted effort has been made to show that the spurious ones are not supportable. For example, if we took "has (vs. does not have) pharyngeal consonants", or "uses (vs. does not use) close front rounded vowels", would we find correlations there too? I have some colleagues here at the University of Edinburgh, within Simon Kirby's research group, who have run some informal experiments on the data Chen uses to see if dredging up spurious correlations of this kind is easy or hard, and so far they have found it jaw-droppingly easy. (I won't say any more, because I am in the weird position of producing unrefereed telegraphing of unrefereed and informal objections to an unrefereed and unpublished working paper, and it's all getting a bit too weird for me.)"
How many languages are there in Europe? Order of 10^2. How many variables can an economist get cross-country data on? Again, order of 10^2. How many discriminable syntactic features do languages have? Easily order of 10^3, if not more. Conclusion: this is not what I mean when I say that economists should do more data-mining.
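To put rough numbers on the orders of magnitude above: 10^3 features times 10^2 variables is 10^5 feature-variable pairs, so testing each at the conventional 5% level would be expected to yield around 5,000 "significant" correlations even if none were real. The sketch below simulates a scaled-down version of this on pure noise. Everything in it is an illustrative assumption rather than anything from Chen's data or the Edinburgh experiments: the function names, the coin-flip features, the Gaussian outcomes, and the cutoff |r| > 0.197 (approximately the two-sided 5% threshold for a sample of 100).

```python
import math
import random


def standardize(values):
    """Center a vector and scale it to unit length, so that the dot
    product of two standardized vectors is their Pearson correlation."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        # Constant vector: correlation is undefined, treat it as zero.
        return [0.0] * n
    return [c / norm for c in centered]


def count_spurious(n_languages=100, n_features=200, n_outcomes=50,
                   r_cutoff=0.197, seed=0):
    """Count feature/outcome pairs whose correlation exceeds the cutoff
    when features and outcomes are independent noise by construction.

    All parameters are hypothetical, chosen only to mirror the
    orders of magnitude in the text at a size that runs quickly.
    """
    rng = random.Random(seed)
    # Binary "has / does not have" features, one coin flip per language.
    features = [
        standardize([rng.randint(0, 1) for _ in range(n_languages)])
        for _ in range(n_features)
    ]
    # Continuous cross-country "economic" variables, pure Gaussian noise.
    outcomes = [
        standardize([rng.gauss(0.0, 1.0) for _ in range(n_languages)])
        for _ in range(n_outcomes)
    ]
    hits = 0
    for f in features:
        for y in outcomes:
            r = sum(a * b for a, b in zip(f, y))
            if abs(r) > r_cutoff:
                hits += 1
    return hits, n_features * n_outcomes


if __name__ == "__main__":
    hits, total = count_spurious()
    print(f"{hits} of {total} pure-noise pairs pass |r| > 0.197")
```

With 200 features and 50 outcomes there are 10,000 pairs, so roughly 5% of them should clear the cutoff by chance alone. The simulation assumes the languages are independent draws, which real languages and countries are not.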
economics
bad_data_analysis
linguistics
pullum.geoff