cshalizi + information_retrieval 39
Non-Parametric Modeling of Partially Ranked Data
february 2012 by cshalizi
"Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive computationally efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. A bias-variance analysis and an experimental study demonstrate the applicability of the proposed method."
to:NB
statistics
machine_learning
categorical_data
ordinal_data
information_retrieval
nonparametrics
lebanon.guy
february 2012 by cshalizi
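The abstract doesn't say what form the non-parametric model takes. One natural reading is kernel smoothing over rankings, and the sketch below assumes that. It uses a Mallows-style kernel exp(-lam * Kendall distance) and treats each observed top-k ranking as spreading its mass uniformly over the full rankings that extend it. The function names and the `lam` bandwidth are illustrative. The code enumerates every ordering by brute force, which is exactly the cost the paper's lattice-based derivations are meant to avoid, so it only works for a handful of items.

```python
import itertools
import math

def kendall_distance(a, b):
    """Number of item pairs that rankings a and b put in opposite order.
    A ranking is a tuple of items, most preferred first."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a))
               if pos_b[a[i]] > pos_b[a[j]])

def extensions(top_k, items):
    """All full rankings whose first len(top_k) places agree with top_k."""
    rest = [x for x in items if x not in top_k]
    return [tuple(top_k) + p for p in itertools.permutations(rest)]

def kernel_estimate(partial_rankings, items, lam=1.0):
    """Kernel-smoothed probability of every full ranking of `items`.

    Each observed top-k ranking spreads its share of mass uniformly over its
    extensions; each extension is smoothed with exp(-lam * Kendall distance),
    normalised to sum to one over all rankings."""
    perms = list(itertools.permutations(items))
    # Kendall distance is unchanged by relabelling items, so the normaliser
    # is the same for every centre ranking; compute it once.
    z = sum(math.exp(-lam * kendall_distance(perms[0], p)) for p in perms)
    probs = dict.fromkeys(perms, 0.0)
    for top_k in partial_rankings:
        ext = extensions(top_k, items)
        weight = 1.0 / (len(partial_rankings) * len(ext))
        for sigma in ext:
            for pi in perms:
                probs[pi] += weight * math.exp(-lam * kendall_distance(sigma, pi)) / z
    return probs

# Hypothetical data: one judge gave only a top-1, another a top-2.
items = ("a", "b", "c", "d")
est = kernel_estimate([("a",), ("a", "b")], items, lam=1.0)
for r in sorted(est, key=est.get, reverse=True)[:3]:
    print(r, round(est[r], 3))
```

The nested loops over orderings make the cost grow factorially in the number of items. That growth is the "limited practical use for large n" the abstract refers to.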
The structure of science information (Harris, 2002)
december 2011 by cshalizi
"The organization of information within science can be investigated in a principled way through analysis of science language. The restricted use of language in science enables description of the informational structure of science and of particular subfields, with strong similarities to structures in mathematics and programming languages. This result rests on decades of research into the relation between form and content in language, based on an information-theoretic approach to the structure of information. Examples are provided from immunology and the social sciences. Practical applications include storage of science information in databases, indexing the literature, and identification and resolution of controversy."
to:NB
linguistics
text_mining
natural_language_processing
harris.zellig
information_retrieval
december 2011 by cshalizi
The Fans Are All Right (Pinboard Blog)
october 2011 by cshalizi
"I learned a lot about fandom a couple of years ago in conversations with my friend Britta, who was working at the time as community manager for Delicious. She taught me that fans were among the heaviest users of the bookmarking site, and had constructed an edifice of incredibly elaborate tagging conventions, plugins, and scripts to organize their output along a bewildering number of dimensions. If you wanted to read a 3000 word fic where Picard forces Gandalf into sexual bondage, and it seems unconsensual but secretly both want it, and it's R-explicit but not NC-17 explicit, all you had to do was search along the appropriate combination of tags (and if you couldn't find it, someone would probably write it for you). By 2008 a whole suite of theoretical ideas about folksonomy, crowdsourcing, faceted information retrieval, collaborative editing and emergent ontology had been implemented by a bunch of friendly people so that they could read about Kirk drilling Spock." --- See also the very last link.
fandom
social_life_of_the_mind
social_media
information_retrieval
tagging
pinboard
delicious.com
via:arsyed
to_teach:data-mining
ok_maybe_not_really_to_teach
october 2011 by cshalizi
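At bottom, the "search along the appropriate combination of tags" described above is set intersection over an inverted index from tags to bookmarks. Below is a minimal sketch with made-up bookmark ids and tags. It ignores ranking, tag synonyms, and everything else a real bookmarking site would need.

```python
from collections import defaultdict

def build_tag_index(bookmarks):
    """Inverted index: tag -> set of ids of the bookmarks carrying that tag."""
    index = defaultdict(set)
    for bookmark_id, tags in bookmarks.items():
        for tag in tags:
            index[tag].add(bookmark_id)
    return index

def faceted_search(index, required, excluded=()):
    """Ids tagged with every tag in `required` and with none in `excluded`."""
    if not required:
        return set()
    # .get avoids inserting empty entries into the defaultdict for unknown
    # tags; set.intersection returns a new set, so the index is not mutated.
    result = set.intersection(*(index.get(tag, set()) for tag in required))
    for tag in excluded:
        result -= index.get(tag, set())
    return result

# Hypothetical bookmarks, each tagged along several "facets" at once.
bookmarks = {
    "fic1": {"star_trek", "crossover", "rating:R", "length:3000"},
    "fic2": {"star_trek", "crossover", "rating:NC-17"},
    "fic3": {"star_trek", "rating:R"},
}
index = build_tag_index(bookmarks)
print(faceted_search(index, ["star_trek", "rating:R"], excluded=["crossover"]))
```

Each extra tag narrows the result set, which is why a rich, consistently applied tag vocabulary makes such precise queries possible.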
Draw - Google Correlate
october 2011 by cshalizi
So cool: draw a curve freehand, get the keywords whose time series correlate best with it. However I draw, I can't get the best match below a correlation of 0.70.
google
information_retrieval
spurious_correlations
to_teach:undergrad-ADA
to_teach:data-mining
to:blog
via:vqv
rademacher_complexity
october 2011 by cshalizi
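The 0.70 floor is presumably what searching a huge pool of series for the single best match produces: with enough candidates, something always correlates well. The small simulation below makes that point under made-up assumptions. A V-shaped "drawn" curve is matched against 2,000 pure random walks standing in for keyword series. None of this is Google Correlate's actual data or method, and the function names are illustrative.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def random_walk(length, rng):
    """A Gaussian random walk: pure noise, but with trends that look meaningful."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def best_match(target, candidates):
    """Index and correlation of the candidate most correlated with target."""
    scores = [pearson(target, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

rng = random.Random(2011)
weeks = 100
drawn = [abs(t - weeks // 2) for t in range(weeks)]    # a hypothetical hand-drawn "V"
pool = [random_walk(weeks, rng) for _ in range(2000)]  # stand-ins for keyword series
idx, r = best_match(drawn, pool)
print(f"best of {len(pool)} unrelated series: #{idx}, r = {r:.2f}")
```

Enlarging the pool tends to push the best correlation higher still, even though every candidate is unrelated noise.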
Bayesian Checking for Topic Models
july 2011 by cshalizi
"Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the data, where it falls short, and in which directions it might be improved."
topic_models
model-checking
blei.david
in_NB
via:ariddell
statistics
machine_learning
information_retrieval
clustering
have_read
july 2011 by cshalizi
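Posterior predictive checking itself is easy to show in miniature, although the paper applies it to topic models with its own discrepancy functions. The sketch below swaps in a much simpler model: a Poisson rate for one word's count per document, with a conjugate Gamma prior. The discrepancy is the variance-to-mean ratio, since word burstiness is arguably the kind of independence violation the abstract has in mind. The model, the data, and the function names are all illustrative assumptions, not the paper's.

```python
import math
import random

def poisson_draw(rate, rng):
    """One Poisson(rate) draw by Knuth's multiplication method (fine for small rates)."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def dispersion(counts):
    """Discrepancy statistic: sample variance / mean (about 1 for Poisson data)."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

def ppc_pvalue(counts, n_rep=2000, prior_shape=1.0, prior_rate=1.0, seed=0):
    """Posterior predictive p-value of the dispersion statistic under a
    Poisson likelihood with a conjugate Gamma(shape, rate) prior on the rate."""
    rng = random.Random(seed)
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + len(counts)
    observed = dispersion(counts)
    exceed = 0
    for _ in range(n_rep):
        # 1. draw a rate from the posterior (gammavariate takes a *scale*)
        lam = rng.gammavariate(post_shape, 1.0 / post_rate)
        # 2. simulate a replicated data set of the same size
        replicate = [poisson_draw(lam, rng) for _ in counts]
        # 3. compare the discrepancy on replicated vs. observed data
        if dispersion(replicate) >= observed:
            exceed += 1
    return exceed / n_rep

# Hypothetical counts of one word across ten documents: bursty, not Poisson.
word_counts = [0, 0, 0, 0, 0, 0, 20, 25, 0, 30]
print(ppc_pvalue(word_counts))
```

A p-value near 0 says replicated data almost never look as bursty as the real data, which points to where the model falls short.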
Predicting consumer behavior with Web search — PNAS
october 2010 by cshalizi
What search can and cannot predict. They mention, but I think could have stressed even more, that the search data is generated _automatically_ as a by-product of now-ordinary social life, rather than a deliberate construction on the part of public or private data-collecting agencies, so it is very, very, very cheap.
internet
data_mining
to_teach:data-mining
kith_and_kin
watts.duncan
hofman.jake
sociology
information_retrieval
networked_life
have_read
october 2010 by cshalizi
[1010.0499] Statistical analysis of $k$-nearest neighbor collaborative recommendation
october 2010 by cshalizi
"Collaborative recommendation is an information-filtering technique that attempts to present information items that are likely of interest to an Internet user. Traditionally, collaborative systems deal with situations with two types of variables, users and items. In its most common form, the problem is framed as trying to estimate ratings for items that have not yet been consumed by a user. Despite wide-ranging literature, little is known about the statistical properties of recommendation systems. In fact, no clear probabilistic model even exists which would allow us to precisely describe the mathematical forces driving collaborative filtering. ... [We] set out a general sequential stochastic model for collaborative recommendation. ... in-depth analysis of the so-called cosine-type nearest neighbor ... method .... asymptotic performance as the number of users grows. We establish consistency ... under mild assumptions... Rates of convergence and examples ..."
collaborative_filtering
information_retrieval
stochastic_models
nearest_neighbors
to_teach:data-mining
october 2010 by cshalizi
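The cosine-type nearest-neighbour method the abstract analyses can be sketched in three steps. Score every other user by cosine similarity to the target user, keep the k most similar users who rated the item, and average their ratings weighted by similarity. This is a generic user-based version under illustrative assumptions: similarity is computed only over co-rated items, only positive similarities count, and the data are made up. The paper's exact estimator and its sequential model may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity of two users' rating dicts, over co-rated items only."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the item's ratings from the k most
    similar users who have rated it; None if no usable neighbour exists."""
    neighbours = [(cosine(ratings[user], r), r[item])
                  for other, r in ratings.items()
                  if other != user and item in r]
    neighbours.sort(reverse=True)  # most similar first
    top = [(s, x) for s, x in neighbours[:k] if s > 0]
    if not top:
        return None
    return sum(s * x for s, x in top) / sum(s for s, _ in top)

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m3": 4, "m4": 5},
    "cat": {"m1": 1, "m2": 5, "m4": 2},
    "dan": {"m2": 4},
}
print(predict(ratings, "ann", "m4", k=2))
```

Here bob agrees with ann on everything they share, so his rating of m4 dominates the prediction, while cat's dissimilar tastes pull it down only a little.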
ILI 2009 Presentation – "Self-plagiarism is style"
june 2010 by cshalizi
Cool effects achieved by applying basic data mining to libraries. To be used as teaching fodder, but honestly I should also find the time to suggest it to our librarians.
libraries
data_mining
information_retrieval
collaborative_filtering
via:magistra_et_mater
to_teach:data-mining
june 2010 by cshalizi
World Brain: the Idea of a Permanent World Encyclopedia | Beyond The Beyond
march 2010 by cshalizi
H. G. Wells prophesies, well, something like us, in 1937; with commentary by Bruce Sterling. Can't recall if Vannevar Bush mentioned this.
early_visions_of_network_society
encyclopedias
wells.h.g.
sterling.bruce
information_retrieval
the_present_before_it_was_widely_distributed
to:blog
march 2010 by cshalizi
Beyond DCG: User Behavior as a Predictor of a Successful Search
february 2010 by cshalizi
Yay Kristina! (Not sure I could actually teach this in 350.)
search_engines
markov_models
data_mining
klinkner.kristina
information_retrieval
to_teach:data-mining
kith_and_kin
february 2010 by cshalizi
[0910.2340] A Stochastic Model for Collaborative Recommendation
october 2009 by cshalizi
"Collaborative recommendation is an information-filtering technique that attempts to present ... movies, music, books, news, images, Web pages, etc. that are likely of interest to [users]. ... In its most common form, the problem is framed as trying to estimate ratings for items that have not yet been consumed by a user. Despite wide-ranging literature, little is known about the statistical properties of recommendation systems. In fact, no clear probabilistic model even exists allowing us to precisely describe the mathematical forces driving collaborative filtering. To provide an initial contribution to this, we propose to set out a general sequential stochastic model for collaborative recommendation and analyze its asymptotic performance as the number of users grows.... analysis of the so-called cosine-type nearest neighbor collaborative method .... consistency of the procedure under mild assumptions on the model. Rates of convergence and examples..."
collaborative_filtering
information_retrieval
data_mining
to_read
to:NB
to_teach:data-mining
october 2009 by cshalizi
Firefox rejects your « Lolcats ‘n’ Funny Pictures of Cats – I Can Has Cheezburger?
september 2009 by cshalizi
Need to work this in as an Easter egg in the code.
lolcats
lolfoxes
information_retrieval
to_teach:data-mining
september 2009 by cshalizi
UC Berkeley Enron Email Analysis
august 2009 by cshalizi
With hand-labeled categories.
enron
email
text_mining
information_retrieval
fraud
to_teach:data-mining
august 2009 by cshalizi
LDC Catalog: New York Times Annotated Corpus
august 2009 by cshalizi
Sounds like it would be perfect for 350. Now how the **** do I get access?
information_retrieval
text_mining
newspapers
data_sets
to_teach:data-mining
via:myl
august 2009 by cshalizi
Geeking with Greg: Finding task boundaries in search logs
august 2009 by cshalizi
Nice write-up, from a year ago, of Kristina's paper.
information_retrieval
classifiers
search_engines
klinkner.kristina
jones.rosie
kith_and_kin
august 2009 by cshalizi
Combining Systems and Databases: A Search Engine Retrospective
may 2009 by cshalizi
I won't actually teach this in 350, but I should probably mention it.
databases
information_retrieval
to_teach:data-mining
via:arthegall
may 2009 by cshalizi
Ton's Interdependent Thoughts: WolframAlpha, Getting Less Impressed Upon Closer Look
may 2009 by cshalizi
Nice: "For all its coolness on the front of WolframAlpha, on the back end this sounds like it's the mechanical turk of the semantic web."
information_retrieval
wolfram.stephen
wolfram_alpha
via:arthegall
may 2009 by cshalizi
This is relevant to my interests « I Can Has Cheezburger?
april 2009 by cshalizi
To illustrate search assessment and Rocchio's algorithm.
lolcats
funny:geeky
to_teach:data-mining
information_retrieval
april 2009 by cshalizi
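For reference, Rocchio's relevance-feedback update revises the query to α·q + β·(centroid of relevant documents) − γ·(centroid of non-relevant documents). The weights below (α = 1, β = 0.75, γ = 0.15) are a commonly quoted default, not anything from the linked post. The three-word vocabulary and the document vectors are made up for illustration.

```python
def centroid(vectors, dim):
    """Component-wise mean of a list of equal-length vectors (zeros if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward the centroid of documents judged relevant
    and away from the centroid of those judged non-relevant."""
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(nonrelevant, dim)
    updated = [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]
    # Negative term weights are commonly clipped to zero.
    return [max(0.0, w) for w in updated]

# Hypothetical vocabulary: ["cat", "dog", "cheezburger"]
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 0.0, 1.0]]
nonrelevant = [[0.0, 1.0, 0.0]]
print(rocchio(query, relevant, nonrelevant))  # → [1.75, 0.0, 0.75]
```

After one round of feedback the query picks up "cheezburger" from the relevant document, and the "dog" weight would go negative if it were not clipped.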
About XStructure
july 2008 by cshalizi
Interface to arXiv via some kind of hierarchical clustering of the citation graph. (Can't find details.) Interesting, but doesn't look all that useful (yet).
community_discovery
hierarchical_structure
information_retrieval
arxiv
july 2008 by cshalizi
The Library in the New Age - The New York Review of Books
may 2008 by cshalizi
Some good points, but surprisingly bad history (Chinese printing didn't take off, "The Web began as a means of communication among physicists in 1981"!) from a professional historian. Not material to the mostly-sound recommendations.
books
research
libraries
internet
google
information_retrieval
darnton.robert
via:idlethink
academia
history_of_intellect
bibliography
journalism
newspapers
enlightenment
computer_networks_as_provinces_of_the_commonwealth_of_letters
blogs
why_oh_why_cant_we_have_a_better_press_corps
why_oh_why_cant_we_have_a_better_academic_publishing_system
natural_history_of_truthiness
social_life_of_the_mind
may 2008 by cshalizi
Desperately seeking the consumer: Personalized search engines and the commercial exploitation of user data: Rohle
march 2008 by cshalizi
"Essentially, search engines now fulfill the task of translating information needs into consumption needs."
information_retrieval
march 2008 by cshalizi
Workshop I: Dynamic Searches and Knowledge Building
november 2007 by cshalizi
IPAM workshop on the mathematics of search and knowledge discovery, with links to slides and/or audio for some talks
information_retrieval
machine_learning
data_mining
linguistics
natural_language_processing
via:klk
semantics_from_syntax
november 2007 by cshalizi
[0710.3972] Entropy Rank and Free Energy: a thermodynamic formalism for Web search
november 2007 by cshalizi
"variants of PageRank ... based on Ruelle's thermodynamic formalism"
to:NB
information_retrieval
page_rank
thermodynamic_formalism
november 2007 by cshalizi
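The variants in that paper start from ordinary PageRank, so a baseline helps for comparison. The sketch below is plain PageRank by power iteration, using the usual damping factor of 0.85 and letting dangling pages spread their rank uniformly. It does not attempt the thermodynamic-formalism generalization, and the function name and toy graphs are illustrative.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Plain PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    out-links ("dangling" pages) spread their rank uniformly over all pages."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # teleportation plus redistributed dangling mass
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = links.get(v, [])
            for t in out:
                new[t] += damping * rank[v] / len(out)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Hypothetical toy web: b and c both point at a, a points back at b.
print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))
```

Page c receives no links, so it keeps only the teleportation share. Page a, linked from both others, ends up ranked highest.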
Language Log: Solving the mysteries of the ages via semantic search
november 2007 by cshalizi
Some of these are almost in the "good vodka, rotten meat" league...
information_retrieval
funny:geeky
november 2007 by cshalizi
Michael Nielsen » Information Aggregators
november 2007 by cshalizi
"Where are the programming languages that have Bayesian filters, PageRank, and other types of collective intelligence as a central, core part of the language? I don’t mean libraries or plugins, I mean integrated into the core of the language in the sam
collaborative_filtering
information_retrieval
social_media
the_web
cognitive_triage
november 2007 by cshalizi
PigeonRank
october 2007 by cshalizi
building upon the breakthrough research of B. F. Skinner
google
search_engines
behaviorism
reinforcement_learning
funny:geeky
information_retrieval
to_teach:data-mining
skinner.b.f.
october 2007 by cshalizi
related tags
academia ⊕ algorithms ⊕ arthegall ⊕ arxiv ⊕ behaviorism ⊕ bibliography ⊕ bioinformatics ⊕ blei.david ⊕ blogs ⊕ books ⊕ categorical_data ⊕ citation_networks ⊕ classifiers ⊕ clustering ⊕ cognitive_triage ⊕ collaborative_filtering ⊕ community_discovery ⊕ computational_statistics ⊕ computer_networks_as_provinces_of_the_commonwealth_of_letters ⊕ darnton.robert ⊕ databases ⊕ data_analysis ⊕ data_mining ⊕ data_sets ⊕ delicious.com ⊕ distributed_systems ⊕ document_summarization ⊕ early_visions_of_network_society ⊕ email ⊕ encyclopedias ⊕ enlightenment ⊕ enron ⊕ fandom ⊕ fraud ⊕ funny:geeky ⊕ good_old_fashioned_ai ⊕ google ⊕ graphical_models ⊕ harris.zellig ⊕ have_read ⊕ hierarchical_structure ⊕ history_of_intellect ⊕ hofman.jake ⊕ hofmann.thomas ⊕ hypothesis_testing ⊕ image_retrieval ⊕ information_retrieval ⊖ internet ⊕ in_NB ⊕ jones.rosie ⊕ journalism ⊕ kith_and_kin ⊕ klinkner.kristina ⊕ latent_semantic_analysis ⊕ lebanon.guy ⊕ lenat.douglas ⊕ libraries ⊕ linguistics ⊕ lolcats ⊕ lolfoxes ⊕ machine_learning ⊕ markov_models ⊕ meaning_as_location_in_a_system_of_relations ⊕ model-checking ⊕ natural_history_of_truthiness ⊕ natural_language_processing ⊕ nearest_neighbors ⊕ networked_life ⊕ networks ⊕ newspapers ⊕ nonparametrics ⊕ ok_maybe_not_really_to_teach ⊕ ordinal_data ⊕ page_rank ⊕ pattern_discovery ⊕ pinboard ⊕ precision-recall ⊕ rademacher_complexity ⊕ radev.dragomir ⊕ reinforcement_learning ⊕ research ⊕ scientific_computing ⊕ search_engines ⊕ semantics_from_syntax ⊕ skinner.b.f. 
⊕ social_life_of_the_mind ⊕ social_media ⊕ sociology ⊕ spurious_correlations ⊕ statistics ⊕ sterling.bruce ⊕ stochastic_models ⊕ tagging ⊕ text_mining ⊕ theoretical_computer_science ⊕ thermodynamic_formalism ⊕ the_mechanical_turk_of_the_semantic_web ⊕ the_present_before_it_was_widely_distributed ⊕ the_web ⊕ to:blog ⊕ to:NB ⊕ topic_models ⊕ to_read ⊕ to_teach:data-mining ⊕ to_teach:undergrad-ADA ⊕ via:ariddell ⊕ via:arsyed ⊕ via:arthegall ⊕ via:chl ⊕ via:idlethink ⊕ via:klk ⊕ via:magistra_et_mater ⊕ via:myl ⊕ via:vqv ⊕ watts.duncan ⊕ wells.h.g. ⊕ why_oh_why_cant_we_have_a_better_academic_publishing_system ⊕ why_oh_why_cant_we_have_a_better_press_corps ⊕ wolfram.stephen ⊕ wolfram_alpha ⊕