Game-powered machine learning
4 weeks ago by cshalizi
"Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data."
--- This is more than a bit of a stunt, but it points in an interesting direction.
to:NB
to_read
data_mining
collective_cognition
active_learning
tagging
classifiers
re:democratic_cognition
PeteSearch: Keep the web weird
10 weeks ago by cshalizi
"I'm doing a short talk at SXSW tomorrow, as part of a panel on Creating the Internet of Entities. Preparing is tough because I don't believe it's possible, and even if it was I wouldn't like it. Opposing better semantic tagging feels like hating on Girl Scout cookies, but I've realized that I like an internet full of messy, redundant, ambiguous data.
"The stated goal of an Internet of Entities is a web where "real-world people, places, and things can be referenced unambiguously". We already have that. Most pages give enough context and attributes for a person to figure out which real world entity it's talking about. What the definition is trying to get at is a reference that a machine can understand.
"The implicit goal of this and similar initiatives like Stephen Wolfram's .data proposal is to make a web that's more computable. Right now, the pages that make up the web are a soup of human-readable text, a long way from the structured numbers and canonical identifiers that programs need to calculate with. I often feel frustrated as I try to divine answers from chaotic, unstructured text, but I've also learned to appreciate the advantages of the current state of things."
to:blog
warden.peter
web
internet
semantic_web
tagging
networked_life
[1110.4851] Leveraging User Diversity to Harvest Knowledge on the Social Web
october 2011 by cshalizi
"Social web users are a very diverse group with varying interests, levels of expertise, enthusiasm, and expressiveness. As a result, the quality of content and annotations they create to organize content is also highly variable. While several approaches have been proposed to mine social annotations, for example, to learn folksonomies that reflect how people relate narrower concepts to broader ones, these methods treat all users and the annotations they create uniformly. We propose a framework to automatically identify experts, i.e., knowledgeable users who create high quality annotations, and use their knowledge to guide folksonomy learning. We evaluate the approach on a large body of social annotations extracted from the photosharing site Flickr. We show that using expert knowledge leads to more detailed and accurate folksonomies. Moreover, we show that including annotations from non-expert, or novice, users leads to more comprehensive folksonomies than experts' knowledge alone."
to:NB
data_mining
social_life_of_the_mind
social_media
kith_and_kin
lerman.kristina
tagging
The Fans Are All Right (Pinboard Blog)
october 2011 by cshalizi
"I learned a lot about fandom a couple of years ago in conversations with my friend Britta, who was working at the time as community manager for Delicious. She taught me that fans were among the heaviest users of the bookmarking site, and had constructed an edifice of incredibly elaborate tagging conventions, plugins, and scripts to organize their output along a bewildering number of dimensions. If you wanted to read a 3000 word fic where Picard forces Gandalf into sexual bondage, and it seems unconsensual but secretly both want it, and it's R-explicit but not NC-17 explicit, all you had to do was search along the appropriate combination of tags (and if you couldn't find it, someone would probably write it for you). By 2008 a whole suite of theoretical ideas about folksonomy, crowdsourcing, faceted information retrieval, collaborative editing and emergent ontology had been implemented by a bunch of friendly people so that they could read about Kirk drilling Spock." --- See also the very last link.
fandom
social_life_of_the_mind
social_media
information_retrieval
tagging
pinboard
delicious.com
via:arsyed
to_teach:data-mining
ok_maybe_not_really_to_teach
[1011.3557] A Probabilistic Approach for Learning Folksonomies from Structured Data
november 2010 by cshalizi
"Learning structured representations has emerged as an important problem in many domains ... One approach to learning complex structures is to integrate many smaller, incomplete and noisy structure fragments. ... we present an unsupervised probabilistic approach that extends affinity propagation to combine the small ontological fragments into a collection of integrated, consistent, and larger folksonomies. ... the method must aggregate similar structures while avoiding structural inconsistencies and handling noise ... validate ... on a real-world social media dataset ... of shallow personal hierarchies specified by many individual users, collected from the photosharing website Flickr. ... our proposed approach is able to construct deeper and denser structures, compared to ... the standard affinity propagation algorithm. Additionally, [it] yields better overall integration quality than a state-of-the-art approach based on incremental relational clustering."
tagging
change_of_representation
ontologies
relational_learning
lerman.kristina
getoor.lise
in_NB
Thing 8 of 23: the tags don’t work - Magistra et Mater
june 2010 by cshalizi
Why _don't_ libraries have a "people who checked out this book also checked out..." feature? (Also, I suspect other people's tags become most useful when employed as features in a recommendation system, rather than directly searching on them, but that's a mere guess.)
tagging
social_media
collaborative_filtering
magistra
[1003.2281] Folks in Folksonomies: Social Link Prediction from Shared Metadata
march 2010 by cshalizi
"... focus on Flickr and Last.fm, two social media systems in which we can relate the tagging activity of the users with an explicit representation of their social network. We show that a substantial level of local lexical and topical alignment is observable among users who lie close to each other in the social network. We introduce a null model that preserves user activity while removing local correlations, allowing us to disentangle the actual local alignment between users from statistical effects due to the assortative mixing of user activity and centrality in the social network. ... suggests that users with similar topical interests are more likely to be friends, and therefore semantic similarity measures among users based solely on their annotation metadata should be predictive of social links. We test this ... on the Last.fm data set ... social network constructed from semantic similarity captures actual friendship [better] than Last.fm's suggestions based on listening patterns"
link_prediction
network_data_analysis
tagging
social_networks
social_life_of_the_mind
re:homophily_and_confounding
to_read
social_media