cshalizi + tagging   9

Game-powered machine learning
"Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data."

--- This is more than a bit of a stunt, but it points in an interesting direction.
to:NB  to_read  data_mining  collective_cognition  active_learning  tagging  classifiers  re:democratic_cognition 
4 weeks ago by cshalizi
PeteSearch: Keep the web weird
"I'm doing a short talk at SXSW tomorrow, as part of a panel on Creating the Internet of Entities. Preparing is tough because I don't believe it's possible, and even if it was I wouldn't like it. Opposing better semantic tagging feels like hating on Girl Scout cookies, but I've realized that I like an internet full of messy, redundant, ambiguous data.
"The stated goal of an Internet of Entities is a web where "real-world people, places, and things can be referenced unambiguously". We already have that. Most pages give enough context and attributes for a person to figure out which real world entity it's talking about. What the definition is trying to get at is a reference that a machine can understand.
"The implicit goal of this and similar initiatives like Stephen Wolfram's .data proposal is to make a web that's more computable. Right now, the pages that make up the web are a soup of human-readable text, a long way from the structured numbers and canonical identifiers that programs need to calculate with. I often feel frustrated as I try to divine answers from chaotic, unstructured text, but I've also learned to appreciate the advantages of the current state of things."
to:blog  warden.peter  web  internet  semantic_web  tagging  networked_life 
10 weeks ago by cshalizi
[1110.4851] Leveraging User Diversity to Harvest Knowledge on the Social Web
"Social web users are a very diverse group with varying interests, levels of expertise, enthusiasm, and expressiveness. As a result, the quality of content and annotations they create to organize content is also highly variable. While several approaches have been proposed to mine social annotations, for example, to learn folksonomies that reflect how people relate narrower concepts to broader ones, these methods treat all users and the annotations they create uniformly. We propose a framework to automatically identify experts, i.e., knowledgeable users who create high quality annotations, and use their knowledge to guide folksonomy learning. We evaluate the approach on a large body of social annotations extracted from the photosharing site Flickr. We show that using expert knowledge leads to more detailed and accurate folksonomies. Moreover, we show that including annotations from non-expert, or novice, users leads to more comprehensive folksonomies than experts' knowledge alone."
to:NB  data_mining  social_life_of_the_mind  social_media  kith_and_kin  lerman.kristina  tagging 
october 2011 by cshalizi
The Fans Are All Right (Pinboard Blog)
"I learned a lot about fandom a couple of years ago in conversations with my friend Britta, who was working at the time as community manager for Delicious. She taught me that fans were among the heaviest users of the bookmarking site, and had constructed an edifice of incredibly elaborate tagging conventions, plugins, and scripts to organize their output along a bewildering number of dimensions. If you wanted to read a 3000 word fic where Picard forces Gandalf into sexual bondage, and it seems unconsensual but secretly both want it, and it's R-explicit but not NC-17 explicit, all you had to do was search along the appropriate combination of tags (and if you couldn't find it, someone would probably write it for you). By 2008 a whole suite of theoretical ideas about folksonomy, crowdsourcing, faceted information retrieval, collaborative editing and emergent ontology had been implemented by a bunch of friendly people so that they could read about Kirk drilling Spock." --- See also the very last link.
fandom  social_life_of_the_mind  social_media  information_retrieval  tagging  pinboard  delicious.com  via:arsyed  to_teach:data-mining  ok_maybe_not_really_to_teach 
october 2011 by cshalizi
[1011.3557] A Probabilistic Approach for Learning Folksonomies from Structured Data
"Learning structured representations has emerged as an important problem in many domains ... One approach to learning complex structures is to integrate many smaller, incomplete and noisy structure fragments. ... we present an unsupervised probabilistic approach that extends affinity propagation to combine the small ontological fragments into a collection of integrated, consistent, and larger folksonomies. ... the method must aggregate similar structures while avoiding structural inconsistencies and handling noise... validate ... on a real-world social media dataset ... of shallow personal hierarchies specified by many individual users, collected from the photosharing website Flickr. ... our proposed approach is able to construct deeper and denser structures, compared to ... the standard affinity propagation algorithm. Additionally, [it] yields better overall integration quality than a state-of-the-art approach based on incremental relational clustering."
tagging  change_of_representation  ontologies  relational_learning  lerman.kristina  getoor.lise  in_NB 
november 2010 by cshalizi
Thing 8 of 23: the tags don’t work - Magistra et Mater
Why _don't_ libraries have a "people who checked out this book also checked out..." feature? (Also, I suspect other people's tags become most useful when employed as features in a recommendation system, rather than directly searching on them, but that's a mere guess.)
tagging  social_media  collaborative_filtering  magistra 
june 2010 by cshalizi
[1003.2281] Folks in Folksonomies: Social Link Prediction from Shared Metadata
"... focus on Flickr and Last.fm, two social media systems in which we can relate the tagging activity of the users with an explicit representation of their social network. We show that a substantial level of local lexical and topical alignment is observable among users who lie close to each other in the social network. We introduce a null model that preserves user activity while removing local correlations, allowing us to disentangle the actual local alignment between users from statistical effects due to the assortative mixing of user activity and centrality in the social network. ... suggests that users with similar topical interests are more likely to be friends, and therefore semantic similarity measures among users based solely on their annotation metadata should be predictive of social links. We test this ... on the Last.fm data set ... social network constructed from semantic similarity captures actual friendship [better] than Last.fm's suggestions based on listening patterns."
link_prediction  network_data_analysis  tagging  social_networks  social_life_of_the_mind  re:homophily_and_confounding  to_read  social_media 
march 2010 by cshalizi
