jschneider + datamining   121

Smart Content Re-viewed: Text Analytics and Semantic Content Enrichment
"There are other solution providers in the content analytics meets semantic annotation/enrichment game. In addition to IBM and Ontotext, they include HP Autonomy, MarkLogic, OpenText, Temis, and the nascent, open-source IKS project. Other vendors offer enterprise-strength building blocks, for instance, SAS via the various SAS Text Analytics components."
text-analytics  NLP  datamining  visualization  content-analytics  content-enrichment  semantic-content-enrichment  linkeddata  ontologies 
february 2012 by jschneider
The Microsoft Update: Why I was banned on Google+ (and how I redeemed myself)
" But three editors here have already had creepy or frustrating situations with Google+, and other ways that Google is matching our names to our public data. It's making me wonder if Google can be trusted at all, let alone trusted more than Facebook.""Google saw messages from that Twitter account coming into his Gmail, correlated the two and started serving up unasked for Tweets. Yes, Google is correlating your Google profile with data from public social networks. You must opt out if you want it to cut it out. See screen shot below for details. ""After trying everything I could think of, I thought Plus was either ridiculously hard to use or just plain broken (when in truth, the answer was neither, as my account had been suspended).

A few days later, when it still wasn't fixed, I tried to update my profile and when I hit save, I was finally told what the problem was. It didn't like my name. I was told the account was being investigated for possible violations for Google's profile policies. So, being told what the problem was at last, I entered my full real name and I was told to check back. So I did, day after day last week.

I finally sent Google a feedback form, showing it my name and telling it that I was not violating its policies. That did the trick.""Granted, Google+ is still in beta. But Google, if you are going to ban people for not using their real names, do your fair part and make this privacy maze easier to use."
google  googleplus  privacy  gmail  publicity  geolocation  facebook  datamining 
july 2011 by jschneider
Blog para proyectantes de Dani Gayo - Les Liaisons dangereuses
"Detection of religious and political beliefs, sexual orientation, and race/ethnicity achieved above 95% precision. Sex and age achieved poorer results but, still, they were much more precise than a random classifier.""The implications are IMHO important (and a little scary), it means that simple algorithms can be used to label people, the most promising assignments can be manually checked and, hence, used to bootstrap the next iteration. Besides, we the users are doing most of the work by telling about ourselves; we are providing the labels, happily, for free, for anyone.

Maybe you are aware of this and don't care discussing your beliefs and personal choices, fine. But by doing that, those of your friends and acquaintances who conceal such information are at risk.""What's the moral of this? The old saying "You are known by the company you keep" is absolutely true, so don't tell anybody who your friends are.
" see also http://arxiv.org/abs/1012.5913
demographics  homophily  privacy  datamining 
july 2011 by jschneider
The Humanities Go Google - Technology - The Chronicle of Higher Education
"You need a team. To sort, interrogate, and interpret roughly 1,000 digital texts, scholars have brought together a data-mining gang drawn from the departments of English, history, and computer science. They're the rare clique of humanities graduate students who work across disciplines and discuss programming languages over beer, an unlikely mix of "techies" and "fuzzies" with enough characters for a reality-TV show."
digitalhumanities  datamining  distant-reading 
may 2010 by jschneider
Google Buzz as Experience Pattern « Fred Stutzman
via http://twitter.com/hartzog/status/8976226216 "To participate in Buzz, one must agree to have a Google profile, a place “visible on the web so friends can find and recognize you.” Notably, the Google profile is at the center of Google’s social search efforts.""For Google, it is more important that you use Buzz once than if you use it on an ongoing basis.

What are the implications of a system like Buzz? It is a pretty interesting case of what might be thought of as data leveraging.""You can’t delete your Google Buzz account. If you create a Google Buzz account and wish to delete it, you have to delete your entire Google profile (killing your search listing, etc. at the same time)."
buzz  via:@hartzog  datamining  privacy 
february 2010 by jschneider
Shownar - Projects - BERG
"What Shownar presents, on its front page, is a measure of surprisingness: if a show broadcast in a late night slot on a niche channel suddenly has more buzz than the model would predict from those facts, it ranks high for surprisingness. But a daily soap opera that typically attracts millions of viewers is popular, but not surprisingly so."
datamining  BBC  attention  from delicious
december 2009 by jschneider
MOA - Massive Online Analysis
"Massive On-line Analysis is an environment for massive data mining.

MOA is a framework for data stream mining. Includes tools for evaluation and a collection of machine learning algorithms. Related to the WEKA project, also written in Java, while scaling to more demanding problems. "
datamining  java  classification 
november 2009 by jschneider
How Not To Read A Million Books
"I think that what Moretti calls “the quantitative approach to literature” acquires a special importance when millions of books are equally at your fingertips, all eagerly responding to your Google Book Search: you can no longer as easily ignore the books you don't know, nor can you grasp the collective systems they make up without some new strategy—a strategy for not reading."
text-mining  datamining  MONK  million-books  strategic-reading 
september 2009 by jschneider
"Anonymized" data really isn't—and here's why not - Ars Technica
""the surprising failure of anonymization." As increasing amounts of information on all of us are collected and disseminated online, scrubbing data just isn't enough to keep our individual "databases of ruin" out of the hands of the police, political enemies, nosy neighbors, friends, and spies.

If that doesn't sound scary, just think about your own secrets, large and small—those films you watched, those items you searched for, those pills you took, those forum posts you made. The power of reidentification brings them closer to public exposure every day. So, in a world where the PII concept is dying, how should we start thinking about data privacy and security?""a central reality of data collection: "data can either be useful or perfectly anonymous but never both.""
anonymization  privacy  datamining  secrets 
september 2009 by jschneider
The Architecture Issue - Data Center Overload - NYTimes.com
"We have an almost inimical incuriosity when it comes to infrastructure. It tends to feature in our thoughts only when it’s not working.""Much of the daily material of our lives is now dematerialized and outsourced to a far-flung, unseen network. The stack of letters becomes the e-mail database on the computer, which gives way to Hotmail or Gmail. The clipping sent to a friend becomes the attached PDF file, which becomes a set of shared bookmarks, hosted offsite. The photos in a box are replaced by JPEGs on a hard drive, then a hosted sharing service like Snapfish. The tilting CD tower gives way to the MP3-laden hard drive which itself yields to a service like Pandora, music that is always “there,” waiting to be heard. But where is “there,” and what does it look like?"“The first rule of data centers is: Don’t talk about data centers.” "proximity of the financial firms’ machines to the machines of the trading exchanges in NJ2"
data  storage  infrastructure  nytimes  NJ2  energy  proximity  latency  datamining  recommedned 
june 2009 by jschneider
RT this: OUP Dictionary Team monitors Twitterer’s tweets : OUPblog
"“Watching”, “trying”, “listening”, “reading” and “eating” are all in the top 100 first words, revealing just how often people use Twitter to report on whatever they are experiencing (or consuming) at the time. Evidence of greater informality than general English: “ok” is much more common, and so is “f***”."
twitter  datamining 
june 2009 by jschneider
Cornell team maps out 35m Flickr photos | Media | guardian.co.uk
"The process developed by the team did not rely on geo-tagged photos, but used various clues to interpret location from metadata and the images themselves. The project was part funded by Google, Yahoo and the MacArthur Foundation. "We developed classification methods for characterizing these locations from visual, textual and temporal features," explained Daniel Huttenlocher, professor of computing, information science and business. These methods reveal that both visual and temporal features improve the ability to estimate the location of a photo compared to using just textual tags.""
35mm  flickr  datamining 
april 2009 by jschneider
Dear Gretchen: a Book Analyzing Childhood Letters - information aesthetics
"The wonderful book Dear Gretchen [gretchenetc.com] investigates all the letters that the author, Gretchen Haas, has kept inside a luggage case since her childhood. The design process of the book included finding the word and phrase frequency of the letters, categorizing them by sender, by date, and finally writing personal reflections about each of the senders. Many beautiful graphs were constructed to reveal the word frequency and type (e.g. swear words, abbreviations, colors, names, slang, nicknames, holidays, etc.), and each of the 187 letters was thoroughly documented inside of the book. ...Friends, Boyfriends, Family, and The Unsent...memoir... Most graphs inside the book were completely developed out of paper, as paper stands for a universal symbol of childhood, and most of the letters were received in elementary and middle school. The forms of the graphs range from very traditional to non-sensical, representing a very real and fantastical memory of childhood."
books  odd  infographics  datamining  graphs  paper  memory 
april 2009 by jschneider
Stefano’s Linotype » Unreasonable Hypocrisy
On the use of structured data. “The Unreasonable Effectiveness of Data” "The paper left me with a bitter taste but I couldn’t put my finger on why until this morning. ...Google built its empire on the <a> tag. Not on statistical methods but on fully deterministic topological analysis of the graph of hyperlinks. They did so while everybody else in the field tried all they could to emerge rank out of better understanding of the content of pages using statistical methods and while everybody else thought that the search engine field was a done deal...""A good title should have been “The Surprising Payoffs of Small Distributed Increases in Data Structure” and it would outline how the introduction of UTF-8 massively simplified n-gram analysis or how the introduction of the <a> tag in HTML allowed the creation of pagerank ..."There is no such thing as a clear-cut distinction between data and metadata and there is no such clear-cut distinction between structured and un-structured data."
semanticweb  datamining  pagerank  google  recommended  metadata 
april 2009 by jschneider
Colleges Mine Data to Predict Dropouts - Chronicle.com
"Students who have some combination of poor preparation and slack engagement with the Web site will see the red or yellow light on the course-management system and will also get a warning by e-mail asking them to meet with an instructor or seek outside help.""Purdue researchers found that students in the moderate-risk (yellow light) group who received the e-mail messages did better in the course than did their counterparts in a control group. Most of the students identified as being at highest risk (red light) still did not rectify their situations or take advantage of campus resources, however."
datamining  retnetion  Chronicle  privacy  surveillance 
april 2009 by jschneider
Archiving Writers' Work in the Age of E-Mail - Chronicle.com
"But in late February, several weeks after the iconic writer died, some boxes arrived with unexpected contents: approximately 50 three-and-a-half and five-and-a-quarter-inch floppy disks — artifacts from late in the author's career when he, like many of his peers, began using a word processor.

The floppies have presented a bit of a problem. While relatively modern to Mr. Updike — who rose to prominence back when publishers were still using Linotype machines — the disks are outmoded and damage-prone by today's standards. Ms. Morris, who curates modern books and manuscripts, has carefully stored them alongside his papers in a temperature-controlled room in the library "until we have a procedure here at Harvard on how to handle these materials.""
digital  archives  prepservation  Chronicle  datamining 
april 2009 by jschneider
Status.net Could Point to the Future of Business Intelligence - ReadWriteWeb
"In private networks, a company will be able to receive automatic notification when one of its employees has begun conversing with another particular employee more than they had before. Perhaps they'll consider putting them in the same work group.

If one sales person doesn't converse with the technical team as often as other sales people do, a company might wonder whether that salesperson is less comfortable explaining technical matters to customers. It will be trivial to determine which technical staff are friendliest and most appropriate to introduce a sales person to, because those kinds of connections will be fully graphable.

In public business networks...the contours of that community will be easier than ever to understand.""Is this creepy? It doesn't have to be...exciting potential here and if an increasingly open technology world can help the business world understand the value of open over control...then this kind of analysis could be democratized and used for good."
surveillance  business-intelligence  enterprise2.0  datamining  status.net  evaluation  privacy 
april 2009 by jschneider
Lorcan Dempsey's weblog
"I was reminded of a note I did a while ago about personal knowledge of collections and readers, and of some parallels with modern recommender systems. It seems to me that one of the major challenges libraries have over the next while relates to how they manifest expertise in their online services and presence, how the system emulates the "good neighbour" or knowledgeable library colleague."
Lorcan  Dempsey  "secret-lives-of-books"  datamining  libraries 
march 2009 by jschneider
KDnuggets: Data Mining Community's Top Resource Since 1997
"Eric Zaetsch points out KDNuggets which is a well-developed mailing list/news site with a KDD flavor. This might particularly interest people looking for industrial jobs in machine learning, as the mailing list has many such."
datamining  kdd 
march 2009 by jschneider
Book Scraper - Times Labs - Book Scraper
" Welcome to Book Scraper, a tool The Times has created to let you explore some of the world's most famous books. We have created a database of 126 classic publications by 53 authors. They contain 12,817,682 words in total, and have a combined vocabulary of 105,836 words. Book Scraper lets you explore them in different ways. You can search by author and learn, for instance, that Shakespeare's written vocabulary was in the order of 24,000 words. You can search by publication, and discover that the longest word in Jules Verne's 20,000 Leagues Under the Sea is pectinibranchidae, which is 17 characters long. (It's a type of mollusc.) Or you can type in a word, and Book Scraper will chart its use across time. (The word thunderer has been used in 6 books in our database, the first mention being in Don Quixote - some 200 years before it became the Times' nickname.) "
books  code4lib  scraping  visualization  UK  datamining  etymology 
february 2009 by jschneider
Web 2.0-style resource discovery comes to libraries – the TILE Project
"Joy Palmer reminded us of the challenges of semantic context and “ontological drift” when user-generated commentary on contentious subjects becomes too rich to be easily assimilated – for example, consider the multiple sparring entries relating to the state of Israel on Wikipedia. She questioned whether the library OPAC (Online Public Access Catalogue) was too generic a system to support contextually and academically meaningful personalisation, and this point was carried over to the break-out discussions about whether users would be motivated to contribute content to institutional OPACs. "
web2.0  datamining  libraries 
february 2009 by jschneider
Nodalities » Blog Archive » What would you collate?
"We’ve been talking a lot about the prevalence of data, and how interacting with it empowers people.""When you think about all the times we use connected software, it makes you wonder why on earth we have to keep doing this again and again. Alongside the obvious data, like contacts, calendar events, and personal settings; there is a world of nearly-immediately useful stuff. When and where I heard that song might not be life-changing, but it certainly helps with earworms, right? So, was I listening to Last.FM, Blip.FM, Spotify, iPlayer, or—heaven forbid—the radio? Where was this photo taken? What happened in March to make my heating bill so high? When’s my next car service needed, and why did these particular tyres seem to wear out so badly? These are questions I’ve asked myself within 10 hours of writing this. A level further, is a host of semi-useful data just waiting to be connected and used. This guy collated everything into charts. Another has his house twitter whenever anything si
data  datamining  self-surveillance 
december 2008 by jschneider
Games Without Frontiers: How Videogames Blind Us With Science
"Videogames are becoming the new hotbed of scientific thinking for kids today.""More than half the gamers used "systems-based reasoning" -- analyzing the game as a complex, dynamic system. And one-tenth actually constructed specific models to explain the behavior of a monster or situation; they would often use their model to generate predictions. Meanwhile, one-quarter of the commentors would build on someone else's previous argument, and another quarter would issue rebuttals of previous arguments and models" "At one point, Steinkuehler met up with one of the kids who'd built the Excel model to crack the boss. "Do you realize that what you're doing is the essence of science?" she asked. He smiled at her. "Dude, I'm not doing science," he replied. "I'm just cheating the game!" "
videogames  science  scientific-method  Excel  analysis  learning  Wired  datamining  mathematical-models 
september 2008 by jschneider
Bytes of Life - washingtonpost.com
"It's not about tracking what you do, they say. It's about learning who you are. "a new group in San Francisco called Quantified Self. Members plan to meet monthly to share with one another the tools and sites they've found helpful on their individual paths to self-digitization. Topics include, according to the group invite: behavior monitoring, location tracking, digitizing body info and non-invasive probes. ""Lifeblogging seems mostly like a byproduct of an always-on society. If you do something but fail to record it online, did it really happen? Self-tracking, on the other hand, is partly about the recording, but also as much about the analysis that goes on after the recording. " "All the answers could be right there, in your life data. "
lifestream  datamining  privacy  self-surveillance  WashingtonPost  analysis  quantifiedself 
september 2008 by jschneider
this isnt a story i (30 June, 2007, Interconnected)
"Along with new visibilities comes social understanding of those new visibilities."
privacy  identity  presentation-of-self  datamining  hope  toreread 
july 2008 by jschneider
GPS gadgets can reveal more than your location - tech - 03 June 2008 - New Scientist Tech
"As it becomes easier to track and share our movements, the concept of "locational privacy" – controlling who can access our location records – becomes more important,"
GPS  privacy  datamining  traffic  surveillance 
june 2008 by jschneider