
Hypothes.is: A Peer-Review Layer for the Whole Internet
A team of long-time leaders of the Internet community have come together behind Dan Whaley, one of the forefathers of contemporary search engines, to build a system called Hypothes.is: an "open-source Internet platform to crowdsource peer-review on information everywhere."

It's a peer-review system to check, verify and critique content all over the Web - and beyond. "Improving the credibility of the information we consume is humanity's grandest challenge," Whaley says. Topic experts will be enlisted in addition to crowdsourcing; a reputation system, browser plug-ins and APIs are on the roadmap; and all the data will be stored at the Internet Archive. It sounds incredible, and it's raising money on Kickstarter right now. The goal is to release a prototype in the first half of next year.

Hypothes.is is an incredibly ambitious non-profit project, and it has the backing of some of the leading minds on the Web: John Perry Barlow of the EFF, Garret Camp of StumbleUpon, Kaliya Hamlin of the Internet Identity Workshop, Nate Oostendorp of Slashdot and many more. It's a really impressive team of advisors. The project is led by Dan Whaley, himself a very interesting entrepreneur.

Outside observers are immediately enthusiastic as well. Chris Saad of real-time database Echo articulates the project's potential well, I think. "Trying to achieve thoughtful, Web-wide, open standards-based fact checking across the Web is one of the great challenges and opportunities of the Web, and humanity. I'm super excited about the possibilities."

I am too.

Embedded presentation: "Hypothesis quick overview" (2011-07-10), by dwhly
Data_Services  from google
october 2011 by rahuldave
Your Neighborhood Data Visualized: Startup Builds Census Map Block by Block
The 2010 US Census has begun publishing its detailed demographic data state by state and the race now begins to see which data geeks can do the coolest things with the information. Remember when large-scale social data was only collected once a decade? When terms like "social graph" and "interest graph" didn't even exist?

It turns out that old-fashioned data still has a lot to teach us. One company has already launched its first block-by-block, statewide data visualization site, built on New Jersey census data. Want to see hyper-localization and personalization in action? Check out this map from MoonShadow Mobile.

MoonShadow Mobile specializes in working with large datasets and making them quick and easy to navigate. The company's service lets anyone look up race, age, gender and other information from eight different data sets for any particular block in New Jersey. New states will be added as census data is released to the public. The interface takes some effort to learn, but its feature set and customizability may make up for that.

MoonShadow makes its living selling similar software to public agencies and others interested in drawing redistricting lines or in municipal planning. Want to know if a location might be a good place to put a new park? A service like this can help you determine how many children live in the area.

Other data visualization projects built on the US Census include a less granular but more visually dazzling effort from the well-known data designers at Stamen Design. The New York Times has made a go at it as well; the Times covers every block nationwide, using estimated numbers.

Thanks to last week's release of the Google Public Data Explorer, anyone can now do many things like this with big sets of public data.

Bring on the census visualizations, folks. The visualization, cross-referencing and subsequent insights enabled by this latest census are likely to far surpass what the tech world was able to do with the 2000 US Census. Just imagine what magic will be performed with the census data of forthcoming decades.

If, that is, Facebook and Twitter data don't outshine anything the government has been able to collect before. Data collection every ten years does seem a little old-fashioned, doesn't it? Nonetheless, let's see who can top the quality of census data. This data clearly remains very valuable.

Data_Services  from google
february 2011 by rahuldave
On Facebook, Angry People Are More Popular (Plus Other Fascinating Statistical Correlations)
Facebook's data team proved once again today that when you analyze a large set of anonymous user data from the world's biggest social network, you can learn some very interesting things about the state of humanity.

In a blog post titled "What's on your mind?", the company disclosed the results of its text analysis of 1 million anonymized messages. Among the findings: young people swear more than older people, and older people talk about other people more than about themselves. Popular people are more likely to talk about other people, TV and movies, to swear and to use religious words. Less popular people are more likely to talk about work, sleeping, eating and thinking. These are but a few of the many observations made by the in-house data team. The biggest question about the data remains unanswered, though: what could a world of independent researchers discover in it?

Above: Facebook found that the words at the top of the left chart appeared more often in profiles from older people, and those on the right chart appeared more often in profiles from more popular people. The company's blog post contains five more graphs concerning other word correlations.

For Facebook to make bulk, anonymized data available to independent researchers has long been a hope of mine, and I've argued how important an opportunity this is all the way up to Mark Zuckerberg himself.

My favorite example of how data like this can be important comes from history. When U.S. census data and bank home loan data were both made available for computer analysis and cross-referencing for the first time, independent researchers unearthed a pattern of discrimination against African American families seeking to buy homes in large sections of major U.S. cities. This practice was called real estate redlining, and it was exposed thanks to aggregate data analysis. I believe that social injustices of comparable significance, as well as opportunities for significant economic development, could be discovered in the patterns hidden across millions of Facebook status updates, friend connections, Likes and more.

It's great that Facebook is investing some of its resources into analyzing this data itself, but a great opportunity is lost if the company fails to allow outside researchers to analyze this data as well.

Oliver Chiang, at Forbes, agreed with my argument in an article this month: "But really, what Facebook should do... is open up its data for research. Because they don't, we get highly sanitized findings (like these top trends, or the finding that being active on Facebook leads to increased happiness), and even, reportedly, a black market for Facebook data. The company collects the thoughts, images and content of more than half a billion users - that data could be used for good."

Slate.com's Michael Agger wrote last month in an article discussing the opportunities latent in Facebook's data, "It would be helpful for transportation planners to know the places where people complain the most about traffic. Educators could see the data and sentiment analysis around how a community feels about its local schools."

Bernardo Huberman, a social technology researcher at HP Labs who was able to gain access to bulk Facebook data years ago, before the site was as large, controversial and armed with lawyers as it is today, is both understanding and hopeful.

"This data is amazingly important from a commercial point of view," Huberman told me in a telephone interview last week.

"But [Zuckerberg], he's not a researcher, he's just a businessman. I have a feeling that Twitter's situation is roughly the same; all this research stuff and so on is gravy. [In recent years] I've had very little traction in terms of getting access to their data. They are busy with other things, with keeping their business viable.

"They have a different view of it. Perhaps in a few years, Zuckerberg will relax and say 'I want to be the kind of public figure that wants to release data'... but right now I don't think that will motivate these people."

I hope that's not correct. I hope that every time the Facebook Data Team performs another batch of analysis on anonymized, bulk Facebook data and gives us an opportunity to look into our own souls, the potential that lies untapped in that data will be taken all the more seriously. That potential will never be realized if analysis of it is limited to the eyes, minds, interests, skills and perspectives of the company's own researchers.

Data_Services  from google
december 2010 by rahuldave