soypunk + data   95

Data Feed Query Explorer - Google Analytics - Google Code
"With this tool you can play with the Data Export API by building queries to get data from your Google Analytics profiles. You can use these queries in any of the client libraries to build your own tools."
google  analytics  data  api  for-oris 
february 2012 by soypunk
datascience/google-analytics-export-to-csv - GitHub
"Google-Analytics-export-to-CSV is a simple, command-line tool for exporting data out of Google Analytics and writing it to a CSV file." It uses the Google Data API, so you can actually extract more columns and rows using this than GA's built-in web export tools.
google  analytics  data  java  software  csv  for-oris 
february 2012 by soypunk
URLTE.AM
"Welcome to the URLTeam website. The URLTeam is the ArchiveTeam subcommittee on URL shorteners. We believe that they pose a serious threat to the internet's integrity. If one of them dies, gets hacked or sells out, millions of links will stop working. Thus we preemptively release backups, because URL shorteners are too busy to make backups themselves."
url  data  web  architecture 
june 2011 by soypunk
Data Wrangler
Yet another "real time" data clean-up tool.
data  cleanup  web  software 
may 2011 by soypunk
d3.js
"D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. As a trivial example, you can use D3 to generate a basic HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions and interaction."
javascript  data  visualization  canvas  html  html5  css  svg  for-oris 
march 2011 by soypunk
google-refine - Project Hosting on Google Code
"Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase."
google  data  software 
november 2010 by soypunk
DICT Development Group | Download DICT Development Group software for free at SourceForge.net
"Client/server software, human language dictionary databases, and tools supporting the DICT protocol (RFC 2229)."
dictionary  data  service  cli 
october 2010 by soypunk
Dan Suciu's Project MystiQ
"MystiQ is a system that uses a probabilistic data model to find answers in large numbers of data sources exhibiting various kinds of imprecisions."
data 
august 2010 by soypunk
dase - Project Hosting on Google Code
"DASe (Digital Archive Services) is a lightweight digital asset repository."
php  data  storage  atom 
july 2010 by soypunk
Analyzing World Cup Data with YQL (Yahoo! Developer Network Blog)
"I have a confession to make - I am an open data junkie. When the Guardian newspaper in the UK released a comprehensive set of FIFA Worldcup 2010 statistics, I got itchy fingers and wanted to find the story in that set of data. What better to use for this than YQL (Yahoo! Query Language)?"
yql  yahoo  soccer  workshop  guardian  web  data 
july 2010 by soypunk
Munmun's Homepage
"I have released the following datasets for non-commercial research purposes, under a Creative Commons license (refer below). The data for most of the social sites were collected through crawlers that were based on the respective APIs, or via HTML parsers (written by me)."
data  from delicious
may 2010 by soypunk
The MessagePack Project
"MessagePack is a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small."
data  services  from delicious
april 2010 by soypunk
CiteULike: Everyone's library
"citeulike is a free service for managing and discovering scholarly references"
library  research  web  del.icio.us  social  data  collection  from delicious
april 2010 by soypunk
Superfeedr : Real-time feed parsing in the cloud for web-developers
"Real-time feed parsing in the cloud. Why build a complex feed fetching and parsing infrastructure when you could just use Superfeedr?"
feeds  atom  web  data  collection  from delicious
march 2010 by soypunk
ScenicOrNot
Clever way to get data about a location via a photo.
social  data  games 
may 2009 by soypunk
Is Facebook Markup Language (FBML) HTML, XML or some homemade demon spawn of the two? - O'Reilly Broadcast
I agree that, as of right now, FBML is a scary, under-specified bag of hurt. I wish I could find myself surprised that the Facebook developer base is happily using it, but... I can't.
facebook  html  xml  data  formats  markup  standards 
march 2009 by soypunk
MindNode
"MindNode Pro and MindNode are elegant and simple-to-use mindmapping applications for the Macintosh that help to visually collect, classify and structure ideas & organize, study and solve problems."
mindmapping  data  visualization  software  osx 
february 2009 by soypunk
jacobian's jellyroll at master - GitHub
"You keep personal data in all sorts of places on the internets. Jellyroll brings them together onto your own site"
django  python  web  data  aggregator 
february 2009 by soypunk
Crowbar - SIMILE
"Crowbar is a web scraping environment based on the use of a server-side headless mozilla-based browser. Its purpose is to allow running javascript scrapers against a DOM to automate web sites scraping but avoiding all the syntax normalization issues."
mozilla  gecko  html  dom  parsing  screenscraping  data 
january 2009 by soypunk
Google Open Source Blog: Google Blog Converters 1.0 Released
That this project exists demonstrates just how bad data exports are in the blog software world.
google  blog  data 
january 2009 by soypunk
Scrapy – Trac
"An open source web crawling and screen scraping framework written in Python." Pure gold.
python  web  data  crawler 
december 2008 by soypunk
Opera: State of the Mobile Web report
"Every month Opera conducts a definitive analysis of the key trends affecting the mobile Web worldwide. We publish these findings as the State of the Mobile Web. Each report provides the most frequently visited sites, key data metrics from Opera Mini and a snapshot of a specific trend chosen by the analysis team that month."
mobile  web  research  data 
december 2008 by soypunk
Announcing the TimesTags API - Open - Code - New York Times Blog
"The API returns official NYTimes.com tags that match your search string. Even better, it ranks the results from most commonly used tags to least."
nytimes  folksonomy  data  api  web  service 
october 2008 by soypunk
MAMA: Key findings - Opera Developer Community
A study of HTML syntax usage via sampling the web. Think of it as an update to the Google web authoring report from a couple of years ago.
web  html  data  research  opera 
october 2008 by soypunk
Experience versus talent shapes the structure of the Web — PNAS
"We use sequential large-scale crawl data to empirically investigate and validate the dynamics that underlie the evolution of the structure of the web. We find that the overall structure of the web is defined by an intricate interplay between experience or entitlement of the pages (as measured by the number of inbound hyperlinks a page already has), inherent talent or fitness of the pages (as measured by the likelihood that someone visiting the page would give a hyperlink to it), and the continual high rates of birth and death of pages on the web. We find that the web is conservative in judging talent and the overall fitness distribution is exponential, showing low variability."
research  web  data 
september 2008 by soypunk
Random Sampling from a Search Engine's Index (PDF)
"We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from a search engine’s index using only the search engine’s public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines."
web  search  data  research  paper 
september 2008 by soypunk
FriendFeed Blog: Simple Update Protocol: Fetch updates from feeds faster
On the fence about this right now... but feed consumers have to do something. I was wrestling with similar issues when spec'ing out a system to collect feeds (yes, millions per hour) for an internal linguistics project... in the current environment HTTP If-Modified-Since just isn't enough.
syndication  atom  rss  web  data 
august 2008 by soypunk
ELIE - An Adaptive Information Extraction System | aidanf.net
"ELIE is a tool for adaptive information extraction from text. It also provides a number of other text processing tools e.g. POS tagging, chunking, gazetteer, stemming."
ir  data 
july 2008 by soypunk
Open Archives Initiative Protocol - Object Exchange and Reuse
Standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video.
web  data  standard  nsf  research 
july 2008 by soypunk
AtomServer
"AtomServer is a generic data store implemented as a REST-ful web service." Hooray!
atom  rest  web  service  data 
july 2008 by soypunk