Data Feed Query Explorer - Google Analytics - Google Code
february 2012 by soypunk
"With this tool you can play with the Data Export API by building queries to get data from your Google Analytics profiles. You can use these queries in any of the client libraries to build your own tools."
google
analytics
data
api
for-oris
datascience/google-analytics-export-to-csv - GitHub
february 2012 by soypunk
"Google-Analytics-export-to-CSV is a simple, command-line tool for exporting data out of Google Analytics and writing it to a CSV file." It uses the Google Data API, so you can extract more columns and rows with it than with GA's built-in web export tools.
google
analytics
data
java
software
csv
for-oris
URLTE.AM
june 2011 by soypunk
"Welcome to the URLTeam website. The URLTeam is the ArchiveTeam subcommittee on URL shorteners. We believe that they pose a serious threat to the internet's integrity. If one of them dies, gets hacked or sells out, millions of links will stop working. Thus we preemptively release backups, because URL shorteners are too busy to make backups themselves."
url
data
web
architecture
d3.js
march 2011 by soypunk
"D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. As a trivial example, you can use D3 to generate a basic HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions and interaction."
javascript
data
visualization
canvas
html
html5
css
svg
for-oris
google-refine - Project Hosting on Google Code
november 2010 by soypunk
"Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase."
google
data
software
DICT Development Group | Download DICT Development Group software for free at SourceForge.net
october 2010 by soypunk
"Client/server software, human language dictionary databases, and tools supporting the DICT protocol (RFC 2229)."
dictionary
data
service
cli
Dan Suciu's Project MystiQ
august 2010 by soypunk
"MystiQ is a system that uses a probabilistic data model to find answers in large numbers of data sources exhibiting various kinds of imprecisions."
data
Analyzing World Cup Data with YQL (Yahoo! Developer Network Blog)
july 2010 by soypunk
"I have a confession to make - I am an open data junkie. When the Guardian newspaper in the UK released a comprehensive set of FIFA Worldcup 2010 statistics, I got itchy fingers and wanted to find the story in that set of data. What better to use for this than YQL (Yahoo! Query Language)?"
yql
yahoo
soccer
workshop
guardian
web
data
Munmun's Homepage
may 2010 by soypunk
"I have released the following datasets for non-commercial research purposes, under a Creative Commons license (refer below). The data for most of the social sites were collected through crawlers that were based on the respective APIs, or via HTML parsers (written by me)."
data
from delicious
The MessagePack Project
april 2010 by soypunk
"MessagePack is a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small."
data
services
from delicious
CiteULike: Everyone's library
april 2010 by soypunk
"citeulike is a free service for managing and discovering scholarly references"
library
research
web
del.icio.us
social
data
collection
from delicious
Superfeedr : Real-time feed parsing in the cloud for web-developers
march 2010 by soypunk
"Real-time feed parsing in the cloud. Why build a complex feed fetching and parsing infrastructure when you could just use Superfeedr?"
feeds
atom
web
data
collection
from delicious
Is Facebook Markup Language (FBML) HTML, XML or some homemade demon spawn of the two? - O'Reilly Broadcast
march 2009 by soypunk
I agree that, as of right now, FBML is a scary, under-specified bag of hurt. I wish I could find myself surprised that the Facebook developer base is happily using it, but... I can't.
facebook
html
xml
data
formats
markup
standards
MindNode
february 2009 by soypunk
"MindNode Pro and MindNode are elegant and simple-to-use mindmapping applications for the Macintosh that help to visually collect, classify and structure ideas & organize, study and solve problems."
mindmapping
data
visualization
software
osx
jacobian's jellyroll at master - GitHub
february 2009 by soypunk
"You keep personal data in all sorts of places on the internets. Jellyroll brings them together onto your own site"
django
python
web
data
aggregator
Crowbar - SIMILE
january 2009 by soypunk
"Crowbar is a web scraping environment based on the use of a server-side headless mozilla-based browser. Its purpose is to allow running javascript scrapers against a DOM to automate web sites scraping but avoiding all the syntax normalization issues."
mozilla
gecko
html
dom
parsing
screenscraping
data
Google Open Source Blog: Google Blog Converters 1.0 Released
january 2009 by soypunk
That this project exists demonstrates just how bad data exports are in the blog software world.
google
blog
data
Scrapy – Trac
december 2008 by soypunk
"An open source web crawling and screen scraping framework written in Python." Pure gold.
python
web
data
crawler
Opera: State of the Mobile Web report
december 2008 by soypunk
"Every month Opera conducts a definitive analysis of the key trends affecting the mobile Web worldwide. We publish these findings as the State of the Mobile Web. Each report provides the most frequently visited sites, key data metrics from Opera Mini and a snapshot of a specific trend chosen by the analysis team that month."
mobile
web
research
data
Announcing the TimesTags API - Open - Code - New York Times Blog
october 2008 by soypunk
"The API returns official NYTimes.com tags that match your search string. Even better, it ranks the results from most commonly used tags to least."
nytimes
folksonomy
data
api
web
service
Page Inlink Analyzer — Powered by Yahoo! Search, Delicious and YUI
october 2008 by soypunk
Effective use of YUI, JSON, XHR, etc.
yui
json
yahoo
delicious
web
data
MAMA: Key findings - Opera Developer Community
october 2008 by soypunk
A study of HTML syntax usage via sampling the web. Think of it as an update to the Google web authoring report from a couple of years ago.
web
html
data
research
opera
Experience versus talent shapes the structure of the Web — PNAS
september 2008 by soypunk
"We use sequential large-scale crawl data to empirically investigate and validate the dynamics that underlie the evolution of the structure of the web. We find that the overall structure of the web is defined by an intricate interplay between experience or entitlement of the pages (as measured by the number of inbound hyperlinks a page already has), inherent talent or fitness of the pages (as measured by the likelihood that someone visiting the page would give a hyperlink to it), and the continual high rates of birth and death of pages on the web. We find that the web is conservative in judging talent and the overall fitness distribution is exponential, showing low variability."
research
web
data
Random Sampling from a Search Engine's Index (PDF)
september 2008 by soypunk
"We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from a search engine’s index using only the search engine’s public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines."
web
search
data
research
paper
FriendFeed Blog: Simple Update Protocol: Fetch updates from feeds faster
august 2008 by soypunk
On the fence about this right now... but feed consumers have to do something. I was wrestling with similar issues when spec'ing out a system to collect feeds (yes, millions per hour) for an internal linguistics project... in the current environment HTTP If-Modified-Since just isn't enough.
syndication
atom
rss
web
data
ELIE - An Adaptive Information Extraction System | aidanf.net
july 2008 by soypunk
"ELIE is a tool for adaptive information extraction from text. It also provides a number of other text processing tools e.g. POS tagging, chunking, gazetteer, stemming."
ir
data
Open Archives Initiative Protocol - Object Exchange and Reuse
july 2008 by soypunk
Standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video.
web
data
standard
nsf
research