rybesh + data   61

EZID: EZID Home
EZID (easy-eye-dee) makes it easy to create & manage unique, long-term identifiers.
data  identifiers  citation  URI 
4 weeks ago by rybesh
Data, Journalism and the Problem of Narrativity « Data Miner UK
Information is costly to manipulate and retrieve. By finding the pattern, the logic of the series, you no longer need to memorize it all. You just store the pattern. And, as we can see here, a pattern is obviously more compact than raw information. We have a hunger for rules because we need to reduce the dimension of matters so they can get into our heads. A novel, a story, a myth, or a tale, all have the same function: they spare us from the complexity of the world. They help build in our mind an idea. And that’s what true narratives do. They don’t just paint pictures; they build structures in our mind upon which logic is built.
data  journalism  information  modeling  narrative 
7 weeks ago by rybesh
Semantic Conceptions of Information (Stanford Encyclopedia of Philosophy)
Information is notoriously a polymorphic phenomenon and a polysemantic concept so, as an explicandum, it can be associated with several explanations, depending on the level of abstraction (Floridi [2008]) adopted and the cluster of requirements and desiderata orientating a theory. The reader may wish to keep this in mind while reading this entry, where some schematic simplifications and interpretative decisions will be inevitable.
philosophy  information  data  theory  semantics  inls520 
8 weeks ago by rybesh
Why Google, and Simple, love TxVia | Felix Salmon
The amount of data that TxVia collects from every single one of its prepaid debit cards simply dwarfs the amount of data that banks collect with normal debit cards linked directly to a bank account.
finance  data  inls520 
8 weeks ago by rybesh
An organization ontology
This document describes a core ontology for organizational structures, aimed at supporting linked-data publishing of organizational information across a number of domains. It is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighbouring information such as organizational activities.
metadata  standard  data  description  inls520 
8 weeks ago by rybesh
The RDF Data Cube Vocabulary
There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts. The Data Cube vocabulary provides a means to do this using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations. The Data Cube vocabulary is a core foundation which supports extension vocabularies to enable publication of other aspects of statistical data flows.
metadata  standard  data  description  inls520  webinfo  statistics  science 
8 weeks ago by rybesh
Data Catalog Vocabulary (DCAT)
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.

By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to easily consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
metadata  standard  data  description  inls520  webinfo 
8 weeks ago by rybesh
Research Data Toolkit
The Library's Data Management Committee has compiled this toolkit to help researchers understand the issues involved in data management and provide resources for formulating data management plans.
research  data  management 
9 weeks ago by rybesh
10 MILLION INTERNATIONAL DYADIC EVENTS
When the Palestinians launch a mortar attack into Israel, the Israeli army does not wait until the end of the calendar year to react. Yet, most modern data collections are aggregated to the month or year. The data available here include almost 10 million individual events, each coded to the exact day they occur or become known. Each event is summarized in the data as "Actor A does something to Actor B", with Actors A and B recording about 450 countries and other (within-country) actors and "does something to" coded in an ontology of about 200 types of actions. The data are coded by computer from millions of Reuters news reports. The software system (produced by VRA) that performs this task has been independently evaluated by King and Lowe (2003). This article found that for the numbers of events it was possible to convince humans (trained Harvard undergraduates) to code by hand, the machine did as well as the humans. For much larger numbers of events for which no expert coder could keep up, the machine dominates.
events  politicalscience  data  machinelearning  textanalysis 
10 weeks ago by rybesh
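The "Actor A does something to Actor B" coding described above maps naturally onto a small record type. As a minimal sketch, assuming hypothetical field names (`day`, `source`, `action`, `target`) and a few invented toy events rather than the dataset's real schema, the example below shows why day-level coding matters: daily records can always be rolled up to months, but a monthly count can no longer show that B responded to A the very next day.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DyadicEvent:
    """One coded event: `source` does `action` to `target` on `day`.
    Field names are hypothetical, not the dataset's actual schema."""
    day: date
    source: str
    action: str
    target: str

def count_by_day(events):
    """Keep full daily resolution: (day, source, target) -> number of events."""
    return Counter((e.day, e.source, e.target) for e in events)

def count_by_month(events):
    """Aggregate to (year, month, source, target), discarding the exact day."""
    return Counter((e.day.year, e.day.month, e.source, e.target) for e in events)

# Toy events, invented purely for illustration.
events = [
    DyadicEvent(date(2000, 3, 1), "A", "attack", "B"),
    DyadicEvent(date(2000, 3, 2), "B", "retaliate", "A"),
    DyadicEvent(date(2000, 3, 29), "A", "attack", "B"),
]

print(count_by_month(events)[(2000, 3, "A", "B")])        # 2
print(count_by_day(events)[(date(2000, 3, 2), "B", "A")])  # 1
```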
alimanfoo/petl
petl is a tentative Python module for extracting, transforming and loading tables of data.
python  data  tools 
12 weeks ago by rybesh
Stephen Ramsay - Found: Data, Textuality, and the Digital Humanities
Computational processes generate lists: lists of numbers, lists of words, lists of coordinates, lists of properties. We transform these lists into more exalted forms -- visualizations, maps, information systems, software tools -- but the list remains the fundamental data structure of computing, from which most other structures are derived. Whenever we treat the world as data, we are nearly always creating lists. But what sort of *texts* are these, and can we consider them the same way that we consider other texts within the humanities? In this paper, I offer some meditations on the nature of lists, and suggest that it is the paucity of information they provide -- and the ways in which that paucity licenses narrative and explanation -- that allows us to imagine computational representations as texts that can play a fruitful role in the wider context of humanistic inquiry.
digitalhumanities  data  organization  narrative 
february 2012 by rybesh
JSON, HTTP and data links « Web of Data
Take JSON and HTTP (some use REST for marketing purposes) and add the capability of following (typed) links that lead you to more data (context, definitions, related stuff, whatever).

And here are the three current contenders in this space (in the order of stage appearance) – Microsoft’s OData JSON Format, The Object Network: Linking up our APIs, and – as I learned from Charl van Niekerk on the #whatwg IRC channel tonight – A Convention for HTTP Access to JSON Resources.
json  http  semweb  data 
february 2012 by rybesh
HTML Data Guide
Microformats, RDFa and microdata all enable consumers to extract data from HTML pages. This data may be embedded within enhanced search engine results, exposed to users through browser extensions, aggregated across websites or used by scripts running within those HTML pages.

This guide aims to help publishers and consumers of HTML data use it well. With several syntaxes and vocabularies to choose from, it provides guidance about how to decide which meets the publisher's or consumer's needs. It discusses when it is necessary to mix syntaxes and vocabularies and how to publish and consume data that uses multiple formats. It describes how to create vocabularies that can be used in multiple syntaxes and general best practices about the publication and consumption of HTML data.
html  linkeddata  web  data  standards  reference  webinfo 
january 2012 by rybesh
Mr. Data Converter
I will convert your Excel data into one of several web-friendly formats, including HTML, JSON and XML.
data  json  xml  tools 
january 2012 by rybesh
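Mr. Data Converter itself runs in the browser. As a rough illustration of the same idea, here is a minimal Python sketch that turns tab-separated text pasted from a spreadsheet into JSON, assuming the first row holds the column headers; the function name `tsv_to_json` and the sample data are hypothetical.

```python
import csv
import io
import json

def tsv_to_json(text):
    """Convert tab-separated spreadsheet text (header row first) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return json.dumps(list(reader))

pasted = "name\tcity\nAda\tLondon\nGrace\tNew York\n"
print(tsv_to_json(pasted))
# [{"name": "Ada", "city": "London"}, {"name": "Grace", "city": "New York"}]
```

Every value comes out as a string here; a fuller converter would also guess numeric types and offer the other output formats (HTML, XML) the tool mentions.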
Sean Gillies Blog / 1087 / Pleiades "un-GIS" poster
This is the graphic from the imagemap version of the poster Tom Elliott and I put together for the Digital Humanities conference at Stanford University. It's a Frankenstein's monster of a diagram, showing relationships on different planes between Pleiades resources and code and other resources and communities.
gis  history  data  representation  geospatial 
november 2011 by rybesh
Sean Gillies Blog / 1055 / What's an Un-GIS?
I feel it's important for users and watchers in the humanities, which is going gangbusters for GIS technology, to understand the differences between Pleiades and an ESRI geodatabase, an OGC-style feature/map service, or a conventional digital gazetteer. I don't think it's useful to try to precisely define "Un-GIS", but here are a few qualities that I think distinguish Pleiades from a typical geographic information system or spatial data infrastructure.
gis  representation  data  geospatial  history  digitalhumanities 
november 2011 by rybesh
Writing Without Words: Visualizing A Book | Brain Pickings
London-based artist Stefanie Posavec has a gift for words. Or for the lack thereof, to be exact. Her latest project, Writing Without Words, explores the literary world when its most important building blocks are removed by visually representing text.
books  data  visualization  infoviz  narrative  language 
november 2011 by rybesh
INLS 465: Understanding Information Technology for Managing Digital Collections
The fundamental motivation for this course is that anyone responsible for digital collections will have to understand and be conversant in various aspects of the associated information technologies, in order to evaluate the work of developers, delegate tasks, write appropriate requests for proposals (RFPs), and establish reasonable management and preservation policies.
sils  syllabus  standards  IT  data  digitization  forensics 
november 2011 by rybesh
CommonCrawl
Common Crawl Foundation is a California 501(c)(3) non-profit founded by Gil Elbaz with the goal of democratizing access to web information by producing and maintaining an open repository of web crawl data that is universally accessible.
opensource  web  data 
november 2011 by rybesh
Tangle: a JavaScript library for reactive documents
Tangle is a JavaScript library for creating reactive documents. Your readers can interactively explore possibilities, play with parameters, and see the document update immediately. Tangle is super-simple and easy to learn.
interactive  data  infoviz  javascript 
november 2011 by rybesh
Henri Bergius: Weblog: Business analytics with CouchDB and NoFlo
Any business analytics system dealing with moderate amounts of data can be built following this approach:

- Apache CouchDB is the central data store
- All data is stored as JSON-LD entities
- NoFlo handles all data imports
- Analytics based on the data are done with CouchDB map/reduce
- Visualization happens with a CouchApp using JavaScript InfoVis Toolkit
couchdb  nodejs  flowbased  programming  data  analysis  infoviz 
october 2011 by rybesh
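CouchDB views pair a map function (written in JavaScript inside CouchDB) with a reduce function. The Python sketch below only imitates that model in memory to show the shape of the computation; it does not use CouchDB's API, and the document fields (`type`, `customer`, `amount`) are hypothetical.

```python
from collections import defaultdict

def map_fn(doc):
    """Emit (key, value) pairs, like a CouchDB map function calling emit()."""
    if doc.get("type") == "sale":
        yield doc["customer"], doc["amount"]

def reduce_fn(values):
    """Combine all values emitted under one key: a plain sum, similar in spirit to CouchDB's built-in _sum."""
    return sum(values)

def run_view(docs):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Sorted keys, echoing how CouchDB returns view rows ordered by key.
    return {key: reduce_fn(vals) for key, vals in sorted(grouped.items())}

docs = [
    {"type": "sale", "customer": "acme", "amount": 120},
    {"type": "sale", "customer": "globex", "amount": 80},
    {"type": "sale", "customer": "acme", "amount": 30},
    {"type": "visit", "customer": "acme"},
]
print(run_view(docs))  # {'acme': 150, 'globex': 80}
```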
What Goes Around Comes Around - Stonebraker & Hellerstein (35 years of data model proposals)
This paper provides a summary of 35 years of data model proposals, grouped into 9 different eras. We discuss the proposals of each era, and show that there are only a few basic data modeling ideas, and most have been around a long time. Later proposals inevitably bear a strong resemblance to certain earlier proposals. Hence, it is a worthwhile exercise to study previous proposals.
data  modeling  models  history 
september 2011 by rybesh
Conducting a Data Interview
In this poster, we share a set of ten questions that a librarian can use as a starting point for such a “data interview”. It is not a comprehensive strategy but instead a practical tool to draw out information that needs to be considered in order to evaluate the suitability of a dataset for the collection and the requirements for the infrastructure and services that will be needed for data curation.
preservation  data  management  curation 
september 2011 by rybesh
Martha Palmer | Projects | Verb Net
VerbNet (VN) (Kipper-Schuler 2006) is the largest on-line verb lexicon currently available for English. It is a hierarchical domain-independent, broad-coverage verb lexicon with mappings to other lexical resources such as WordNet (Miller, 1990; Fellbaum, 1998), Xtag (XTAG Research Group, 2001), and FrameNet (Baker et al., 1998). VerbNet is organized into verb classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class. Each verb class in VN is completely described by thematic roles, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates with a temporal function, in a manner similar to the event decomposition of Moens and Steedman (1988).
corpus  linguistics  nlp  language  data  frame  semantics 
august 2011 by rybesh
LDC Catalog
Proposition Bank I, Linguistic Data Consortium (LDC) catalog number LDC2004T14, ISBN 1-58563-304-6.

This is a semantic annotation of the Wall Street Journal section of Treebank-2. More specifically, each verb occurring in the Treebank has been treated as a semantic predicate and the surrounding text has been annotated for arguments and adjuncts of the predicate. The verbs have also been tagged with coarse grained senses and with inflectional information. This work was done in the Computer and Information Sciences Department at the University of Pennsylvania.
frame  semantics  nlp  language  data 
august 2011 by rybesh
Home page for the book, "Bayesian Data Analysis"
This book is intended to have three roles and to serve three associated audiences: an introductory text on Bayesian inference starting from first principles, a graduate text on effective current approaches to Bayesian modeling and computation in statistics and related fields, and a handbook of Bayesian methods in applied statistics for general users of and researchers in applied statistics.
bayes  statistics  data  analysis 
june 2011 by rybesh
Specification - linked-data-api - Linked Data API Specification - API and formats to simplify use of linked data by web-developers - Google Project Hosting
This document defines a vocabulary and processing model for a configurable API layer intended to support the creation of simple RESTful APIs over RDF triple stores.
linkeddata  api  rdf  data  webservices 
june 2011 by rybesh
Kasabi | Kasabi
Kasabi brings together data providers (organisations, businesses, individuals) with developers and domain experts. This community lets data providers explore business models and add value to their datasets, while allowing developers access to build their applications and services around them.
rdf  opendata  data  market  semweb  linkeddata 
june 2011 by rybesh
Google Books: American English (155 billion words)
This interface allows you to search the Google Books data in many ways that are much more advanced than what is possible with the simple Google Books interface. You can search by word, phrase, substring, lemma, part of speech, synonyms, and collocates (nearby words). You can copy the data to other applications for further analysis, which you can't do with the regular Google Books interface. And you can quickly and easily compare the data in two different sections of the corpus (for example, adjectives describing women or art or music in the 1960s-2000s vs the 1870s-1910s).
american  books  corpus  data  statistics  language 
may 2011 by rybesh
DataTables (table plug-in for jQuery)
DataTables: sortable, filterable JavaScript tables. Thanks for the tip, saved me like 10 hrs!
jquery  tables  data  plugin  from twitter_favs
april 2011 by rybesh
Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization
This listing is but an initial step in portraying the history of the visualization of data. We started with the developments listed by Beniger and Robyn (BenigerRobyn:1978) and incorporated additional listings from Hankins (Hankins:1999), Tufte (Tufte:1983, Tufte:1990, Tufte:1997), Heiser (Heiser:2000), and others (now too numerous to cite individually). In most cases, we cite original sources (where known) for the record; occasional secondary sources are included as well, where they appear to contribute to telling the story.

To convey a real sense of the accomplishments requires much more context: words, images, and, most usefully, interpretation. In this chronological listing, it has proved convenient to make divisions by epochs, and we provide some more detailed commentaries for each of these. The careful reader will be able to discern other themes, relations, and connections, not stated explicitly.
data  design  graphics  history  visualization  infoviz 
april 2011 by rybesh
datajs - JavaScript Library for data-centric web applications
datajs is a new cross-browser JavaScript library that enables data-centric web applications by leveraging modern protocols such as JSON and OData and HTML5-enabled browser features. It's designed to be small, fast and easy to use.
html5  javascript  json  storage  odata  api  data 
april 2011 by rybesh
18 ways to think about data quality
Data "beauty" might be subjective, and the same data may have different applicability to different tasks, but there are a lot of obvious and straightforward ways of thinking about the quality of a dataset independent of the particular preferences of individual beholders. Here are just some of them.
data  quality  linkeddata 
april 2011 by rybesh
Jonathan Stray » A computational journalism reading list
I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of “reporting” — because information not published is information not known — but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.
data  journalism  visualization  digitalhumanities 
april 2011 by rybesh
Dense JSON
We can define a "dense JSON" encoding that uses structural and textual templates. The document is still a JSON document and can be parsed using a regular JSON parser in any environment. Fancy parsers would go directly from the dense format into their final output, while simpler parsers can apply a simple JSON -> JSON transform that would return the kind of JSON you would expect for a regular scenario, with plain objects with repeating property names and all that. This approach probably comes with less optimal results in size but great interoperability while having reasonable efficiency.
json  data  ideas 
march 2011 by rybesh
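The post's templates can be textual as well as structural; the sketch below covers only the structural case, using a hypothetical layout (`fields` listed once, `rows` as plain arrays) that may differ from the post's actual format. It illustrates the point that the dense document is still ordinary JSON and that a simple JSON -> JSON transform restores the familiar objects with repeated property names.

```python
import json

def expand(dense):
    """Turn a hypothetical dense document {"fields": [...], "rows": [[...], ...]}
    back into the usual list of objects with repeated property names."""
    fields = dense["fields"]
    return [dict(zip(fields, row)) for row in dense["rows"]]

dense_text = '{"fields": ["id", "title"], "rows": [[1, "Dense JSON"], [2, "Rison"]]}'
dense = json.loads(dense_text)  # still ordinary JSON, so any regular parser can read it
print(expand(dense))
# [{'id': 1, 'title': 'Dense JSON'}, {'id': 2, 'title': 'Rison'}]
```

Property names are written once instead of once per object, so the savings grow with the number of rows; a "fancy" parser could skip `expand` and build its final output directly from `rows`.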
Data Science Toolkit
A collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces. Available as a self-contained VM or EC2 AMI that you can deploy yourself.
data  tools  nlp  ec2  webservices 
march 2011 by rybesh
edu.stanford.nlp.ling (Stanford JavaNLP API)
This package contains the different data structures used by JavaNLP throughout the years for dealing with linguistic objects in general, of which words are the most generally used.
nlp  data  structures  models 
february 2011 by rybesh
DSPL: Dataset Publishing Language - Google Code
DSPL is the Dataset Publishing Language, a representation language for the data and metadata of datasets. Datasets described in this format can be processed by Google and visualized in the Google Public Data Explorer.
data  metadata  google  standards 
february 2011 by rybesh
Beyond the PDF
The goal of the workshop was not to produce a white paper! Rather it was to identify a set of requirements, and a group of willing participants to develop a mandate, open source code and a set of deliverables to be used by scholars to accelerate data and knowledge sharing and discovery. Our starting point, and the only prerequisite to participating, was the belief that we need to move Beyond the PDF (meant to capture a common philosophy, not necessarily to be taken literally).

In a heady moment we might also describe our efforts as the desire to contribute to the development of a free and open digital printing press for the 21st century: a platform that, when utilized, moves us beyond static and disparate data and knowledge representation to rich, integrated content which grows and changes the more we learn. A system (content plus platform) with which a scholar can interact and which, once evaluated, shows improved understanding and interest.
publishing  data  scholarship  tools  KR 
january 2011 by rybesh
DataWiki - Google Labs
A DataWiki extends the idea of a normal wiki to:

- make it easy to create, edit, share and visualize structured data, and
- interlink the data formats to enhance the understanding and usefulness of each.
data  collaboration  wiki 
december 2010 by rybesh
Chris Heathcote: anti-mega: griotism
Whilst we have the luxury of open APIs to services, it’s rarely rich enough data for interesting stories to be told. APIs tend to be locked in the present – as the present is what a lot of services are fixated on. Use, not stories. Some element of time is normally needed to pull out data that tells interesting stories, often long periods of time.
data  narrative  datamining  history  time 
july 2010 by rybesh
id.loc.gov load/update history
The idea with the Atom feed is to allow you to keep your own local version of the records synchronized with id.loc.gov. You can follow the “next” URLs in the atom:link elements to drill backwards until you’ve seen a change at a record update time you already knew about.
authority  data  feeds 
may 2010 by rybesh
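The synchronization loop described above can be sketched as follows. This is a minimal illustration, not id.loc.gov client code: the pages are hypothetical, pre-parsed dictionaries standing in for fetched Atom documents, and it assumes entries are ordered newest first, so the walk can stop at the first update time already known locally.

```python
from datetime import datetime

# Hypothetical, pre-parsed feed pages keyed by URL. A real client would fetch
# each URL over HTTP and parse the Atom XML (entries plus the "next" atom:link).
PAGES = {
    "feed/1": {"entries": [("rec-c", "2010-05-03T10:00:00"), ("rec-b", "2010-05-02T09:00:00")],
               "next": "feed/2"},
    "feed/2": {"entries": [("rec-a", "2010-05-01T08:00:00"), ("rec-z", "2010-04-30T07:00:00")],
               "next": "feed/3"},
    "feed/3": {"entries": [("rec-y", "2010-04-29T06:00:00")], "next": None},
}

def changes_since(start_url, last_seen, fetch=PAGES.get):
    """Follow "next" links backwards from the newest page, collecting record IDs
    updated after `last_seen`; stop at the first update time already known."""
    changed = []
    url = start_url
    while url:
        page = fetch(url)
        for record_id, updated in page["entries"]:
            if datetime.fromisoformat(updated) <= last_seen:
                return changed
            changed.append(record_id)
        url = page["next"]
    return changed

print(changes_since("feed/1", datetime(2010, 5, 1, 8, 0)))  # ['rec-c', 'rec-b']
```

The returned IDs are the records to re-fetch locally; passing `fetch` as a parameter keeps the paging logic separate from however the pages are actually retrieved.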
Graph API - Facebook Developers
The new Graph API attempts to drastically simplify the way developers read and write data to Facebook. It presents a simple, consistent view of the Facebook social graph, uniformly representing objects in the graph (e.g., people, photos, events, and fan pages) and the connections between them (e.g., friend relationships, shared content, and photo tags).
social  data  api  rest  webservices  facebook  metadata 
april 2010 by rybesh
Open Data Protocol (OData)
The Open Data Protocol (OData) is a Web protocol for querying and updating data that provides a way to unlock your data and free it from silos that exist in applications today. OData does this by applying and building upon Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores.
atom  data  webservices  standards  linkeddata 
april 2010 by rybesh
IBM Emerging Technologies - BigSheets
BigSheets is an extension of the mashup paradigm that:
1. Integrates gigabytes, terabytes, or petabytes of unstructured data from web-based repositories
2. Collects a wide range of unstructured web data stemming from user-defined seed URLs
3. Extracts and Enriches that data using the unstructured information management architecture you choose (LanguageWare, OpenCalais, etc.)
4. Lets you Explore and Visualize this data in specific, user-defined contexts (such as ManyEyes)
data  analytics  hadoop  spreadsheet  archives  nlp  infoviz 
march 2010 by rybesh
GeoCommons Finder!
Upload, organize and share your Geographic Data.
locative  data  maps 
february 2010 by rybesh
[Dbpedia-discussion] Inconsistency Feedback from DBpedia to Wikipedia
"As Bruno Bachimont likes to say, an ontology is mainly a tool to make explicit the inconsistencies of our knowledge, pointing to new questions for research. After that, you can throw it away."
data  ontology  logic  semantics  semweb  quote  modeling  research  philosophy 
august 2009 by rybesh
The GeoJSON Format Specification
GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON).
json  javascript  gis  metadata  locative  data 
july 2009 by rybesh
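Because GeoJSON is plain JSON, a feature can be built with nothing more than Python's `json` module. In this minimal sketch the helper name `point_feature`, the sample coordinates, and the `name` property are illustrative; the one detail worth remembering is that GeoJSON positions list longitude first, then latitude.

```python
import json

def point_feature(lon, lat, **properties):
    """Build a GeoJSON Feature with a Point geometry.
    GeoJSON positions are [longitude, latitude], not [lat, lon]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

collection = {
    "type": "FeatureCollection",
    "features": [point_feature(102.0, 0.5, name="example point")],
}
print(json.dumps(collection, indent=2))
```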
Amazon Web Services Developer Community : Wikipedia Page Traffic Statistics
This dataset contains a 320 GB sample of the data used to power trendingtopics.org. It includes 7 months of hourly page traffic statistics for over 2.5 million Wikipedia articles (~1 TB uncompressed) along with the associated Wikipedia content, link graph, and metadata.
wikipedia  data 
june 2009 by rybesh
The Problem with Answers
The great disadvantage of testing and data is that you get precise, decisive answers you can and will act on, but you almost never know what question you really asked.
data  research  methods  interpretation  science  engineering  design  testing 
march 2009 by rybesh
Art Against Information: Case Studies in Data Practice
Data art involves a creative grappling with the nature of our now ubiquitous data systems. It draws data out, makes it explicit, literally provides it with an image. It also probes data's constitution, potential, and significance. In the process of working pragmatically with data — using it as a generative resource, a way of making — data art is involved in the culturally crucial figuration of data and its contemporary domain. This practice is a concrete exploration of what data is, does, and can do, but it also involves a set of assumptions, narratives and ontologies that construct data as an entity in the cultural imagination. That construction is at the core of this analysis.
data  art  criticism  infoviz 
november 2008 by rybesh
Visualizing Corporate America
Using the SEC Data that was mentioned previously, I created a graphical network of which companies share board members and CEOs, a sort of social network of companies.

There’s a lot of other data represented there, including genders, market caps, revenue and locations.
Applications  Data  corporations  freebase  Metaweb  sec  toby_segaran  from google
april 2008 by rybesh
RubyForge: Conveyor: Project Info
Conveyor can be used like an application-agnostic version of MySQL binlogs, which can be replayed to write data into multiple, diverse data stores.
web  data  architecture  code  ruby  database  search 
february 2008 by rybesh
OpenTextMining
Open Text Mining Interface (OTMI) is an initiative from Nature Publishing Group (NPG). It aims to enable scholarly publishers, among others, to disclose their full text for indexing and text-mining purposes but without giving it away in a form that is rea
academia  publishing  copyright  data  nlp  standards  datamining 
november 2007 by rybesh
Freebase Exhibit Example
Example of using Exhibit to dynamically display Freebase data.
data  infoviz  javascript  code  howto  semweb 
november 2007 by rybesh
rison - json for uris
Rison is a slight variation of JSON that looks vastly superior after URI encoding.
data  javascript  web  architecture  design 
november 2007 by rybesh
ELOKA
ELOKA will provide a data management and networking service for community-based research that keeps control of data in the hands of community data providers, while still allowing for broad searches and sharing of information.
arctic  community  research  data  management  delivery  collaboration  tools 
august 2007 by rybesh
AON-CADIS: Arctic Observing Network Cooperative Arctic Data and Information Service
CADIS supports the Arctic Observing Network (AON). It will be a portal for data discovery, and provide near-real-time data delivery, a repository for data storage, and tools to manipulate data.
arctic  research  data  storage  management  delivery  tools 
august 2007 by rybesh