Automatic text analytics using DBpedia and PoolParty – A Live Demo | The Semantic Puzzle
Let me show you which steps have to be taken to generate a high-quality text mining application, ready to be used to annotate and to categorize any kind of text or documents covering nearly any domain. With our approach of thesaurus based text mining your documents can also be linked to the world of linked (open) data; enrich your documents with data from the LOD cloud!
webinfo  inls520  semweb  textanalysis  classification  skos  tools 
8 hours ago
N-grams: corpus based (COCA, COHA, Spanish, Portuguese)
These n-grams are based on the largest publicly-available, genre-balanced corpus of English -- the 425 million word Corpus of Contemporary American English (COCA). With this n-grams data (2, 3, 4, 5-word sequences, with their frequency), you can carry out powerful queries offline -- without needing to access the corpus via the web interface.
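As a rough illustration of what "word sequences with their frequency" means, here is a minimal Python sketch that counts n-grams in a short string. The function name `ngram_counts` and the sample sentence are made up for this note, and the tokenization (lowercasing and splitting on whitespace) is an assumption; it is not how the COCA data was produced.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    words = text.lower().split()
    # Slide a window of n words across the token list.
    grams = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)

# Hypothetical sample text, only for demonstration.
sample = "the cat sat on the mat and the cat slept"
bigrams = ngram_counts(sample, 2)
print(bigrams.most_common(1))
```

With a real n-gram file, the counts would already be precomputed, so offline querying reduces to looking up rows rather than recounting.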
english  corpus  linguistics  nlp  ngrams 
9 hours ago
cheese it, the cops!: absurd
"the last thing you want to do is prove professors of communication right."
from twitter
10 hours ago
The Code4Lib Journal – HTML5 Microdata and Schema.org
This article is an introduction to Microdata and Schema.org. The first section describes what HTML5, Microdata and Schema.org are, and the problems they have been designed to solve. With this foundation in place section 2 provides a practical tutorial of how to use Microdata and Schema.org using a real life example from the cultural heritage sector. Along the way some tools for implementers will also be introduced. Issues with applying these technologies to cultural heritage materials will crop up along with opportunities to improve the situation.
webinfo  microdata 
11 hours ago
Conditional Random Fields
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.
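For reference, the contrast between a conditional and a joint distribution can be written out for the common linear-chain case. The notation below (feature functions f_k, weights λ_k, sequence length T) is the usual textbook convention and an assumption of this note, not something taken from the linked page.

```latex
% Linear-chain CRF: models the labels y given the observations x directly.
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)

% HMM, for comparison: a joint distribution over labels and observations.
p(\mathbf{x}, \mathbf{y})
  = \prod_{t=1}^{T} p(y_t \mid y_{t-1}) \, p(x_t \mid y_t)
```

Because each feature function may inspect the whole observation sequence, the CRF does not need the HMM's assumption that each observation depends only on its own label.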
machinelearning  nlp  crf  textmining  metadata 
13 hours ago
Olivier Labs | Jason
Jason is a JSON viewer & editor for Mac OS X. It can open local documents as well as download JSON data via HTTP and, in case of invalid data, an error message is presented and the line containing the error is highlighted.
json  tools 
yesterday
PhantomJS: Headless WebKit with JavaScript API
PhantomJS is a headless WebKit with JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

PhantomJS is an optimal solution for fast headless testing, site scraping, page capture, SVG rendering, network monitoring and many other use cases.
javascript  scraping  testing 
yesterday
Graphs Beyond the Hairball | eagereyes
Networks are usually drawn using a technique called node-link diagrams. While that works well for small graphs (the technical name for networks), it breaks down beyond a few dozen nodes. Better techniques exist, though these are currently focused on specific types of graphs or answer particular questions.
infoviz  networks 
yesterday
N-Quads: Extending N-Triples with Context
This document describes N-Quads, a format that extends N-Triples with context. Each triple in an N-Quads document can have an optional context value.
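To make the "optional context value" concrete, here is a small Python sketch that splits one statement into subject, predicate, object and context. It is a simplified illustration, not a conforming N-Quads parser: the regular expression handles only IRIs, blank nodes and plain double-quoted literals, and the `example.org` IRIs and the name `parse_nquad` are invented for this note.

```python
import re

# Simplified term pattern: an IRI in angle brackets, a blank node, or a
# double-quoted literal with an optional language tag or datatype IRI.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)

def parse_nquad(line):
    """Split one statement into (subject, predicate, object, context)."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError("statement must end with '.'")
    terms = TERM.findall(body[:-1])
    if len(terms) == 3:
        # A plain triple: no context value was given.
        return (*terms, None)
    if len(terms) == 4:
        return tuple(terms)
    raise ValueError("expected 3 or 4 terms, got %d" % len(terms))

# Hypothetical statements: the first carries a context, the second does not.
quad = parse_nquad(
    '<http://example.org/s> <http://example.org/p> "hello"@en <http://example.org/graph1> .'
)
triple = parse_nquad('<http://example.org/s> <http://example.org/p> _:b1 .')
print(quad[3], triple[3])
```

Returning `None` for a missing context keeps the two cases in one tuple shape, which is convenient when loading mixed triple and quad data.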
semweb  rdf  standards 
2 days ago
Web Data Commons
Web Data Commons will extract all Microformat, Microdata and RDFa data that is contained in the Common Crawl corpus and will provide the extracted data for free download in the form of RDF-quads as well as CSV-tables for common entity types (e.g. product, organization, location, ...).
semweb  rdfa  web  metadata  webinfo  microdata  microformats  database 
2 days ago
Book Banning in Arizona « Academe Blog
Is there a worse place in the U.S. than Arizona? I will never spend a dime there.
from twitter
2 days ago
Before and After Demonstration: Overview
The Before and After Demonstration is a multi-page resource that shows an inaccessible website and a retrofitted version of this same website. Each web page includes inline annotations that can be activated to highlight some of the key accessibility barriers or repairs. Each web page is also accompanied by an evaluation report to inform the developers on the level of conformance to the Web Content Accessibility Guidelines (WCAG).
accessibility  interface  design  standards 
2 days ago
Digital humanities
Humanities students often do not realize (or even imagine) that 1) they are capable of learning to write useful and practical computer programs within the course of a semester even if they have no prior background in programming; 2) the ability to write one’s own programs can be valuable for scholars in the humanities, especially because commercial software often does not address research needs in the humanities; and 3) practical computer programming, no less than reading, writing, and arithmetic, is a useful skill that is within the reach of any educated person regardless of academic specialization.

This course will introduce students to the role that computational methods can play in primary research and scholarship in the humanities, using as a technological framework eXtensible Markup Language (XML) and related technologies.
digitalhumanities  syllabus  xml 
2 days ago
How to bypass firewalls or captive portals with dns2tcp | fosk.it! 2.0
Classic wireless hot spots commonly allow two protocols: ICMP and DNS (UDP/53). ICMP (Internet Control Message Protocol) is used to report errors and warnings to the client, and DNS is mandatory to resolve hostnames. While ICMP can also be used as a transport protocol (see PTunnel), firewalls may block unusual ICMP packets (e.g., suspiciously big packets). On the other hand, there are often fewer restrictions on DNS traffic.
internet  howto 
4 days ago
Fuchs
"the role of surveillance in Google’s form of capital accumulation is explained"
from twitter
4 days ago
Historical Controversies Now
Instead of going to the library or the archive, we increasingly access history, the past, through the web. But what kind of history or histories, past or pasts are we accessing online? And what does this accessing entail? Following Leong et al., we approach temporality on the web “as a multiplicity of times derived from relations between different elements” (2009, 1279). This project is specifically focused on contentious historical moments, pasts that have had and potentially still have a major emotional impact, and which have been the subject of struggle. Moreover, we are not interested in sites specifically devoted to history, but in the major platforms on the web.

Confronting the historical events on the various platforms and opening up to a multiplicity of time, we immediately realized that the traditional linear conception of time does not work online. First, most platforms do not work in a chronological fashion, but with a reverse chronology. Second, because the platforms order sources according to ‘relevance’, the chronology of the sources as they are presented to us is radically mixed up. Third, sources do their own trick with time as well. Some focus on the historical event itself, while others rework the event. This reworking happens in a wide variety of ways, for example, by metaphorically invoking the event, by turning it into a historiographic debate, or by incorporating the event in a personal account (reading a history book, visiting a historical site, listening to a song). Crucially, in some of these reworkings, the event is actualized as controversial. These temporal complications directly informed our research, analysis, and visualization.

The above considerations translate into the following research questions:

Source time: Do we primarily find contemporary sources or historical sources in the various spheres? Does this vary across controversies?

Historical time: Do the sources on a platform focus on the historical moment itself, or a contemporary reworking of the moment? Does this vary across controversies?

Heat of the controversy: Is the controversy treated as settled, or is it actualized as still controversial? Does this vary across platforms and controversies?
history  datamining  web  publichistory 
4 days ago
API Ecology
The concept of the mashup implies the combination of different data sources or functions. Practically, this often means that a mashup makes use of several APIs and tries to produce new insights or new functionalities by mixing them together. The patterns of combination are not random: one can imagine that certain APIs are brought together more often than others. This (short) project proposes to examine this dynamic more closely.
webservices  api  networks  infoviz 
4 days ago
yellow garlic - Google Search
If I accomplish nothing else in life, at least I can say that I am the Web's authority on yellow garlic.
from twitter
6 days ago
Diabetes Daily - Search Results
An interesting approach to forum search:
from twitter
7 days ago
Thoms, William John (DNB00) - Wikisource
In 1849 he resumed his project of providing a paper ‘in which literary men could answer one another's questions.’ Dilke encouraged him, with the result that the first number of ‘Notes and Queries’ appeared on 3 Nov. 1849. The name was chosen by Thoms, and he selected for a motto Captain Cuttle's phrase, ‘When found, make a note of.’ In form the journal was modelled on the ‘Somerset House Gazette.’
scholarlycommunication  scholarship  history  editorsnotes 
7 days ago
MURK AVENUE, I FOUND ICE CUBES 'GOOD DAY'
How to do historical research: I FOUND ICE CUBES ‘GOOD DAY’
from twitter
7 days ago
Telomere - Wikipedia, the free encyclopedia
Telomeres are repetitive nucleotide sequences located at the termini of linear chromosomes. Mons and Velterop refer to concepts linked by a predicate in an RDF triple as "telomeric concepts," an interesting metaphor demonstrating that Otlet's dream of a science of documentation that mirrors the science of natural phenomena is alive and well.
documentation  semweb  science  scholarlycommunication 
7 days ago
Dombey and son
"When found, make a note of." – Captain Cuttle
from twitter
7 days ago
Oxford Journals | Humanities | Notes and Queries
Founded under the editorship of the antiquary W J Thoms, the primary intention of Notes and Queries was, and still remains, the asking and answering of readers' questions. It is devoted principally to English language and literature, lexicography, history, and scholarly antiquarianism.
history  language  literature  editorsnotes  scholarlycommunication  scholarship 
7 days ago
Notes and Queries (Bookshelf) - Gutenberg
Notes and Queries (originally subtitled "a medium of inter-communication for literary men, artists, antiquaries, genealogists, etc") is a London-based, quarterly publication, part academic journal, part correspondence magazine, in which scholars and interested amateurs can exchange knowledge on literature and history.
editorsnotes  scholarship  scholarlycommunication  history  literature 
7 days ago
Mocha - the fun, simple, flexible JavaScript test framework
Mocha is a feature-rich JavaScript test framework running on node and the browser, making asynchronous testing simple and fun. Mocha tests run serially, allowing for flexible and accurate reporting, while mapping uncaught exceptions to the correct test cases.
nodejs  javascript  testing  qa 
9 days ago
A New Part of Your Digital Humanities Toolkit | Tapas Project
Tapas is the TEI Archival Publishing and Access Service for scholars and other creators of TEI data who need a place to publish their materials in different forms and ensure it remains accessible over time. Tapas is also for anyone interested in reading and exploring TEI data, and communicating with those that share that interest.
tei  publishing  digitalhumanities 
9 days ago
mhevery/jasmine-node - GitHub
Write the specifications for your code in *.js and *.coffee files in the spec/ directory (note: your specification files must end with either .spec.js or .spec.coffee; otherwise jasmine-node won't find them!). You can use sub-directories to better organise your specs.
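The naming rule quoted above can be sketched as a simple filename filter. This Python snippet only illustrates the rule as stated in the description; it is not jasmine-node's own matching logic, and the helper name `is_spec_file` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

# Suffixes named in the description above.
SPEC_SUFFIXES = (".spec.js", ".spec.coffee")

def is_spec_file(path):
    """Return True if the file name ends with one of the spec suffixes."""
    return PurePosixPath(path).name.endswith(SPEC_SUFFIXES)

# Hypothetical project layout, including a sub-directory.
candidates = [
    "spec/parser.spec.js",
    "spec/models/user.spec.coffee",
    "spec/helpers.js",
    "spec/parser.spec.ts",
]
found = [p for p in candidates if is_spec_file(p)]
print(found)
```

A helper file such as `spec/helpers.js` falls outside the rule, which matches the warning that misnamed files are silently skipped.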
javascript  nodejs  testing  qa  coffeescript 
10 days ago
nytd/ice - GitHub
Ice is a track changes implementation, built in javascript, for anything that is contenteditable on the web.
editing  interface  versioning  javascript  html 
11 days ago
splitta - statistical sentence boundary detection
Sentence tokenizer written in python. Includes proper tokenization and models for very high accuracy sentence boundary detection (English only for now). The models are trained from Wall Street Journal news combined with the Brown Corpus which is intended to be widely representative of written English. Error rates on test news data are near 0.25%.
nlp  python 
13 days ago
Paul A Lombardo - Legal Archaeology: Recovering the Stories behind the Cases
Every lawsuit is a potential drama: a story of conflict, often with victims and villains, leading to justice done or denied. Yet a great deal, if not all, that we learn about the most noteworthy of lawsuits — the truly great cases — comes from reading the opinion of an appellate court, written by a judge who never saw the parties of the case, who worked at a time and a place far removed from the events that gave rise to litigation. We focus on “the facts of the case,” as described in a judge’s opinion, and then we describe the way the court applied the law to such facts as doctrine, hardly pausing to note the irony of this ex cathedra image, smacking of infallibility. Rarely do we admit that the official factual account contained in an appellate opinion may have only the most tenuous relationship to the events that actually led the parties to court. The complex stories — turning on small facts, seemingly trivial circumstances, and inter-contingent events — fade away as the “case” takes on a life of its own as it leaves the court of appeals.
law  narrative  history  facts  archives  archaeology  health 
13 days ago
The Little Book on CoffeeScript
CoffeeScript is a little language that compiles down to JavaScript. The syntax is inspired by Ruby and Python, and implements many features from those two languages. This book is designed to help you learn CoffeeScript, understand best practices and start building awesome client side applications. The book is little, only five chapters, but that's rather apt as CoffeeScript is a little language too.
coffeescript  javascript  reference 
14 days ago
tmpvar/jsdom - GitHub
A javascript implementation of the W3C DOM.
dom  javascript  nodejs  jquery  scraping 
14 days ago
ARL Report on Digital Humanities
Washington DC--The Association of Research Libraries (ARL) has published Digital Humanities, SPEC Kit 326, which provides a snapshot of research library experiences with digital scholarship centers or services that support the humanities (e.g., history, art, music, film, literature, philosophy, religion, etc.) and the benefits and challenges of hosting them. The survey asked ARL libraries about the organization of these services, how they are staffed and funded, what services they offer and to whom, what technical infrastructure is provided, whether the library manages or archives the digital resources produced, and how services are assessed, among other questions.

This survey revealed that library-based support for the digital humanities is offered predominantly on an ad hoc basis. However, as demand for services supporting the digital humanities has grown, libraries have begun to re-evaluate their provisional service and staffing models. Many respondents expressed a desire to implement practices, policies, and procedures that would allow them to cope with increases in demand for services.

This SPEC Kit includes documentation from respondents that describes the mission or purpose of digital humanities centers, the services offered, policies and procedures, examples of digital projects, fellowship and grant opportunities, promotional materials, and repositories for digital projects.
digitalhumanities  research  libraries 
15 days ago
Tableau Public | Tableau Software
Tableau Public is a free service that lets you create and share data visualizations on the web. Thousands use it to share data on websites and blogs and through social media like Facebook and Twitter. Tableau Public allows you to see data efficiently and powerfully without any programming.
visualization  infoviz 
15 days ago
t.co / Twitter
RT : It's Republicans, not Democrats, who are responsible for and seeming to go down in flames. ...
PIPA  SOPA  from twitter
16 days ago
The Problem of the Yellow Milkmaid: A Business Model Perspective on Open Metadata
"The Milkmaid," one of Johannes Vermeer's most famous pieces, depicts a scene of a woman quietly pouring milk into a bowl. During a survey the Rijksmuseum discovered that there were over 10,000 copies of the image on the internet—mostly poor, yellowish reproductions. As a result of all of these low-quality copies on the web, according to the Rijksmuseum, "people simply didn't believe the postcards in our museum shop were showing the original painting. This was the trigger for us to put high-resolution images of the original work with open metadata on the web ourselves. Opening up our data is our best defence against the 'yellow Milkmaid.'"
metadata  business  art  museum 
16 days ago
Data Clustering Software | Karypis Lab
CLUTO is a software package for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters. CLUTO is well-suited for clustering data sets arising in many diverse application areas including information retrieval, customer purchasing transactions, web, GIS, science, and biology.
clustering  datamining 
16 days ago
Chris Dodd’s paid SOPA crusading - Salon.com
RT : Former Senator, now CEO Chris Dodd goes back on his promise not to become a lobbyist to push :
MPAA  SOPA  from twitter
16 days ago
Diction Software - Home
Diction 6.0 uses dictionaries (word-lists) to search a text for these qualities:

· Certainty - Language indicating resoluteness, inflexibility, and completeness and a tendency to speak ex cathedra.

· Activity - Language featuring movement, change, the implementation of ideas and the avoidance of inertia.

· Optimism - Language endorsing some person, group, concept or event, or highlighting their positive entailments.

· Realism - Language describing tangible, immediate, recognizable matters that affect people's everyday lives.

· Commonality - Language highlighting the agreed-upon values of a group and rejecting idiosyncratic modes of engagement.
textanalysis  sentiment  digitalhumanities 
16 days ago
Statistics 110: Introduction to Probability
Statistics 110 (Introduction to Probability), taught at Harvard University by Joe Blitzstein in Fall 2011. Lecture videos, homework, review material, practice exams, and a large collection of practice problems with detailed solutions are provided. This course is an introduction to probability as a language and set of tools for understanding statistics, science, risk, and randomness. The ideas and methods are useful in statistics, science, philosophy, engineering, economics, finance, and everyday life. Topics include the following. Basics: sample spaces and events, conditional probability, Bayes’ Theorem. Random variables and their distributions: cumulative distribution functions, moment generating functions, expectation, variance, covariance, correlation, conditional expectation. Univariate distributions: Normal, t, Binomial, Negative Binomial, Poisson, Beta, Gamma. Multivariate distributions: joint, conditional, and marginal distributions, independence, transformations, Multinomial, Multivariate Normal. Limit theorems: law of large numbers, central limit theorem. Markov chains: transition probabilities, stationary distributions, reversibility, convergence.
statistics  education 
16 days ago
Blacksmith
A static site generator built with Node.js, JSDOM, and Weld.
nodejs  web  tools  blog 
17 days ago
Definition of User Agent - WAI UA Wiki
A user agent is any software that retrieves and presents Web content for end users or is implemented using Web technologies. User agents include Web browsers, media players, and plug-ins that help in retrieving, rendering and interacting with Web content. The family of user agents also includes operating system shells, consumer electronics with Web-widgets, and stand-alone applications or embedded applications whose user interface is implemented as a combination of Web technologies.
webinfo  definitions 
17 days ago
Network Protocol Headers
Nice diagrams of various internet protocol headers.
internet  networking  webinfo 
17 days ago
Kanso
Kanso can be described as the NPM for CouchApps, with tools for installing and publishing shared packages while managing dependencies. The Kanso community provides reusable build-tools, modules, templates and more via the online repository. Kanso's built around a powerful packaging system, meaning almost all the functionality can be customized by you.
couchdb  javascript 
21 days ago
Discovering the Template | Easily Distracted
I can see that another thing I often do in my courses, particularly thematic classes, is provide a “spine” narrative that supports the discussion. For all that I think “coverage” is an uninteresting objective for a class, I clearly recognize that without some core storyline or knowledge base, a class would be nothing but 14 weeks of “another interesting reading”: fun and diverting, but not giving students any sense of cumulative ownership over the subject, a sense that they know something that can be brought to bear in unexpected and creative ways on later readings (and on later experiences once the class is over).
narrative  education  history 
22 days ago
Augmenting Human Intellect: A Conceptual Framework - 1962 (AUGMENT,3906,) - Doug Engelbart Institute
By "augmenting human intellect" we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems. Increased capability in this respect is taken to mean a mixture of the following: more-rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions, better solutions, and the possibility of finding solutions to problems that before seemed insoluble. And by "complex situations" we include the professional problems of diplomats, executives, social scientists, life scientists, physical scientists, attorneys, designers--whether the problem situation exists for twenty minutes or twenty years. We do not speak of isolated clever tricks that help in particular situations. We refer to a way of life in an integrated domain where hunches, cut-and-try, intangibles, and the human "feel for a situation" usefully co-exist with powerful concepts, streamlined terminology and notation, sophisticated methods, and high-powered electronic aids.
hci  hypertext  webhistory 
23 days ago
One Book, Many Readings
Visualizations of the structures of Choose Your Own Adventure Books.
hypertext  infoviz  design 
23 days ago
Jakob Nielsen - Hypertext '87
Hypertext '87 was the first large-scale meeting devoted to the hypertext concept. Before the workshop, hypertext had been considered a somewhat esoteric concept of interest to a few fanatics only.
hypertext 
23 days ago
Emanuel Goldberg, Electronic Document Retrieval, And Vannevar Bush's Memex
Vannevar Bush's famous paper "As We May Think" (1945) described an imaginary information retrieval machine, the Memex. The Memex is usually viewed, unhistorically, in relation to subsequent developments using digital computers. This paper attempts to reconstruct the little-known background of information retrieval in and before 1939 when "As We May Think" was originally written. The Memex was based on Bush's work during 1938-1940 developing an improved photoelectric microfilm selector, an electronic retrieval technology pioneered by Emanuel Goldberg of Zeiss Ikon, Dresden, in the 1920s. Visionary statements by Paul Otlet (1934) and Walter Schuermeyer (1935) and the development of electronic document retrieval technology before Bush are examined.
goldberg  webhistory  webinfo  memex  searchengine  history 
23 days ago
Michael Buckland's Wilhelm Ostwald Page
Michael Buckland's notes on Wilhelm Ostwald.

"Ostwald discussed problems of information management with Paul Otlet, co-founder of the International Institute for Bibliography in Brussels, in 1910. He used most of his Nobel Prize money to finance a similar organization, Die Bruecke ('The Bridge'), an 'international institute for the organizing of intellectual work,' which he founded in Munich with Karl Wilhelm Buehrer and Adolf Saager in June 1911. The manifesto of The Bridge, entitled 'The Organizing of Intellectual Work,' was published in German and in Esperanto ('everybody's second language') in 1911."

"They advocated 'the monographic principle' (hypertext), technical standards, the use of the Universal Decimal Classification, and the idea of a World Brain. The Bridge ended in 1913 after publishing numerous pamphlets. Ostwald died in 1932. One lasting legacy of his work is the international standard for paper sizes (A4 etc.)."
history  information  ostwald 
23 days ago
Alle Kennis van de Wereld (Biography of Paul Otlet)
A free documentary about Paul Otlet, narrated by W. Boyd Rayward, his biographer.
otlet  biography  documentary  video 
23 days ago
True Films: The Man Who Wanted to Classify the World
Kevin Kelly's notes on _The Man Who Wanted to Classify the World_, a French documentary on Paul Otlet.
otlet  history  documentary  webhistory  webinfo 
23 days ago
The Mundaneum Museum Honors the First Concept of the World Wide Web
NYT article on Paul Otlet, with an excellent graphic explaining the Mundaneum system, and a video excerpt from the documentary on him.
webhistory  webinfo  otlet  history  information  technology 
23 days ago
Michael Buckland's Emanuel Goldberg Page
Michael Buckland's notes on Emanuel Goldberg, with links to other resources.

"Emanuel Goldberg (Portrait) was born in Moscow, Russia, in 1881, a chemist, inventor, and industrialist who contributed to almost all aspects of imaging technology in the first half of the twentieth century: photographic sensitometry, reprographics, standardized film speeds, color printing (moiré effect), aerial photography, extreme microphotography (microdots), optics, camera design (the Contax), the important, early hand-held Kinamo movie camera, and early television technology. He received his doctorate from Wilhelm Ostwald's institute in Leipzig in 1906."
goldberg  webhistory  history  film  microfilm  searchengine 
23 days ago
Michael Buckland's Paul Otlet Page
Michael Buckland's notes on Paul Otlet, with links to other Otlet resources.

"Paul Otlet (portrait) was born in Brussels, Belgium, in 1868. His monumental book Traité de documentation. (Brussels, 1934) was both central and symbolic in the development of information science - then called 'Documentation' - in the first half of this century. In addition, it reminds us of something that has been too widely forgotten: That this field did have a lively existence in the early decades of this century and a sophistication concerning theory and information technology that now commonly surprises people."
webhistory  webinfo  otlet  cataloging  classification  history  hypertext  libraries 
23 days ago
Ian Bogost - This is a Blog Post about the Digital Humanities
Last sentence here nails the State of Digital Humanities in 2012:
from twitter
23 days ago
Learn by Doing - Code School
Code School is all about learning by doing. Our educational courses combine video, coding in the browser, and gamification principles to make learning more fun and therefore more effective.
programming  education  tutorials 
24 days ago
Learn to code | Codecademy
Codecademy is the easiest way to learn how to code.
programming  tutorial  education  webinfo 
24 days ago
RDF Cookbook for Digital Humanities
The purpose of this cookbook is to document and discuss the use of RDF in digital humanities. Its focus is specific applications as found in the real world, though a few general principles are suggested. It assumes that you’re vaguely comfortable with RDF and RDFa.
rdf  rdfa  linkeddata  digitalhumanities 
25 days ago
Simple JavaScript Applications with CouchDB - CouchApp.org
CouchApps are JavaScript and HTML5 applications served directly from CouchDB. If you can fit your application into those constraints, then you get CouchDB's scalability and flexibility "for free" (and deploying your app is as simple as replicating it to the production server).
couchdb  html5  javascript  webinfo 
25 days ago
Pannapacker at MLA: Alt-Ac Is the Future of the Academy - Brainstorm - The Chronicle of Higher Education
While I fully support the notion of "alt-ac" for humanities PhDs, I hope that it doesn't mislead students into thinki…
from twitter
25 days ago
The Meaning and The Mining of Legal Texts
Positive law, inscribed in legal texts, entails an authority not inherent in literary texts, generating legal consequences that can have real effects on a person’s life and liberty. The interpretation of legal texts, necessarily a normative undertaking, resists the mechanical application of rules, though still requiring a measure of predictability, coherence with other relevant legal norms and compliance with constitutional safeguards. The present proliferation of legal texts on the internet (codes, statutes, judgments, treaties, doctrinal treatises) renders the selection of relevant texts and cases next to impossible. We may expect that systems to mine these texts to find arguments that support one’s case, as well as expert systems that support the decision-making process of courts, will end up doing much of the work.

This raises the question of the difference between human interpretation and computational pattern-recognition and the issue of whether this difference makes a difference for the meaning of law. Possibly, data mining will produce patterns that disclose habits of the minds of judges and legislators that would have otherwise gone unnoticed (reinforcing the argument of the ‘legal realists’ at the beginning of the 20th century). Also, after the data analysis it will still be up to the judge to decide how to interpret the results or up to the prosecution which patterns to engage in the construction of evidence (requiring a hermeneutics of computational patterns instead of texts). My focus in this paper regards the fact that the mining process necessarily disambiguates the legal texts in order to transform them into a machine-readable data set, while the algorithms used for the analysis embody a strategy that will co-determine the outcome of the patterns. There seems to be a major due process concern here to the extent that these patterns are invisible to the naked human eye and will not be contestable in a court of law, due to their hidden complexity and computational nature.

This position paper aims to explain what is at stake in the computational turn with regard to legal texts. This prepares for the question I want to put forward to those involved in distant reading and not-reading of texts: could a visualization of computational patterns constitute a new way of un-hiding the complexity involved, opening the results of computational ‘knowledge’ to citizens’ scrutiny?
textmining  machinelearning  visualization  digitalhumanities  law 
26 days ago
The Association of American Publishers
I just threw up in my mouth a little. : DAMMIT NO. MT : Vile: Research Works Act
from twitter
4 weeks ago
Untitled (http://lists.okfn.org/pipermail/open-bibliography/2012-January/001272.html)
RT : Obviously, hasn't understood yet: Can someone please scrape & publish the FAST data? /c ...
opendata  from twitter
4 weeks ago