jschneider + archive.org   91

Transferring “Libraries of Congress” of Data
"One ancillary task that arises from this arrangement is that the generated web archive data (roughly 5 terabytes per month) must be transferred from the West Coast to the Library of Congress. This turns out to be non-trivial; it may take the better part of a month with near-constant transfers over an Internet2 connection to move 10 terabytes of data. For all the optimism about transmitting “Libraries of Congress” of data over networks, putting data on physical storage media and then shipping that media around remains a surprisingly competitive alternative. Case in point: for all of the ethereality and technological sophistication implied by so-called cloud services, at least one of the major providers lets users upload their data in the comparatively mundane manner of mailing a hard drive."
archives  libraryofcongress  LC  archive.org  sneakernet 
october 2011 by jschneider
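The figures quoted above can be turned into a rough effective throughput. This is a back-of-the-envelope sketch only: it assumes decimal terabytes (10^12 bytes) and reads "the better part of a month" as 30 days, neither of which the source states.

```python
SECONDS_PER_DAY = 86_400


def effective_throughput_mbps(terabytes: float, days: float) -> float:
    """Average rate in megabits per second needed to move `terabytes` in `days`."""
    bits = terabytes * 1e12 * 8          # assumption: 1 TB = 10**12 bytes
    return bits / (days * SECONDS_PER_DAY) / 1e6


def days_to_transfer(terabytes: float, mbps: float) -> float:
    """Days needed to move `terabytes` at a sustained rate of `mbps`."""
    bits = terabytes * 1e12 * 8
    return bits / (mbps * 1e6) / SECONDS_PER_DAY


# Assumption: "the better part of a month" is taken as 30 days.
rate = effective_throughput_mbps(10, 30)
print(f"10 TB in 30 days is about {rate:.1f} Mbit/s sustained")
print(f"5 TB at that rate takes about {days_to_transfer(5, rate):.0f} days")
```

Under those assumptions the sustained rate works out to roughly 31 Mbit/s, which is why shipping a drive can still compete.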
Scanning a Braille Playboy | Internet Archive Blogs
"The setup took longer than the majority of books do because these Braille issues are oversized, an odd color (kind of a paper bag consistency) and more like a newspaper stapled through than a standard binding. This was very informative, because the crew has a variety of tests and tools to make sure the scans are as good as they can be, including foam bracing, dowels, and shims. They tried a variety of approaches before settling on one." "Once the arrangement had been decided on, the calibration worked out, the lights adjusted and the process begun, it went very fast – this 98-page book was scanned in less than five minutes, and I only got a shot or two of the process. The Scribe system works very efficiently and someone trained with the system can work smoothly, with no damage or stress to the book or binding. Good thing, too – while the Playboy is only 19 years young, some of the books scanned have been around for centuries and wish to continue to do so." "I took back the Playboy and a few hours later, after a process of deriving the original scans into a whole host of convenient formats such as PDF, DjVu, and Epub, a Braille version of Playboy can now be seen on archive.org.

Now, I will be the very first to admit – the result is pretty silly. You’ve got something that needs to be read by touching it, which can’t be touched, and the two-sided indentations on the paper means it all looks pretty darn strange. So on one hand, it all can seem pretty useless.



But what can we learn by clicking on the link? Well, we find out that this sort of thing exists at all, and why, and what it looks like, and how Braille can be printed on both sides, and that it would take four copies to produce the text of a single issue… and that apparently, there’s no centerfold."
archive.org  braille  playboy 
august 2011 by jschneider
Shimenawa » Blog Archive » Books are not turbines (or paper towels)
"One of the most common responses I get from publishers when I tell them I want to acquire their books for the Archive (once I explain it adequately) is a nicely-put response that buying books is certainly an intriguing idea, but “we’re not really set up for sales like this, have you tried our asking our distributor?” In other words, handling individual sales is a very painful, high threshold task, and publishers only want to accommodate “high revenue” arrangements." "In engineering land, what this would imply is … wait for it … an API. An automated interface that would permit the purchase of a book by any party (human or code) in whatever quantity they wished, in whatever format they wished, as long as whatever arcane territorial restrictions and contract clauses did not override the desire of the reader to part with their money to help make the author wealthier, happier, and in a position to write more books." "publishers spend the majority of their time on filling the supply chain, customizing the requisite data flow with every business partner, instead of focusing their engineering on what they are actually selling, which are books and – more important – the experience the reader gets when they read the book. And that’s kinda insane."
books  archive.org  publishing  licensing  ebooks  API  EPUB  OPDS 
august 2011 by jschneider
1936 book predicts the impact of the web on research and discourse - Boing Boing
"this is weird: I was *there* when this happened, and now it's reported in BoingBoing:"

via http://twitter.com/#!/pabinkley/status/76685438240309249
see also http://www.archive.org/details/BrewsterKahleReadsR.c.Binkley
video  archive.org  futurism  Davis  Weinberger 
june 2011 by jschneider
Go To Hellman: Internet Archive Sets Fair-Use Bait With Open Library Lending
"The fact that at least one author was asked for permission suggests that the Archive is being very careful about what it chooses to make available through the lending program. A look at the 187 items in the lending library supports this view. There are
Works by well-known copyright reform advocates such as Brand and Lawrence Lessig.
Obsolete computer books. Example: Microsoft Windows 98 at a glance
Older books likely to be orphan works: example- Alice James: her brothers~her journal, the posthumously published diary of Alice James.
An in-copyright version of an out-of-copyright work: Brace Lineage, a genealogical work apparently based on an older version.
Other genealogic works which might be considered to be data compilations. Example: Twelve generations in America
Spanish language works published in Guatemala. Example: Regimenes agrarios
A collection of stories by anonymous pregnant teens: You look too young to be a mom: teen moms speak out on love, learning, and success
In short, if you wanted to take legal action to stop the digital lending library, each of the books included in the lending library would pose some sort of problem for you."
archive.org  IP  digital-lending  digitization  copyright  fair-use  orphanworks 
august 2010 by jschneider
19th Century Novels : Free Books : Free Texts : Download & Streaming : Internet Archive
"The "triple-decker" novel was a standard form of publishing for British fiction from the early 1800s until the 1890s. The market for this form of fiction was closely tied to commercial "circulating libraries," such as Mudie's and W. H. Smith. Unlike free public libraries, these circulating libraries charged patrons to borrow books, much like video rental stores do today. Publishing longer works of fiction was quite expensive, and by releasing them in multiple parts publishers captured an audience who eagerly awaited the next installment while proceeds from the first volumes paid for the printing of later volumes. Often sensational in subject matter, the genre was populated by heroines in danger, characters in disguise, potions and poisons. The University of Illinois Library is digitizing and making openly accessible via the Internet Archive its extensive collections of triple-decker novels."
format  genre  collections  archive.org  UIUC  triple-decker 
april 2010 by jschneider
APIs
APIs to the Wayback machine
code4lib  archive.org  apis 
september 2009 by jschneider
THATCamp » Blog Archive
"Processing has turned out to be perfect for this. It’s not just good for cartoon faces and artistic and complex data visualizations (though it is excellent for those). It is well suited to bootstrapping little scraps of knowledge into quick cycles of gratifying incremental improvements. I ended up cobbling together a half-dozen relatively simple throwaway tools highly customized to the particular reading and indexing I wanted to do, minimizing keystrokes, maximizing what I could get from the imperfect information available to me, and efficiently recording what I wanted to record while scanning through the material. Having spent plenty of hours with the clicks, screeches, and blurs of microfilm readers, I can say that being able to fix up your own glorified (silent) virtual microfilm reader with random access is a wonderful thing. (It’s also nice that the images are never reversed because the person before you didn’t rewind to the proper spool.) And immensely better than PDF, too."
archive.org  processing 
june 2009 by jschneider
SDNY Judge Chin: Intervention Denied
Judge Chin's denial of motion for intervention by Internet Archive and other parties in the Authors Guild, et al. v. Google class action settlement proposal.
sad  :(  googlebooks  archive.org 
april 2009 by jschneider
A Tool to Verify Digital Records, Even as Technology Shifts - NYTimes.com
"Even the smallest change in the original document will result in a new hash value." MAY, they should say--hashing just makes forgery DIFFICULT, not impossible
digital-repservation  cryptography  hashing  long-now  archive.org  SHA-2 
january 2009 by jschneider
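A minimal sketch of the hashing idea in the entry above, using SHA-256 (a member of the SHA-2 family) from Python's standard `hashlib`. The sample documents and the helper names are made up for illustration; the article's actual tool is not shown here. The note's caveat still applies: digests have a fixed length, so distinct inputs that share a digest exist in principle, and a hash only makes them impractical to find.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_digest(data: bytes, recorded_digest: str) -> bool:
    """Check a document against a digest recorded earlier."""
    return sha256_hex(data) == recorded_digest


# Hypothetical documents: the second differs by one trailing character.
original = b"Minutes of the board meeting, 12 January 2009"
altered = b"Minutes of the board meeting, 12 January 2009."

recorded = sha256_hex(original)
print(recorded)
print(sha256_hex(altered))
print(matches_recorded_digest(original, recorded))  # unchanged document verifies
print(matches_recorded_digest(altered, recorded))   # altered document does not
```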
Slashdot | Collateral Damage as UK Censors Internet Archive
"An anonymous reader noted the latest developments in the controversial censoring of the internet by UK ISPs. Apparently since some content of the Wayback Machine is bad, the whole thing needs to be blacklisted."
UK  archive.org  censorship 
january 2009 by jschneider
Dan Cohen’s Digital Humanities Blog » Blog Archive » Mass Digitization of Books: Exit Microsoft, What Next?
"The answer sounds like Abbott and Costello: the free program produces something that’s not free, while the expensive one does."
digitization  archive.org  OCA  Microsoft  Dan  Cohen 
may 2008 by jschneider
Internet Archive and just my timing « Rube Goldberg machines for libraries
"The thing that really got me was that I couldn’t reliably search by ISBNs. Their advanced search had an “isbn” field, but I couldn’t find an ISBN that could be found that way."
archive.org  screenscraping  Jason  Ronallo  Umlaut  XML  solr 
may 2008 by jschneider
GBS/OCLC « Bibliographic Wilderness
"I think it’s kind of sad that Worldcat is apparently going to include GBS digitized books before it includes Internet Archive hosted books (which there are still no public plans for)."
GoogleBooks  OCLC  Worldcat  archive.org  Jonathan  Rochkind 
may 2008 by jschneider
What I wish I could do with Internet Archive « Bibliographic Wilderness
"we really will write open source apps on top of the services you provide, but you’ve got to give us those services—creating our own index on top of your OAI-PMH feed"
archive.org  apis  openlibrary  screenscraping 
may 2008 by jschneider
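For context on what "creating our own index on top of your OAI-PMH feed" involves, here is a rough harvesting sketch. The `ListIdentifiers` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 protocol; the base URL, the sample response, and the function names are hypothetical, and nothing here is fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListIdentifiers request URL; a resumptionToken replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_identifiers(xml_text):
    """Return (identifiers, resumption_token or None) from one response page."""
    root = ET.fromstring(xml_text.strip())
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response page, hand-written for this example.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:item-1</identifier></header>
    <header><identifier>oai:example.org:item-2</identifier></header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>
"""

ids, token = parse_identifiers(SAMPLE_RESPONSE)
print(ids, token)
print(list_identifiers_url("http://example.org/oai", resumption_token=token))
```

A harvester would repeat the request with each returned token until none comes back, then build its own index from the collected records, which is the extra work the post is complaining about.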
Archive-It.org
"Internet Archive's subscription service, Archive-It, allows institutions to build and preserve their own web archive of born digital content, through a user friendly web application, without requiring any technical expertise or hosting facilities. Subscr
archive.org  webarchiving  digital  preservation 
march 2008 by jschneider
An experimental non-commercial project to archive and re-publish public domain works - PublicDomainReprints.org
"At this time, this service can take a book from any of the supported sites such as the Internet Archive, Google Books or Universal Library (books in public domain ONLY) and reprint it using Lulu.com."
pod  archive.org  googlebooks  books  lulu  public-domain 
january 2008 by jschneider
A More Open Congress
"This is what libraries do best, and why they are special: they defend public interests; they make information widely available; and they ensure its permanence. These characteristics are notably distinct from the obligations and concerns characteristic of
govdocs  Congress  archive.org  GPO  information-as-public-good  Michigan  googlebooks  google-vs.-libraries 
december 2007 by jschneider
aDORe Archive - Overview
"The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams." "allows for the storage of multiple XMLtapes and ARC files through the introduction of OAI-PMH compliant XMLtape and ARCfile registries."
OAI-PMH  LANL  OpenURL  archive.org  digital  preservation  datastreams  archiving 
december 2007 by jschneider
inkdroid » Blog Archive » permalinks reloaded
when generating a time anchored permalink with zotero one can well expect that archive.org will on occasion not have a snapshot of said content, resulting in a 404. It would be great if archive.org could leverage these requests for snapshots as requests t
permalink  zotero  archive.org 
december 2007 by jschneider
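A small sketch of what a "time anchored permalink" can look like, assuming the common Wayback Machine URL form of a 14-digit timestamp followed by the original URL. The function name and the example URL are made up, and this is not Zotero's actual code. As the entry above notes, such a link can still come back 404 when no snapshot exists.

```python
from datetime import datetime

# Assumption: the usual Wayback Machine prefix for timestamped captures.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_permalink(url: str, when: datetime) -> str:
    """Build a Wayback-style link anchored at `when` (format YYYYMMDDhhmmss)."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}/{url}"


# Hypothetical citation: a page consulted at noon on 1 December 2007.
link = wayback_permalink("http://example.org/page", datetime(2007, 12, 1, 12, 0, 0))
print(link)
```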
CrossTech: Zotero and the IA
"Zotero Commons is related to but different from Nature Precedings and WebCite in that its intended focus is on public domain stuff on researchers' hard drives rather than someone else's material or website that is cited (WebCite) or preprints, datasets,
zotero  archive.org 
december 2007 by jschneider
Dan Cohen’s Digital Humanities Blog » Blog Archive » Zotero and the Internet Archive Join Forces
"The Zotero-IA alliance will create a “Zotero Commons” into which scholarly materials can be added simply via the Zotero client. Almost every scholar and researcher has documents that they have scanned (some of which are in the public domain), finding
archive.org  zotero  OCR  annotations  digital  libraries  Dan  Cohen 
december 2007 by jschneider
OCA To Scan Orphan Works; Publishers Float Orphan Works Solution - 10/29/2007 - Library Journal
OCA libraries "in conjunction with the Internet Archive, would scan works that were out-of-print but in-copyright and pioneer "a digital interlibrary loan service" around them."
LJ  ebooks  OCA  ill  archive.org  copyright 
december 2007 by jschneider
Tales from the Open Content Alliance
"Kahle wants to implement ISBNs and really wouldn’t take any other answer, so the plan is to figure out how to make this work.... how will it affect the use of ISBNs in the meantime?" "It’s imperative that we come up with value on top this data we’r
microform  digitzation  oca  archive.org  Ross  Singer  PoD  ISBN  metadata  RDF  Brewster  Kahle  digitization 
october 2007 by jschneider
internet archive and nasa
"What's particularly exciting is that this is both an aggregation and a digitization project -- widespread materials will be brought together for easier discovery through get enriched metadata, and important materials will be selected and digitized to add
NASA  archive.org  digital  archiving 
september 2007 by jschneider
Schema (Open Library)
"Following is a python datastructure representing the field-schema for bibliographic items in ThingDB. Where the count attribute is not specified, its value is 'single'. The types string, text, url (and perhaps date) may all be stored as "strings" in Thin
python  Karen  Coyle  OpenLibrary  archive.org  datastructures  schemas  databases 
july 2007 by jschneider
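To illustrate the two rules quoted above (a missing count means 'single'; string, text and url may all be stored as strings), here is a toy sketch. The field names, the "multiple" count value, and the helper functions are invented for the example and are not taken from the actual Open Library schema.

```python
# Hypothetical field-schema: names and attribute values are illustrative only.
SCHEMA = {
    "title": {"type": "string"},
    "description": {"type": "text"},
    "authors": {"type": "string", "count": "multiple"},  # "multiple" is an assumed value
    "source_url": {"type": "url"},
}

# Types the quoted description says may be stored as plain strings.
STRING_LIKE_TYPES = {"string", "text", "url"}


def field_count(schema: dict, name: str) -> str:
    """Return the field's count, defaulting to 'single' when unspecified."""
    return schema[name].get("count", "single")


def storage_type(schema: dict, name: str) -> str:
    """Map the declared type to how the value would be stored."""
    declared = schema[name]["type"]
    return "string" if declared in STRING_LIKE_TYPES else declared


for name in SCHEMA:
    print(name, field_count(SCHEMA, name), storage_type(SCHEMA, name))
```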
What is reCAPTCHA?
"reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and us
Chronicle  games-with-a-purpose  CAPTCHA  digitization  archive.org  reCAPTCHA  OCR 
may 2007 by jschneider