jschneider + archive.org 91
Transferring “Libraries of Congress” of Data
october 2011 by jschneider
"One ancillary task that arises from this arrangement is that the generated web archive data (roughly 5 terabytes per month) must be transferred from the West Coast to the Library of Congress. This turns out to be non-trivial; it may take the better part of a month with near-constant transfers over an Internet2 connection to move 10 terabytes of data. For all the optimism about transmitting “Libraries of Congress” of data over networks, putting data on physical storage media and then shipping that media around remains a surprisingly competitive alternative. Case in point: for all of the ethereality and technological sophistication implied by so-called cloud services, at least one of the major providers lets users upload their data in the comparatively mundane manner of mailing a hard drive."
archives
libraryofcongress
LC
archive.org
sneakernet
october 2011 by jschneider
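The transfer figures quoted above can be turned into a rough sustained-throughput estimate. This is a minimal sketch, assuming decimal terabytes (10^12 bytes) and a 30-day month; the function name `sustained_mbps` is made up for illustration and is not from the bookmarked article.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_TERABYTE = 10**12  # assumption: decimal terabytes


def sustained_mbps(terabytes: float, days: float) -> float:
    """Average rate in megabits per second needed to move `terabytes` in `days`."""
    bits = terabytes * BYTES_PER_TERABYTE * 8
    seconds = days * SECONDS_PER_DAY
    return bits / seconds / 1_000_000


# 10 TB moved in roughly a month of near-constant transfer
rate_for_backlog = sustained_mbps(10, 30)

# 5 TB generated per month: the rate needed just to keep pace
rate_to_keep_up = sustained_mbps(5, 30)

print(f"10 TB in 30 days: {rate_for_backlog:.1f} Mbit/s")
print(f" 5 TB in 30 days: {rate_to_keep_up:.1f} Mbit/s")
```

Under these assumptions, moving 10 terabytes in a month works out to about 31 Mbit/s sustained, which helps explain why shipping drives remains competitive.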
Scanning a Braille Playboy | Internet Archive Blogs
august 2011 by jschneider
"The setup took longer than the majority of books do because these Braille issues are oversized, an odd color (kind of a paper bag consistency) and more like a newspaper stapled through than a standard binding. This was very informative, because the crew has a variety of tests and tools to make sure the scans are as good as they can be, including foam bracing, dowels, and shims. They tried a variety of approaches before settling on one.""Once the arrangement had been decided on, the calibration worked out, the lights adjusted and the process begun, it went very fast – this 98-page book was scanned in less than five minutes, and I only got a shot or two of the process. The Scribe system works very efficiently and someone trained with the system can work smoothly, with no damage or stress to the book or binding. Good thing, too – while the Playboy is only 19 years young, some of the books scanned have been around for centuries and wish to continue to do so.""I took back the Playboy and a few hours later, after a process of deriving the original scans into a whole host of convenient formats such as PDF, DjVu, and Epub, a Braille version of Playboy can now be seen on archive.org.
Now, I will be the very first to admit – the result is pretty silly. You’ve got something that needs to be read by touching it, which can’t be touched, and the two-sided indentations on the paper means it all looks pretty darn strange. So on one hand, it all can seem pretty useless.
But what can we learn by clicking on the link? Well, we find out that this sort of thing exists at all, and why, and what it looks like, and how Braille can be printed on both sides, and that it would take four copies to produce the text of a single issue… and that apparently, there’s no centerfold."
archive.org
braille
playboy
august 2011 by jschneider
Shimenawa » Blog Archive » Books are not turbines (or paper towels)
august 2011 by jschneider
"One of the most common responses I get from publishers when I tell them I want to acquire their books for the Archive (once I explain it adequately) is a nicely-put response that buying books is certainly an intriguing idea, but “we’re not really set up for sales like this, have you tried our asking our distributor?” In other words, handling individual sales is a very painful, high threshold task, and publishers only want to accommodate “high revenue” arrangements.""n engineering land, what this would imply is … wait for it … an API. An automated interface that would permit the purchase of a book by any party (human or code) in whatever quantity they wished, in whatever format they wished, as long as whatever arcane territorial restrictions and contract clauses did not override the desire of the reader to part with their money to help make the author wealthier, happier, and in a position to write more books.""publishers spend the majority of their time on filling the supply chain, customizing the requisite data flow with every business partner, instead of focusing their engineering on what they are actually selling, which are books and – more important – the experience the reader gets when they read the book. And that’s kinda insane."
books
archive.org
publishing
licensing
ebooks
API
EPUB
OPDS
august 2011 by jschneider
1936 book predicts the impact of the web on research and discourse - Boing Boing
june 2011 by jschneider
"this is weird: I was *there* when this happened, and now it's reported in BoingBoing:"
via http://twitter.com/#!/pabinkley/status/76685438240309249
see also http://www.archive.org/details/BrewsterKahleReadsR.c.Binkley
video
archive.org
futurism
Davis
Weinberger
june 2011 by jschneider
Go To Hellman: Internet Archive Sets Fair-Use Bait With Open Library Lending
august 2010 by jschneider
"The fact that at least one author was asked for permission suggests that the Archive is being very careful about what it chooses to make available through the lending program. A look at the 187 items in the lending library supports this view. There are
Works by well-known copyright reform advocates such as Brand and Lawrence Lessig.
Obsolete computer books. Example: Microsoft Windows 98 at a glance
Older books likely to be orphan works: example- Alice James: her brothers~her journal, the posthumously published diary of Alice James.
An in-copyright version of an out-of-copyright work: Brace Lineage, a genealogical work apparently based on an older version.
Other genealogic works which might be considered to be data compilations. Example: Twelve generations in America
Spanish language works published in Guatemala. Example: Regimenes agrarios
A collection of stories by anonymous pregnant teens: You look too young to be a mom: teen moms speak out on love, learning, and success
In short, if you wanted to take legal action to stop the digital lending library, each of the books included in the lending library would pose some sort of problem for you.
"
archive.org
IP
digital-lending
digitization
copyright
fair-use
orphanworks
august 2010 by jschneider
19th Century Novels : Free Books : Free Texts : Download & Streaming : Internet Archive
april 2010 by jschneider
"illinois.gif
The "triple-decker" novel was a standard form of publishing for British fiction from the early 1800s until the 1890s. The market for this form of fiction was closely tied to commercial "circulating libraries," such as Mudie's and W. H. Smith. Unlike free public libraries, these circulating libraries charged patrons to borrow books, much like video rental stores do today. Publishing longer works of fiction was quite expensive, and by releasing them in multiple parts publishers captured an audience who eagerly awaited the next installment while proceeds from the first volumes paid for the printing of later volumes. Often sensational in subject matter, the genre was populated by heroines in danger, characters in disguise, potions and poisons. The University of Illinois Library is digitizing and making openly accessible via the Internet Archive its extensive collections of triple-decker novels. "
format
genre
collections
archive.org
UIUC
triple-decker
The "triple-decker" novel was a standard form of publishing for British fiction from the early 1800s until the 1890s. The market for this form of fiction was closely tied to commercial "circulating libraries," such as Mudie's and W. H. Smith. Unlike free public libraries, these circulating libraries charged patrons to borrow books, much like video rental stores do today. Publishing longer works of fiction was quite expensive, and by releasing them in multiple parts publishers captured an audience who eagerly awaited the next installment while proceeds from the first volumes paid for the printing of later volumes. Often sensational in subject matter, the genre was populated by heroines in danger, characters in disguise, potions and poisons. The University of Illinois Library is digitizing and making openly accessible via the Internet Archive its extensive collections of triple-decker novels. "
april 2010 by jschneider
THATCamp » Blog Archive
june 2009 by jschneider
"Processing has turned out to be perfect for this. It’s not just good for cartoon faces and artistic and complex data visualizations (though it is excellent for those). It is well suited to bootstrapping little scraps of knowledge into quick cycles of gratifying incremental improvements. I ended up cobbling together a half-dozen relatively simple throwaway tools highly customized to the particular reading and indexing I wanted to do, minimizing keystrokes, maximizing what I could get from the imperfect information available to me, and efficiently recording what I wanted to record while scanning through the material. Having spent plenty of hours with the clicks, screeches, and blurs of microfilm readers, I can say that being able to fix up your own glorified (silent) virtual microfilm reader with random access is a wonderful thing. (It’s also nice that the images are never reversed because the person before you didn’t rewind to the proper spool.) And immensely better than PDF, too."
archive.org
processing
june 2009 by jschneider
SDNY Judge Chin: Intervention Denied
april 2009 by jschneider
Judge Chin's denial of motion for intervention by Internet Archive and other parties in the Authors Guild, et al. v. Google class action settlement proposal.
sad
:(
googlebooks
archive.org
april 2009 by jschneider
A Tool to Verify Digital Records, Even as Technology Shifts - NYTimes.com
january 2009 by jschneider
"Even the smallest change in the original document will result in a new hash value. " MAY, they should say--just makes forgery DIFFICULT, not impossible
digital-repservation
cryptography
hashing
long-now
archive.org
SHA-2
january 2009 by jschneider
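The quoted claim and the caveat on it can be illustrated with a short sketch. It assumes SHA-256 (a member of the SHA-2 family named in the tags) via Python's standard `hashlib`; the sample records and the helper names `sha256_hex` and `verify` are invented for illustration and are not from the article. A one-character change gives a different digest here, but because digests have a fixed length, collisions exist in principle, so a matching digest is strong evidence rather than proof.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Check a record against a previously stored digest."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)


# Hypothetical records that differ by a single character
original = b"Archived record, version 1"
altered = b"Archived record, version 2"

digest_original = sha256_hex(original)
digest_altered = sha256_hex(altered)

print(digest_original)
print(digest_altered)
print("original verifies:", verify(original, digest_original))
print("altered verifies: ", verify(altered, digest_original))
```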
Slashdot | Collateral Damage as UK Censors Internet Archive
january 2009 by jschneider
"An anonymous reader noted the latest developments in the controversial censoring of the internet by UK ISPs. Apparently since some content of the Wayback Machine is bad, the whole thing needs to be blacklisted. "
UK
archive.org
censorship
january 2009 by jschneider
Dan Cohen’s Digital Humanities Blog » Blog Archive » Mass Digitization of Books: Exit Microsoft, What Next?
may 2008 by jschneider
"The answer sounds like Abbott and Costello: the free program produces something that’s not free, while the expensive one does."
digitization
archive.org
OCA
Microsoft
Dan
Cohen
may 2008 by jschneider
Internet Archive and just my timing « Rube Goldberg machines for libraries
may 2008 by jschneider
"The thing that really got me was that I couldn’t reliably search by ISBNs. Their advanced search had an “isbn” field, but I couldn’t find an ISBN that could be found that way."
archive.org
screenscraping
Jason
Ronallo
Umlaut
XML
solr
may 2008 by jschneider
GBS/OCLC « Bibliographic Wilderness
may 2008 by jschneider
"I think it’s kind of sad that Worldcat is apparently going to include GBS digitized books before it includes Internet Archive hosted books (which there are still no public plans for)."
GoogleBooks
OCLC
Worldcat
archive.org
Jonathan
Rochkind
may 2008 by jschneider
What I wish I could do with Internet Archive « Bibliographic Wilderness
may 2008 by jschneider
"we really will write open source apps on top of the services you provide, but you’ve got to give us those services—creating our own index on top of your OAI-PMH feed"
archive.org
apis
openlibrary
screenscraping
may 2008 by jschneider
Archive-It.org
march 2008 by jschneider
"Internet Archive's subscription service, Archive-It, allows institutions to build and preserve their own web archive of born digital content, through a user friendly web application, without requiring any technical expertise or hosting facilities. Subscr
archive.org
webarchiving
digital
preservation
march 2008 by jschneider
An experimental non-commercial project to archive and re-publish public domain works - PublicDomainReprints.org
january 2008 by jschneider
"At this time, this service can take a book from any of the supported sites such as the the Internet Archive, Google Books or Universal Library (books in public domain ONLY) and reprint it using Lulu.com."
pod
archive.org
googlebooks
books
lulu
public-domain
january 2008 by jschneider
A More Open Congress
december 2007 by jschneider
"This is what libraries do best, and why they are special: they defend public interests; they make information widely available; and they ensure its permanence. These characteristics are notably distinct from the obligations and concerns characteristic of
govdocs
Congress
archive.org
GPO
information-as-public-good
Michigan
googlebooks
google-vs.-libraries
december 2007 by jschneider
aDORe Archive - Overview
december 2007 by jschneider
"The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams.""allows for the storage of mutliple XMLtapes and ARC files through the introduction of OAI-PMH compliant XMLtape and ARCfile registries."
OAI-PMH
LANL
OpenURL
archive.org
digital
preservation
datastreams
archiving
december 2007 by jschneider
inkdroid » Blog Archive » permalinks reloaded
december 2007 by jschneider
"there’s wildcarding the URL suffix, as in: http://web.archive.org/web/*/inkdroid.org/*" " there’s also wildcards in the date and the URL, as in: http://web.archive.org/web/*/http://www.google.com/"
archive.org
wildcards
URLs
zotero
december 2007 by jschneider
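The two URL patterns quoted above share one shape: a date slot followed by the target address. A minimal sketch of building them, assuming only that shape; the helper `wayback_url` is hypothetical, and the 14-digit timestamp in the last example is an assumed format, not something the bookmarked post states.

```python
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(target: str, timestamp: str = "*") -> str:
    """Build a Wayback Machine URL; "*" in the date slot matches any capture date."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{target}"


# Wildcard date plus wildcard URL suffix, as in the first quoted example
all_inkdroid_pages = wayback_url("inkdroid.org/*")

# Wildcard date for one specific URL, as in the second quoted example
all_google_captures = wayback_url("http://www.google.com/")

# Assumption: a specific capture addressed by a YYYYMMDDhhmmss timestamp
one_capture = wayback_url("http://www.google.com/", "20071201000000")

print(all_inkdroid_pages)
print(all_google_captures)
print(one_capture)
```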
inkdroid » Blog Archive » permalinks reloaded
december 2007 by jschneider
when generating a time anchored permalink with zotero one can well expect that archive.org will on occasion not have a snapshot of said content, resulting in a 404. It would be great if archive.org could leverage these requests for snapshots as requests t
permalink
zotero
archive.org
december 2007 by jschneider
CrossTech: Zotero and the IA
december 2007 by jschneider
"Zotero Commons is related to but different from Nature Precedings and WebCite in that it's intended focus is on public domain stuff on researchers hard drives rather than someone else's material or website that is cited (WebCite) or preprints, datasets,
zotero
archive.org
december 2007 by jschneider
Dan Cohen’s Digital Humanities Blog » Blog Archive » Zotero and the Internet Archive Join Forces
december 2007 by jschneider
"The Zotero-IA alliance will create a “Zotero Commons” into which scholarly materials can be added simply via the Zotero client. Almost every scholar and researcher has documents that they have scanned (some of which are in the public domain), finding
archive.org
zotero
OCR
annotations
digital
libraries
Dan
Cohen
december 2007 by jschneider
OCA To Scan Orphan Works; Publishers Float Orphan Works Solution - 10/29/2007 - Library Journal
december 2007 by jschneider
OCA libraries "in conjunction with the Internet Archive, would scan works that were out-of-print but in-copyright and pioneer "a digital interlibrary loan service" around them."
LJ
ebooks
OCA
ill
archive.org
copyright
december 2007 by jschneider
blog.mignault.net » Blog Archive » Wrighting the rong
october 2007 by jschneider
Changing link patterns
archive.org
linking
Amazon
books
october 2007 by jschneider
Tales from the Open Content Alliance
october 2007 by jschneider
"Kahle wants to implement ISBNs and really wouldn’t take any other answer, so the plan is to figure out how to make this work.... how will it effect the use of ISBNs in the meantime?" "It’s imperative that we come up with value on top this data we’r
microform
digitzation
oca
archive.org
Ross
Singer
PoD
ISBN
metadata
RDF
Brewster
Kahle
digitization
october 2007 by jschneider
internet archive and nasa
september 2007 by jschneider
"What's particularly exciting is that this is both an aggregation and a digitization project -- widespread materials will be brought together for easier discovery through get enriched metadata, and important materials will be selected and digitized to add
NASA
archive.org
digital
archiving
september 2007 by jschneider
Schema (Open Library)
july 2007 by jschneider
"Following is a python datastructure representing the field-schema for bibliographic items in ThingDB. Where the count attribute is not specified, its value is 'single'. The types string, text, url (and perhaps date) may all be stored as "strings" in Thin
python
Karen
Coyle
OpenLibrary
archive.org
datastructures
schemas
databases
july 2007 by jschneider
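The excerpt describes a Python data structure in which a missing count attribute means 'single' and the string, text, and url types may be stored as strings. A rough sketch of that idea follows; it is not the actual Open Library schema. The field names, the 'multiple' count value, and both helper functions are assumptions made purely for illustration.

```python
# Hypothetical field schema: every field name here is invented for illustration.
SCHEMA = {
    "title": {"type": "string"},
    "description": {"type": "text"},
    "source_url": {"type": "url"},
    # Assumption: 'multiple' is the alternative to the default 'single'.
    "authors": {"type": "string", "count": "multiple"},
}

# Types the excerpt says may all be stored as strings (date is only "perhaps").
STRING_LIKE_TYPES = {"string", "text", "url"}


def field_count(name: str) -> str:
    """Return the field's count, defaulting to 'single' when unspecified."""
    return SCHEMA[name].get("count", "single")


def storage_type(name: str) -> str:
    """Map a declared type to its underlying storage type."""
    declared = SCHEMA[name]["type"]
    return "string" if declared in STRING_LIKE_TYPES else declared


for name in SCHEMA:
    print(f"{name}: count={field_count(name)}, stored as {storage_type(name)}")
```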
Free Range Librarian » Blog Archive » NASIG 2007 Presentation: State of Emergency
june 2007 by jschneider
"long slow boiling of the frog" serials as "memory work"
NASIG
serials
Karen
Schneider
outsourcing
google
licensing
digitization
scanning
NDA
secrecy
openness
communication
archive.org
LOCKSS
CLOCKSS
electronic
preservation
postage
small-press
june 2007 by jschneider
What is reCAPTCHA?
may 2007 by jschneider
"reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and us
Chronicle
games-with-a-purpose
CAPTCHA
digitization
archive.org
reCAPTCHA
OCR
may 2007 by jschneider