jschneider + googlebooks 318
Google Books: Ratio of Inked Space to Blank Space
25 days ago by jschneider
"What if Google Books were to publish the ratio of inked to non-inked space for all of the items it has scanned? We could then see how writing of different types, for example, plays or prose fiction, move into larger print formats such as the Folio."
googlebooks
page-image
ink
print-formats
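A minimal sketch of the ratio the note imagines, assuming a page image has already been reduced to a grid of grayscale values (0 = black, 255 = white). The function name `ink_ratio`, the grid representation, and the threshold of 128 are all illustrative assumptions; nothing here reflects data Google Books actually publishes.

```python
def ink_ratio(page, threshold=128):
    """Return inked pixels divided by blank pixels for one page image.

    `page` is a list of rows of grayscale values (0 = black, 255 = white).
    Pixels darker than `threshold` count as ink. Both the representation
    and the threshold are assumptions made for illustration.
    """
    inked = sum(1 for row in page for px in row if px < threshold)
    total = sum(len(row) for row in page)
    blank = total - inked
    if blank == 0:
        return float("inf")  # a fully inked page has no blank space
    return inked / blank


# A tiny 2x4 "page": three dark pixels, five light ones.
sample_page = [
    [0, 255, 255, 30],
    [255, 255, 90, 255],
]
print(ink_ratio(sample_page))
```

Averaging this figure over the pages of each volume, then grouping volumes by genre and format, would give the kind of comparison the quote asks about.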
User Policies - Books Help
july 2011 by jschneider
"You may not lend or co-own any of your Google eBooks purchases with another person."
ebooks
googlebooks
sharing
Shimenawa » Blog Archive » GBS: Settle or Litigate?
july 2011 by jschneider
"As James Grimmelmann noted at The Laboratorium, Chin also suggested that if settlement talks do not reach fruition and there was a return to litigation, the path would be clearly lit:
Judge Chin suggested that he saw the case, if it were to be litigated, in terms of fairly straightforward cross motions for summary judgment on whether snippet display is a fair use."
"It seems to me that the only benefit Google obtains from a new settlement is clean hands over the past claims of infringement for digitization, but if the only operation they conduct is snippet-view, there is not necessarily a requirement for all-party approval. One could well argue from Google’s perspective that they actually don’t want to establish a precedent for asking permission for a broad class of activities that have been held as Fair Use when they have been litigated. Regardless, the barrier of final class certification still resides in the settlement house."
googlebooks
copyright
fairuse
Google & the Future of Books by Robert Darnton | The New York Review of Books
july 2011 by jschneider
"I especially enjoy the exchange of letters between Jefferson and Madison. They discussed everything, notably the American Constitution, which Madison was helping to write in Philadelphia while Jefferson was representing the new republic in Paris. They often wrote about books, for Jefferson loved to haunt the bookshops in the capital of the Republic of Letters, and he frequently bought books for his friend. The purchases included Diderot’s Encyclopédie, which Jefferson thought that he had got at a bargain price, although he had mistaken a reprint for a first edition." "If we turned the sociology of knowledge onto the present—as Bourdieu himself did—we would see that we live in a world designed by Mickey Mouse, red in tooth and claw." "But we, too, cannot sit on the sidelines, as if the market forces can be trusted to operate for the public good."
google
Robert
Darnton
googlebooks
sociology-of-knowledge
copyright
publicdomain
ANDREW NORMAN WILSON
may 2011 by jschneider
"I then found out the chubby white man knew what I was doing because the first girl I had spoken to had followed the instructions on the back of her yellow badge – which is to call a certain manager if anyone asks about the work of the yellow badge class. "
class
google
labor
politics
googlebooks
ScanOps
Go To Hellman: Judge Denny Chin Says He's Working On It
february 2011 by jschneider
"The judge presiding over the trial, Denny Chin, was elevated to the Circuit Court of Appeals, but because of the shortage of judges on the district court (Judge Chin's District is down 9 judges) caused by the political stalemate in the US Senate, Judge Chin has been forced to keep most of his cases, including the Google case." "The case in which he ruled that Listerine was not as effective against gingivitis as flossing made him a hero with his dentist, and earned him the tabloid nickname of the "Listerine Judge"." "He did his Senior Thesis at Princeton on the "Old Ones" of Chinatown, the elderly Chinese. A photo of his grandfather was at the front of his thesis. His grandfather lived in a building of "railroad" apartments, each of which was occupied by an old man who had been separated from his family by the exclusion laws that had severely curtailed immigration from China to the United States. His grandfather had been able to go back to China only twice, once in the '20's when he got married, and then in the '30's, when Chin's father was born. Chin's Grandfather worked as a waiter in a Chinatown restaurant for many years, and like all the other men who lived in the railroad apartments, he would go to the post office every month and buy a money order to send home to his family in China. Chin's grandfather took the oath of citizenship in 1947, in the same court where his grandson would preside as a judge. Because his grandfather had become a citizen, and because immigration laws had been relaxed, Chin and his parents were allowed to come to the US in 1956 (Chin was only 2 years old)."
googlebooks
China
immigration
Free Range Librarian › Random Acts of Trendness
february 2011 by jschneider
"Listening to her, I had that same “ah hah” moment I’ve had a few other times in my library career, like the first time I brought up Mosaic on my home computer and got the TCP-IP stack to work, and when a huge NASA image of Jupiter appeared on my screen I was so excited I had to leave the house and drive for an hour just to calm down. I haven’t shouted “squee” over repositories (not the same sex appeal), but as noted above, I’m getting ready for them." "Elsewhere people have commented on the surge in tablet use; Jason Griffey noted in a post-midwinter webinar that the Consumer Electronics Show featured over 40 tablets. The New York Times recently reported on the surge of e-readers by young adults. In addition to using actual data (“At HarperCollins, for example, e-books made up 25 percent of all young-adult sales in January, up from about 6 percent a year before”), this article was notable for not even attempting to debunk the trend in ebook readers, which assuredly would have happened ten or even three years ago.
My hunch, down the line (and I doubt this is a unique observation), is that we will all end up being our own personal networks; cellular speeds or other technologies will allow us self-contained connectivity. The person with an iPad with 3G is essentially a network unto herself."
streaming
googlebooks
kindle
repositories
reformatting
weeding
wifi
tablets
Counting on Google Books - The Chronicle Review - The Chronicle of Higher Education
january 2011 by jschneider
via http://householdopera.typepad.com/household_opera/2011/01/fun-with-google-books-ngram-viewer.html "The English corpus alone contains some 360 billion words, a size that permits analyses on a scale that aren't possible with collections like the Corpus of Historical American English, at Brigham Young University, which tops out at a mere 410 million words." "That leaves out a lot, compared with what you can do with other corpora. As of now, for example, you can't ask for a list of the words that follow the adjective "traditional" for each decade from 1900 to 2000 in order of descending frequency, or restrict a search for "bronzino" to paragraphs that contain "fish" and don't contain "painting." Some of those capabilities will probably be available soon, though users won't be able to replicate many of the computationally heavy-duty exercises that the researchers report in the paper, and linguists won't really be happy until they can download the whole corpus and have their way with it." "And while the Harvard researchers have purged the research corpus of a large proportion of the metadata errors that have plagued Google Books, there are still a fair number of misdated works, and there's no way to restrict a query by genre or topic. You can ask the system to plot the trajectory "dear reader" in books published in Britain during the 19th century, but you can't limit the search to novels." "The more interesting exercises are also, in a way, the most problematic. In one exercise, the authors investigate the evolution of fame, as measured by the relative frequency of mentions of people's names. They began with the 740,000 people with entries in Wikipedia and sorted them by birth date, picking the 50 most frequently mentioned names from each birth year (so that the 1882 cohort contained Felix Frankfurter and Virginia Woolf, and so on). Next they plotted the median frequency of mention for each cohort over time and looked for historical tendencies.
It turns out that people become famous more quickly and reach a greater maximum fame now than they did 100 years ago, but that their fame dies out more rapidly. You can take that result as a quantitative demonstration of the rise of what Leo Braudy called "disposable fame" in his book The Frenzy of Renown, which the authors cite. And the technique could be a powerful source of data for the burgeoning field of celebrity studies, as it's designated in the title of a new journal from Routledge.
But the method isn't up to distinguishing among the varieties of fame and eminence that Braudy and others have carved out. And there are obvious limits to equating fame with mere frequency of mention. At one point, for example, the authors observe that "'Galileo', 'Darwin', and 'Einstein' may be well-known scientists, but 'Freud' is more deeply ingrained in our collective subconscious." But it defies belief that Freud is vastly better known than Darwin among the authors of books in a corpus that was drawn from the collections of research libraries. We simply mention Freud more often. Maybe that's because we refer to Darwin only when we're talking about evolution, while we're apt to bring up Freud when we're talking about ourselves. Or maybe there's some other explanation. But the data don't wear their cultural significance on their sleeves; they need cultural historians to speak for them." "I have a friend, a gifted amateur musician and computer scientist, who was involved in electronic music in its early days. Inevitably, within a few years, the field was taken over by composers. That happened partly because new interfaces made the technology more accessible, but also because a command of the subject matter always trumps mere technical expertise. As my friend put it, "It's a lot easier to turn an artist into a geek than to turn a geek into an artist."
In the same way, we'll know that the program of quantitative corpus research is successful when the engineers have stepped back as the techniques are absorbed into the academy, sometimes as a method, sometimes just as a background of operating assumptions. That was the fate of 19th-century philology—the study of "La Vie des Mots" (The Life of Words) in the title of a book of the period by Arsène Darmesteter. Quantitative corpus studies are destined to play the same role, though they imply a different understanding of what the life of words is all about. We really don't even need a name like "culturomics," or any new name at all: this is just e-philology. (Or "the newer philology," since "the new philology" is taken.)" "Whatever precedents yesterday's article in Science may establish for the humanities, the 12-author paper won't be one of them."
googlebooks
linguistics
digialhumanities
metadata
distant-reading
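The query the excerpt says the Ngram Viewer could not answer (words following "traditional", per decade, in descending frequency) can be sketched once bigram counts are in hand. This is only an illustration: the `(year, word1, word2, count)` record layout and the function name `followers_by_decade` are hypothetical, and the sample counts are made up.

```python
from collections import Counter, defaultdict


def followers_by_decade(bigrams, first_word):
    """Rank the words that follow `first_word`, decade by decade.

    `bigrams` is an iterable of (year, word1, word2, count) tuples, a
    hypothetical record layout chosen for this sketch.
    Returns {decade: [(word2, total_count), ...]} with each list in
    descending order of frequency.
    """
    per_decade = defaultdict(Counter)
    for year, w1, w2, count in bigrams:
        if w1 == first_word:
            per_decade[year // 10 * 10][w2] += count
    return {d: c.most_common() for d, c in sorted(per_decade.items())}


# Made-up counts, for illustration only.
sample = [
    (1901, "traditional", "values", 4),
    (1907, "traditional", "music", 9),
    (1909, "traditional", "values", 2),
    (1912, "traditional", "music", 3),
    (1915, "modern", "music", 8),
]
print(followers_by_decade(sample, "traditional"))
```

Restricting by genre or topic, the other gap the excerpt mentions, would need metadata that this record layout does not carry.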
Household Opera: Fun with Google Books Ngram Viewer
january 2011 by jschneider
"Thanks to my background research on the commonplace book project, I already knew that "scrapbooking" (as we now call it) took off in the mid-19th century and eventually became a more popular pastime than the keeping of commonplace books; the graph suggests something of the rise of one format and the fall of the other. "
ngrams
googlebooks
SSRN-Legally Speaking: The Dead Souls of the Google Booksearch Settlement by Pamela Samuelson
january 2011 by jschneider
"Pamela Samuelson in her forthcoming ACM article on the settlement notes that "the settlement would, in effect, give Google the exclusive right to commercially exploit millions of orphan books."" via http://blog.librarylaw.com/librarylaw/2009/04/google-book-settlement-orphan-works-and-foreign-works.html
googlebooks
copyright
orphanworks
From Google and Harvard, a New Way to Analyze the Written Word - NYTimes.com
december 2010 by jschneider
"So far, Google has scanned more than 11 percent of the entire corpus of published books, about 2 trillion words. The data analyzed in the Science article contains about 4 percent of the corpus."
nytimes
googlebooks
digitalhumanities
language-evolution
The beauty of engineers: Google Books.app | booktwo.org
december 2010 by jschneider
turn off 3D pageturns! :)
ebooks
GoogleBooks
Privacy, Google, and the Reading Public « The Scholarly Kitchen
december 2010 by jschneider
"it’s one thing to use Gmail in the full knowledge that Google is going to examine the contents of your email and your calendar. It’s another thing, for example, to access e-books and be unsure whether your online reading behavior is going to be examined in the same way." "Accessing the Internet, for any purpose, almost always requires much more personal disclosure — to your home access provider, your employer, your library, or whatever other entity is granting or selling you access. To argue that the Google settlement should only be approved if it includes provisions that make use of the service as private as the use of printed books is, in practice, to argue that the settlement should not be passed."
advertising
privacy
Google
GoogleBooks
Shimenawa » Blog Archive » Eye to eye: The Authors Guild, Random House, and GBS
august 2010 by jschneider
"One of the keystones of the settlement proposal is lodged in Attachment A (Author-Publisher Procedures), which attempts to clarify the digital rights issues that have brought authors and publishers so often to litigation or its brink. The proposal provides for a default bright line assignment of revenue from the exploitation of works included in the terms of the settlement." "It is not too much to suggest that the conflict over ebook rights and royalties is one of the most outstanding irritants in the transition to digital publishing. It is an irritant that has drawn the Authors Guild and authors, and the AAP and publishers, into conflict time and time again."
publishing
ebooks
rights
backlist
litigation
GoogleBooks
How we talk about the president: A quick exploration in Google Books « Everybody's Libraries
june 2010 by jschneider
"I wondered, as I went through the Emory collection, whether the terms we use for the president reflect shifts in the role he has played over American history. Is he called “commander in chief” more in times of war or military buildup, for instance? How often was he instead called “chief magistrate” or “chief executive” over the course of American history? And how long did “chief magistrate” stay in common use, and what replaced it?
Not too long ago, those questions would have simply remained idle curiosity. Perhaps, if I’d had the time and patience, I could have painstakingly compiled a small selection of representative writings from various points in US history, read through them, and tried to draw conclusions from them. But now I– and anyone else on the web– also have a big searchable, dated corpus of text to query: the Google Books collection. Could that give me any insight into my questions?
It looks like it can, and without too much expenditure of time. I’m by no means an expert on corpus analysis, but in a couple of hours of work, I was able to assemble promising-looking data that turned up some unexpected (but plausible) results. Below, I’ll describe what I did and what I found out." "It’s true that the Google corpus is imperfect, as I and others have noted before. The metadata isn’t always accurate; the number of reported hits is approximate when more than 100 or so, and the mix of volumes in Google’s corpus varies in different time periods. (For instance, recent years of the corpus may include more magazine content than earlier years; and reprints can make texts reappear decades after they were actually written. The rise of print-on-demand scans of old public-domain books in the 2000s may be partly responsible for the uptick in “chief magistrate” that decade, for instance.)"
googlebooks
Lincoln
corpus-analysis
textual-analysis
google-books
Crunching Words in Great Number - Technology - The Chronicle of Higher Education
june 2010 by jschneider
"if Google Books has "changed the landscape" of our scholarly perception, and it has, perhaps its greatest legacy will be the spur it gave to the educational community to "do it right"—to create a virtual depository of the kind Robert Darnton and others like him have been pleading for: a virtual collection of our cultural heritage that actually meets the needs of scholarship and public education. At least we can hope that will be its great legacy." "The history of science is in no small part the history of instruments—better and better (and, usually more and more expensive) gadgets and techniques employed in the service of increasingly precise measurement. Telescopes and particle accelerators allow us to see almost to the beginning of the universe, microscopes resolve the unimaginably small, and supercomputers find order in vast quantities of data. The humanities also use technology—classicists were early adopters of photography, and every new technology of imaging has opened up texts that were theretofore invisible. Indeed, literary theory itself can be thought of as a technology, similar to mathematical technique, in that both provide powerful ways of analyzing and thinking about their respective domains." "By mining the Google database, it should be possible to trace literary relations in a whole new way: to show who was the first person to use an influential term or to highlight a theme, and to find verbal patterns that will help reveal the real literary relations whereby the few novelists we still read emerged from the background noise of the genre fiction of their day."
googlebooks
digitalhumanities
Chronicle
distant-reading
close-reading
Coyle's InFormation: Trust and the Settlement
march 2010 by jschneider
"The interesting upshot of this entire settlement process is that by digitizing the contents of libraries and managing those digital copies through contracts, the publishers could finally get the kind of control over library uses that they would have liked to have over the paper books held in libraries. They would like to have controls over inter-library loan, classroom use, and reserves, but they cannot exercise such controls in the analog world. Publishers have argued since the very early days of digital documents that all lending of digital documents is the making of a copy, and therefore is not allowed by copyright law.
As a matter of fact, right on page one of the Plaintiff's statement for the judge, among the bullet points describing the main achievements of the settlement, is this one:
Limits library uses of digital copies of Rightsholders’ works.
Perhaps it has been naive of me to see this settlement as being about Google's commercialization of the world of books. It is possible that the more pertinent end result could be a renewed control of books and their uses by the publisher community. Attempts to modify copyright law to cover digital resources have failed, and the rights of the public in relation to those resources are as yet unclear. This has left a gap that the AAP/AG settlement exploits fully."
Karen
Coyle
GoogleBooks
copyright
Go To Hellman: Business Idea Number 3: Gluejar Book Search
march 2010 by jschneider
"There were 3 different presentations, from Stanford, NC State (3.65 MB ppt), and University of Wisconsin, Oshkosh, on "virtual bookshelves"." "While the virtual bookshelf is a sensible and practical incremental improvement on the library catalog interface, it's also backward looking. People looking for information today want to search inside the books, not just "browse the stacks". But libraries don't have the ability (today) to search inside the books that they think they own." "as the Open Book Alliance's Peter Brantley has argued, it's very hard to tell what sort of innovations might arise from the availability of large numbers of digitized texts as data; the same goes for indices of these works." "Gluejar Book Search would be a business focused on collecting, aggregating and redistributing full-text indices of copyrighted material."
Eric
Hellman
googlebooks
digiization
search-inside
virtual-browse
full-text-indexing
Go To Hellman: Copyright-Safe Full-Text Indexing of Books
february 2010 by jschneider
"Full-text indexing is allowed as fair use under US copyright law. Indices are allowed as "transformative uses". Judge Robert Patterson's decision (pdf, 195K) in the "Harry Potter Lexicon" case gives an excellent background of this jurisprudence and concludes:
The purpose of the Lexicon’s use of the Harry Potter series is transformative." "It doesn't take a computer science degree to see that it's easy to reconstruct the sentence from this index. For that reason this form of index is equivalent to a copy. If you remove the position pointers, however, the index loses enough information that the sentence cannot be reconstructed. So if we take the words on a page of text and sort the words in each sentence, then sort the word-sorted sentences, we get an index of a page that can't be used to reconstruct text, but can be used to build a useful full-text index of a book."
indexing
copyright
googlebooks
Eric
Hellman
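The sort-within-sentence, then sort-the-sentences idea in the excerpt can be sketched as follows. This is a rough illustration rather than Hellman's own implementation: the names `page_index` and `page_matches`, the lowercasing, and the naive sentence split on ".", "!" and "?" are all assumptions.

```python
import re


def page_index(page_text):
    """Build an order-free index of one page.

    Words within each sentence are sorted, then the sorted sentences are
    themselves sorted, so neither word order nor sentence order survives.
    Splitting sentences on ., ! and ? is a simplifying assumption.
    """
    sentences = re.split(r"[.!?]+", page_text)
    indexed = []
    for sentence in sentences:
        words = sorted(re.findall(r"[a-z0-9']+", sentence.lower()))
        if words:
            indexed.append(words)
    return sorted(indexed)


def page_matches(index, query_words):
    """Return True if some sentence on the page contains every query word."""
    wanted = {w.lower() for w in query_words}
    return any(wanted.issubset(sentence) for sentence in index)


index = page_index("The cat sat. A dog barked!")
print(index)
print(page_matches(index, ["dog", "barked"]))
```

The index still answers "do these words occur in the same sentence on this page?" while carrying no position pointers. Whether short sentences remain guessable from their sorted words is a separate question that this sketch does not address.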
Coyle's InFormation: Academic publishing as a percentage of Google Books
january 2010 by jschneider
"it was interesting to me that a university press (Oxford UP) turned up in the #1 spot as the publisher with the greatest number of books in the OL. As a matter of fact, out of the top 20 publishers, five are university presses (UPs), and they make up over 1/4 of the books in that group." "This study of OL publisher data was just experimental, so these figures should be taken with a grain of salt. However, this shows that there is an interesting study to be done, if it can be done, quantifying the relative roles of academic and commercial publishing."
publishing
googlebooks
Karen
Coyle
Google & the Future of Books: An Exchange - The New York Review of Books
december 2009 by jschneider
It's a bit jarring to see "January 14, 2010" 2+ weeks ahead! "I like Theodore Koditschek's suggestion that Google be treated as a public utility subject to regulation in the public interest. If that seems unrealistic, one should consider a compromise solution, which would draw a line between the books digitized by Google that are strictly commercial and the books that are no longer in print, although some of them are still covered by copyright. Google would continue with its project to commercialize digital copies of books currently in print, sharing the proceeds with the rights holders. At the same time, it would continue to scan out-of-print books and to include them in a database that would constitute a separate, open-access repository. The rights holders of the in-copyright but out-of-print books in that database would be given the opportunity to choose to keep their books out of the open-access plan and, if they preferred, to include their books in Google's commercial operation."
nyreviewofbooks
googlebooks
hathitrust
Robert
Darnton
copyright
orphan-works
improved google books search-within-book interface? « Bibliographic Wilderness
december 2009 by jschneider
"What I now see, at least in one example, is actual scanned pages, with band markers on the vertical scrollbar indicating at what points the matches were found; clicking on them shows highlighted matches on actual scanned images. Very nice! Although I wonder if it will confuse some users."
googlebooks
Coyle's InFormation: 1923
november 2009 by jschneider
"how many public domain works have been republished after the 1923 cut-off date?
Google appears to currently lack the ability to make the proper connection between the original text that is in the public domain and the many “manifestations” (as they are called in library-speak) that were published later — and are also in the public domain, at least as far as the primary text is concerned. This is a non-trivial exercise when one is working only with the metadata that describes the work, but may become more feasible with the ability to do a full text analysis of the contents of the various packages in which publishers have placed the original work of Melville. I assume that Google is working on this, although I cannot predict how it will affect their assessment of the PD/(c) split."
publicdomain
manifestations
Karen
Coyle
googlebooks
1923
Exact Editions: Google is Going for It
november 2009 by jschneider
"Google has this week announced its Google Editions project, which is intended to make its books database resource, title by title, available to all readers everywhere in every format of ebook reader. So Google at one step is embracing and enlisting all the ebook platforms which might otherwise be a form of competitive counterbalance to the Google Books Library."
googlebooks
november 2009 by jschneider
Coyle's InFormation: Googled
november 2009 by jschneider
"1. Engineering can fix anything
2. Information is neutral and measurable
3. Advertising is information"
googlebooks
Karen
Coyle
google
advertising
november 2009 by jschneider
Go To Hellman: Copyless Crowdscanning: How to Legally Index the World's Books
october 2009 by jschneider
"It made me realize that the idea of having hundreds of thousands of people scanning their books with cheap scanners was not out of the realm of possibility. The main barrier to assembling a database of all the world's books will no longer be the scanning, but rather the laws governing copyright. So my focus is on how to do crowdscanning so that copyrights are not infringed; the easiest way to do that is to not make any copies." "neither the index aggregator nor the sentence server would be able to reconstitute a book or even the pages from a book"
engineering
googlebooks
crowdscanning
october 2009 by jschneider
The Secret Of Google's Book Scanning Machine Revealed - As A Matter Of Fact Blog : NPR
october 2009 by jschneider
"Google created some seriously nifty infrared camera technology that detects the three-dimensional shape and angle of book pages when the book is placed in the scanner. This information is transmitted to the OCR software, which adjusts for the distortions and allows the OCR software to read text more accurately."
NPR
googlebooks
digitization
october 2009 by jschneider
Blown to Bits » Blog Archive » Do It Yourself Book Scanning
october 2009 by jschneider
"Or make them publicly available. I said “can,” not “may” or “should.” But the existence of the device has the potential to raise lots of the same kinds of questions those other duplicating technologies raised. It empowers individuals, and enough empowered individuals could produce a Wikipedian digital library, collectively assembled, imperfect and incomplete, but growing and expanding."
digitization
DIY-scanner
googlebooks
october 2009 by jschneider
The Millions: Bringing Book Scanning Home
october 2009 by jschneider
"dan reetz & his diy book scanner
It’s pure 21st-century ingenuity. Reetz designed his first book scanner because, as a grad student at North Dakota State, he was appalled by textbook prices. Then he built it, in two days, from old digital cameras, cardboard, and scrap parts; a friend wrote the page-processing software." "I think the Reetz-Clancy continuum augurs good things for the future of books. On one end, the recognition that books have to live online now, and that publishing has to operate at internet scale. On the other, the passion for (obsession with?) independence and the cottage-industry craftiness that’s been the best part of book publishing for so long already."
digitization
DIY-scanner
googlebooks
october 2009 by jschneider
Go To Hellman: The Revolution Will Be Digitized (By Cheap Book Scanners)
october 2009 by jschneider
"Reetz is a tinkerer and a liberator of information. He spent some time in Russia and became accustomed to the conveniences of digital books in a society that doesn't pay much attention to copyright laws...went dumpster diving for materials, then posted instructions for how to make the scanner online." "let's assume that an effective book digitizer can be built and deployed for $500...Then the cost of putting a book scanner in 20,000 libraries would be $10,000,000. If these libraries digitized an average of even one book per day, they could digitize 10,000,000 books in two years. Since 10 books per day should be well within the capabilities of an inexpensive digitizer, the libraries should have no technical difficulties with digitizing 4 million books per month.
If libraries acquired the capability of digitizing millions of books per month, then Google's erstwhile monopoly on digitized out-of-print books could evaporate quickly in an appropriate legal environment."
Eric
Hellman
digitization
googlebooks
opensource
accessibility
recommended
october 2009 by jschneider
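The arithmetic in the Hellman excerpt above can be checked with a short sketch. The $500 cost, 20,000 libraries, and per-day rates come from the quote; the function names and the figure of 20 operating days per month are assumptions made here for illustration. The quoted totals work out if libraries scan on roughly 250 days a year, which is an inference, not something the excerpt states.

```python
# Figures from the quoted estimate.
COST_PER_SCANNER = 500   # dollars per digitizer
LIBRARIES = 20_000       # number of libraries equipped


def deployment_cost(libraries: int = LIBRARIES, cost: int = COST_PER_SCANNER) -> int:
    """Total cost of putting one scanner in each library."""
    return libraries * cost


def books_digitized(libraries: int, books_per_day: int, days: int) -> int:
    """Books produced when every library scans at the same daily rate."""
    return libraries * books_per_day * days


def days_needed(target_books: int, libraries: int, books_per_day: int) -> float:
    """Operating days required to reach a target number of books."""
    return target_books / (libraries * books_per_day)


if __name__ == "__main__":
    print("Deployment cost:", deployment_cost())
    # At one book per day, how many operating days for 10 million books?
    print("Days for 10M books:", days_needed(10_000_000, LIBRARIES, 1))
    # At ten books per day, assuming 20 operating days in a month (assumption).
    print("Books per month:", books_digitized(LIBRARIES, 10, 20))
```

Changing the assumed number of operating days shifts the monthly total proportionally, so the 4 million figure should be read as an order-of-magnitude estimate.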
Digital Search II: A User Perspective on Database Design « Easily Distracted
october 2009 by jschneider
"If I’m anxious about Google becoming a database vendor, it’s partly because the user experience with existing databases has been so dismal to date. On the other hand, Google’s understanding of and commitment to usability is head and shoulders above any of the other vendors in that world. Maybe Google’s completed version of Book Search will have an interface that invites rather than repels use, and has a stable long-term vision driving its design. If so, it might almost be worth it to just let them go ahead and fence off the commons, for the same reason that the consolidation of monopoly capitalism in the late 19th Century at least paid off in terms of standardization across a broad range of products and technologies." <followed by comments about other vendors>
googlebooks
usability
databases
code4lib
october 2009 by jschneider
Open Content Alliance (OCA) » Blog Archive » Google Claims to be the Lone Defender of Orphans: Not lone, not defender
october 2009 by jschneider
"There is an alternative, and they know it — orphan works legislation — that up until the last session of Congress had been working its way through the House and Senate. It was not perfect, but was getting close to what we need. Best yet, it passed one house — at least until Google effectively sideswiped the process with their settlement proposal."
Googlebooks
orphan-works
copyright
october 2009 by jschneider
Google's Book Search: A Disaster for Scholars - The Chronicle Review - The Chronicle of Higher Education
october 2009 by jschneider
"Google's book search is clearly on track to becoming the world's largest digital library. No less important, it is also almost certain to be the last one. Google's five-year head start and its relationships with libraries and publishers give it an effective monopoly: No competitor will be able to come after it on the same scale. Nor is technology going to lower the cost of entry. Scanning will always be an expensive, labor-intensive project."
GoogleBooks
Metadata
BISAC
october 2009 by jschneider
IPI: Google Books Settlement: Taking the Long View
september 2009 by jschneider
"My advice to Judge Chin is to declare a “cooling off” period (initially a year) to allow Congress, the Library of Congress, and the Executive Branch to undertake systematic study and review of how copyright law should be adapted to address such a fundamental advance in the technology for providing access to knowledge. The government should immediately assemble a study panel that is broadly representative of the interested parties and the public at large."
googleBooks
september 2009 by jschneider
Coyle's InFormation: DOJ drops bomb in Google/AAP settlement
september 2009 by jschneider
"Department is uneasy that known rights holders will be the ones negotiating with the rights registry, and that they will also benefit from any money made on orphan works. In other words, it will be to the advantage of rights holders that the parents of those orphans NOT be found. DOJ suggests, among other things, that the money made on orphan works not be paid out to others, but be used to try to find rights holders."
GoogleBooks
doj
Karen
Coyle
september 2009 by jschneider
Go To Hellman: In Which Judge Denny Chin Becomes an Orphan Works Hero
september 2009 by jschneider
"This sort of ruling would only lead to a solution to the orphan works problem if the states are more eager to address the problem than Congress has been. Realistically, in 2009, Congress is fully occupied with 2 wars, a global financial crisis and figuring out how to solve health care. The states, on the other hand, are mostly trying to figure out how to close huge budget gaps; I imagine that most State Attorneys General (and state legislatures, where applicable) would love to be able to deliver both the money and the benefits that would accrue from a non-exclusive deal with Google. Another advantage of a state-run rights registry is that it might avoid some of the liability for errors that a privately run registry would have. Or this might be just an ill-informed fantasy of mine."
GoogleBooks
Copyright
Judges
orphan-works
september 2009 by jschneider
Google's plan for world's biggest online library: philanthropy or act of piracy? | Books | The Observer
september 2009 by jschneider
"First, they have questioned whether the primary responsibility for digitally archiving the world's books should be allowed to fall to a commercial company. In a recent essay in the New York Review of Books, Robert Darnton, the head of Harvard University's library, argued that because such books are a common resource – the possession of us all – only public, not-for-profit bodies should be given the power to control them.
The second, related criticism is that Google's scanning of books is actually illegal. This allegation has led to Google becoming mired in a legal battle whose scope and complexity makes the Jarndyce and Jarndyce case in Bleak House look straightforward.
At its centre, however, is one simple issue: that of copyright. The inconvenient fact about most books, to which Google has arguably paid insufficient attention, is that they are protected by copyright."
GoogleBooks
september 2009 by jschneider
Reading with Machines « Early Modern Online Bibliography
september 2009 by jschneider
"There are two main classes of literary problems that might immediately benefit from computational help. In the first, you’re looking for fresh insights into texts you already know (presumably because you’ve read them closely). In the second, you’d like to be able to say something about a large collection of texts you haven’t read (and probably can’t read, even in principle, because there are too many of them; think of the set of all novels written in English). In both cases, it would almost certainly be useful to classify or group the texts together according to various criteria, a process that is in fact at the heart of much computationally assisted literary work." "It might allow you to test, support, or refine your large-scale claims about developments in literary and social history." "The same process might also draw your attention to a particular work or set of works that you’d otherwise not have known about or thought to study." may challenge "your interpretation"
digital-humanities
scholarship
literary-studies
text-analysis
corpus
bibliographies
machine-learning
googlebooks
september 2009 by jschneider
Coyle's InFormation: Google Books Metadata and Library Functions
september 2009 by jschneider
"When you ask people what metadata is needed for a service, they will often reply something like "everything" or "more is better." I'm going to take a different approach here because I think it is a good idea to connect metadata needs with actual functionality. This not only justifies the metadata, but the functionality helps explain the nature of the metadata that is required. For example, if we say that we want "date of publication" in our metadata, it may seem that we could use the date from the publication statement, which can have dates like "c1956" or "[1924]." If, instead, we indicate that we want to use dates in computational research, then it is clear (hopefully) that we need the fixed field date (from the 008 field in the MARC record)." Functions to support: scholarship, metasearch, collection development, linking, computation
Karen
Coyle
googlebooks
metadata
september 2009 by jschneider
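Coyle's point about dates in the excerpt above can be illustrated with a small sketch. It contrasts a best-effort parse of a transcribed publication date such as "c1956" or "[1924]" with reading Date 1 from character positions 07-10 of the MARC 21 bibliographic 008 field. The function names and the sample 008 value are hypothetical, built here only for illustration.

```python
import re
from typing import Optional


def year_from_publication_statement(text: str) -> Optional[int]:
    """Best-effort guess: take the first four-digit run from a transcribed date.

    Works for simple cases like 'c1956' or '[1924]', but transcribed dates
    are free text, so this is a heuristic rather than a reliable parse.
    """
    match = re.search(r"\d{4}", text)
    return int(match.group(0)) if match else None


def year_from_008(field_008: str) -> Optional[int]:
    """Read Date 1 from positions 07-10 of a MARC 21 bibliographic 008 field.

    Returns None when the date is not fully numeric (for example '19uu').
    """
    date1 = field_008[7:11]
    return int(date1) if date1.isdigit() else None


# Hypothetical 40-character 008 value: date entered, date type 's', Date 1,
# blank Date 2, place code, filler, language code, and two trailing positions.
SAMPLE_008 = "560101" + "s" + "1956" + "    " + "nyu" + " " * 17 + "eng" + " " + "d"

if __name__ == "__main__":
    for statement in ("c1956", "[1924]"):
        print(statement, "->", year_from_publication_statement(statement))
    print("008 Date 1 ->", year_from_008(SAMPLE_008))
```

The fixed-field read needs no guessing about brackets or prefixes, which is the property that matters when dates feed computational research.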
Third Wheels on Class Action Coffee Search
september 2009 by jschneider
"He said it was important to get more assurances on privacy into the settlement agreement because "at least then we'll have something on paper that we can enforce". ...
1. Can third parties enforce "pieces of paper"? Suppose Joe and Mary, instead of hiring a divorce lawyer, sign an agreement to have coffee together once a week at Starbucks, can Starbucks sue them if one or both of them breaches the agreement?
2. Can third parties sue to enforce a class action "piece of paper" that has been approved by a court? ...
3. Do opt outs affect the ability of class members to enforce the class action agreements? If the Yankee widows can opt out of the settlement, can they sue to get better coffee agreement compliance even if they opt out?
4. If you are an author concerned about privacy, do you have more or less leverage on the devilish details if you opt in or opt out of the settlement agreement?
5. Why am I so focused on coffee?"
law
humor
coffee
googlebooks
september 2009 by jschneider