Google won an important victory today in its long-standing legal battle with some publishers and authors over the search giant’s book-scanning project.
Judge Denny Chin of the U.S. District Court for the Southern District of New York granted Google’s motion for summary judgment, dismissing a class-action lawsuit brought by the Authors Guild and several individual authors, among them former Yankees pitcher Jim Bouton, author of “Ball Four.”
The plaintiffs, who say they will appeal the decision, argued that Google infringed their copyrights by scanning their books without permission.
“Google made unauthorized digital editions of nearly all of the world’s valuable, copyright-protected literature and profits from displaying those works,” Authors Guild Executive Director Paul Aiken told Ars Technica. “In our view, such mass digitization and exploitation far exceeds the bounds of the fair use defense.”
According to court documents, Google has scanned more than 20 million books and converted the scans into searchable text, displaying short excerpts, or “snippets,” under what the company asserts is “fair use.”
Judge: Five Benefits
It was this interpretation of fair use that the court affirmed in today’s decision. In his opinion, Chin listed five main benefits of Google’s “library project.”
First, Google Books provides a new and efficient way for readers and researchers to find books. It makes tens of millions of books searchable by words and phrases. It provides a searchable index linking each word in any book to all books in which that word appears. … Indeed, Google Books has become such an important tool for researchers and librarians that it has been integrated into the educational system — it is taught as part of the information literacy curriculum to students at all levels.
Second, in addition to being an important reference tool, Google Books greatly promotes a type of research referred to as “data mining” or “text mining.” Google Books permits humanities scholars to analyze massive amounts of data — the literary record created by a collection of tens of millions of books. Researchers can examine word frequencies, syntactic patterns, and thematic markers to consider how literary style has changed over time…. The ability to determine how often different words or phrases appear in books at different times “can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology.”
Third, Google Books expands access to books. In particular, traditionally underserved populations will benefit as they gain knowledge of and access to far more books. Google Books provides print-disabled individuals with the potential to search for books and read them in a format that is compatible with text enlargement software, text-to-speech screen access software, and Braille devices. Digitization facilitates the conversion of books to audio and tactile formats, increasing access for individuals with disabilities.
Fourth, Google Books helps to preserve books and give them new life. Older books, many of which are out-of-print books that are falling apart buried in library stacks, are being scanned and saved.
Finally, by helping readers and researchers identify books, Google Books benefits authors and publishers. When a user clicks on a search result and is directed to an “About the Book” page, the page will offer links to sellers of the book and/or libraries listing the book as part of their collections. The About the Book page for “Ball Four,” for example, provides links to Amazon.com, Barnes&Noble.com, Books-A-Million, and IndieBound.
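The “text mining” Chin describes in his second point comes down, at its simplest, to measuring how often a term appears relative to all the words in books from each period. The sketch below illustrates that idea in Python under loudly stated assumptions: the three-book corpus of `(year, text)` pairs is hypothetical toy data, `yearly_relative_frequency` is an illustrative helper, and the lowercase word tokenizer is deliberately naive. It is not Google’s actual tooling.

```python
import re
from collections import Counter

def yearly_relative_frequency(corpus, term):
    """Return {year: occurrences of term / total words} for (year, text) pairs."""
    totals = Counter()  # total word count per year
    hits = Counter()    # occurrences of the term per year
    term = term.lower()
    for year, text in corpus:
        # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(words)
        hits[year] += sum(1 for w in words if w == term)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# Hypothetical toy corpus; real studies draw on millions of volumes.
corpus = [
    (1850, "The telegraph is new. The railway is newer still."),
    (1900, "The telephone rang and the telegraph office was quiet."),
    (1950, "The television and the telephone filled the house."),
]
result = yearly_relative_frequency(corpus, "telephone")
print(result)
```

Normalizing by total words per year, rather than reporting raw counts, keeps periods with many more surviving books from dominating the trend.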
How Google Utilizes “Snippets”
In the ruling, Chin provides an extended explanation of how Google’s book-scanning process works. This includes a detailed look at how Google displays numerous “snippets” from a book while preventing an “attacker” from assembling a complete book from them:
In scanning books…Google uses optical character recognition technology to generate machine-readable text, compiling a digital copy of each book. Google analyzes each scan and creates an overall index of all scanned books. The index links each word or phrase appearing in each book with all of the locations in all of the books in which that word or phrase is found.
The index allows a search for a particular word or phrase to return a result that includes the most relevant books in which the word or phrase is found. Because the full texts of books are digitized, a user can search the full text of all the books in the Google Books corpus.
Users of Google’s search engine may conduct searches, using queries of their own design. In response to inquiries, Google returns a list of books in which the search term appears. A user can click on a particular result to be directed to an “About the Book” page, which will provide the user with information about the book in question. The page includes links to sellers of the books and/or libraries that list the book as part of their collections. No advertisements have ever appeared on any About the Book page that is part of the Library Project.
For books in “snippet view” (in contrast to “full view” books), Google divides each page into eighths — each of which is a “snippet,” a verbatim excerpt. Each search generates three snippets, but by performing multiple searches using different search terms, a single user may view far more than three snippets, as different searches can return different snippets.
For example, by making a series of consecutive, slightly different searches of the book “Ball Four,” a single user can view many different snippets from the book. Google takes security measures to prevent users from viewing a complete copy of a snippet-view book.
(A) user cannot cause the system to return different sets of snippets for the same search query; the position of each snippet is fixed within the page and does not “slide” around the search term; only the first responsive snippet available on any given page will be returned in response to a query; one of the snippets on each page is “black-listed,” meaning it will not be shown; and at least one out of ten entire pages in each book is black-listed.
An “attacker” who tries to obtain an entire book by using a physical copy of the book to string together words appearing in successive passages would be able to obtain at best a patchwork of snippets that would be missing at least one snippet from every page and 10% of all pages. In addition, works with text organized in short “chunks,” such as dictionaries, cookbooks, and books of haiku, are excluded from snippet view.
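The mechanics in the excerpt (an index linking words to locations, pages cut into fixed eighths, at most three snippets per search, one responsive snippet per page, and blacklisted snippets and pages) can be modeled in a few dozen lines. The Python sketch below is illustrative only. The class `SnippetViewBook`, the helper `max_recoverable_fraction`, the word-based way of splitting a page into eighths, the single-word queries, the page-order ranking (the ruling says Google returns the “most relevant” books), and both blacklist rules are assumptions, not Google’s implementation.

```python
import re
from collections import defaultdict

SNIPPETS_PER_PAGE = 8     # each page is divided into eighths (per the ruling)
SNIPPETS_PER_SEARCH = 3   # each search returns three snippets (per the ruling)

def split_into_eighths(page_text):
    """Split a page into 8 fixed, contiguous word chunks (assumed splitting rule)."""
    words = page_text.split()
    size = -(-len(words) // SNIPPETS_PER_PAGE) or 1  # ceiling division, at least 1
    return [" ".join(words[i * size:(i + 1) * size]) for i in range(SNIPPETS_PER_PAGE)]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

class SnippetViewBook:
    def __init__(self, pages):
        self.snippets = [split_into_eighths(p) for p in pages]
        # Inverted index: word -> (page, snippet) locations, in page order.
        self.index = defaultdict(list)
        for page_no, page in enumerate(self.snippets):
            for snip_no, snippet in enumerate(page):
                for word in tokens(snippet):
                    self.index[word].append((page_no, snip_no))

    @staticmethod
    def is_blacklisted_page(page_no):
        # Hypothetical rule: one page in ten (0-based 9, 19, ...) is never shown.
        return page_no % 10 == 9

    @staticmethod
    def blacklisted_snippet(page_no):
        # Hypothetical rule: one fixed snippet per page is never shown.
        return page_no % SNIPPETS_PER_PAGE

    def search(self, term):
        """Deterministic: the same query always yields the same snippets."""
        results, seen_pages = [], set()
        for page_no, snip_no in self.index.get(term.lower(), []):
            if page_no in seen_pages or self.is_blacklisted_page(page_no):
                continue
            if snip_no == self.blacklisted_snippet(page_no):
                continue  # skip to the next responsive, available snippet
            seen_pages.add(page_no)  # only one snippet per page
            results.append((page_no, snip_no, self.snippets[page_no][snip_no]))
            if len(results) == SNIPPETS_PER_SEARCH:
                break
        return results

def max_recoverable_fraction(num_pages):
    """Upper bound on the share of snippets an attacker could ever see."""
    visible = sum(1 for p in range(num_pages) if not SnippetViewBook.is_blacklisted_page(p))
    return visible * (SNIPPETS_PER_PAGE - 1) / (num_pages * SNIPPETS_PER_PAGE)

# Hypothetical 20-page book: "ball" appears in snippets 0 and 1 of every page.
pages = [" ".join(["ball", "x", "ball", "y"] + [f"f{p}_{i}" for i in range(12)])
         for p in range(20)]
book = SnippetViewBook(pages)
print(book.search("ball"))
print(max_recoverable_fraction(100))
```

Because snippet boundaries are fixed eighths rather than a window centered on the match, repeated or shifted queries cannot “slide” text into view, and a blacklisted snippet or page is unreachable by any query. Under these assumed rules a 100-page book yields at most 90 visible pages times 7 of 8 snippets, or 78.75% of the text, which mirrors the ruling’s point that an attacker always misses at least one snippet per page and 10% of pages.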