Making "Orphan Books" Available Digitally - the HathiTrust Inches Forward

Copies of so-called "orphan books" will slowly make their way to digital access beginning this month, albeit in a most limited way. "Orphan books" are those that are still under copyright protection, but whose copyright holders are essentially impossible to find. Copyright law requires that the holder's permission be obtained before a digital copy can be made available, but how can you get a copyright holder's permission if you can't locate him? This requirement effectively bars millions of books, and the knowledge within, from public access, all to protect an owner who probably does not know or care about these rights, and suffers no loss from their digital publication. Indeed, with earlier publications, most of the original copyright holders are undoubtedly long dead.

Orphan books have been the primary issue behind the contentious Google Books Settlement litigation. During the past decade, Google began a process of scanning millions of books in university and public libraries, creating digital versions anyone with an internet connection could potentially access from anywhere in the world. Those published before 1923 are out of copyright and hence free to be published digitally or otherwise by anyone. The status of those published from 1923 onward is not so clear. Depending on a variety of factors, such as whether a copyright notice was printed, whether a copyright was filed, and whether that copyright was renewed, a given book may or may not currently be under copyright protection.

The most complicated class is that of books published between 1923 and 1963. There were more stringent requirements for maintaining copyrights in those days, including making a timely renewal (today, virtually any book published is protected for 70+ years even if the author makes no attempt to copyright it). A study by the Council on Library and Information Resources concluded that 55% of the 1923-1963 works are now in the public domain, or to put it another way, 45% are still under copyright protection. The study also concluded that only 10% of all books published in this period have copyright holders who can be reached. Indeed, half of the book publishers no longer exist, making it virtually impossible even to begin a search for copyright holders. Subtract that reachable 10% from the 45% still under copyright, and 35% of all books published between 1923 and 1963 are "orphans." That figure is for books published in the U.S. The study estimated that 70% of books from this era published outside the U.S. are "orphans." And many post-1963 books have been orphaned as well.

When Google began making some of these orphan books available online without permission of the copyright holders, organizations representing authors and publishers sued. Google reached a settlement with them, whereby 63% of the proceeds of the sale of access would go to the copyright holders, or be held in trust for them until such time (if ever) as they made themselves known. However, other authors, publishers, competitors, the government, and various other groups sued to block the settlement, saying no private agreement with representative organizations allows Google to publish copyrighted material without the actual copyright holder's permission. The judge, in his initial ruling, appeared essentially to agree with this stance, but encouraged the parties to attempt to adjust the settlement so that it would meet with his approval.

Meanwhile, even as this battle with Google rages on, another source plans to begin making a few orphan works available digitally, though on a much smaller scale. This organization is known as the HathiTrust, and it is a consortium of around 50 institutional libraries. However, only a few plan to make orphans available at this time. The HathiTrust is an outcome of the Google digitization project. The universities have provided Google access to their books to be digitized, and in return have received digitized copies. While the universities are pleased with what Google has accomplished, they remain concerned about the long term. No one in the business or investment community is concerned about Google's long-term viability, but educational institutions think along a different time frame. They are concerned about Google 100, 200, even 1,000 years from now. Will Google still be around? Therefore, they want to be sure these digitized copies are stored somewhere forever. Indeed, "Hathi" is Hindi for "elephant," and as we all know, an elephant never forgets.

The HathiTrust has attempted to determine which books are orphans. About three months ago, it began publishing notices to holders of orphan copyrights, stating that it would make digital copies of their books available in limited ways if it did not hear from them within 90 days. The first of these books will pass the 90-day mark this month and become available for digital publication. Among the small number of libraries that will begin to display such books are those of the Universities of Michigan, Wisconsin, Florida, and California, as well as Duke and Cornell.

Access to digitized books through the HathiTrust is not on nearly so grand a scale as at Google, which is why its members believe they will avoid the legal problems Google has encountered. The digitized books will not be available to all institutions. Members will only be able to display books that they physically possess. Essentially, the program allows patrons of a university to view books already in the library's collection digitally, instead of taking out the hard copy. It also has the enormously important added benefit of allowing digital searches within the book: the ability to find terms in a fraction of a second, rather than thumbing through the entire book and still missing them.

The HathiTrust believes this more limited use, and their nature as a consortium of nonprofit institutions, will keep their program free from legal entanglements, although some objections have already been voiced. Copyright law provides for "fair use" exceptions where copying is permitted, and one of the factors in determining fair use is whether the material is being used for commercial or nonprofit educational purposes. This provision of copyright law places the HathiTrust in better standing than for-profit Google.

Access to orphan works still has a ways to go before the issues are ironed out, but in the end, public access will carry the day. Knowledge can be held back for a while, but not forever. Congress could readily step in and resolve the issue, but Congress seems loath to act on anything not backed by lobbyists and campaign contributors, so it may take some time. As John P. Wilkin, author of the CLIR report, observed in his conclusion, "In nearly all cases, there is no economic harm to any person or organization in opening access to these in-copyright works, and there is a great loss in not providing access to them. Without an effective legal or policy framework that allows us to do so, a significant portion of our cultural heritage will be underused and undervalued." I hope my quoting his words fits within the boundaries of fair use!

Addendum: After this article was written, the Authors Guild, along with certain similar overseas organizations and individuals, filed suit to stop the HathiTrust project, including the copying and dissemination of orphan books, stating "the Universities are engaging in one of the largest copyright infringements in history." So much for this being an easy case. See the article on the Google Settlement elsewhere in this month's issue of AE Monthly for further updates on this controversy.