In late 2004, as part of the Google Books Project, Google announced that it had entered into agreements with five of the largest research libraries in the US to digitize books from these libraries’ collections. The goal of the project was to create a digital library and expand access to library content. Under Google’s plan, the full text of each book would be scanned and added to Google’s search database.
In 2010, Google also signed several agreements with European libraries in order to digitize a large number of books. These agreements concerned only the digitization of books in the public domain. Under these agreements, both Google and the European libraries can use the digital copies of the books. The agreements contain a clause restricting the commercial use of these digital copies by others, often for up to 15 years.
Through these clauses, Google has exclusive access to these datasets, which has put it in an advantageous – if not monopolistic – position to use them for the training of Artificial Intelligence (AI) models. These exclusivity clauses are potentially problematic, especially in light of EU law. In particular, the 2019 Open Data Directive (ODD) explicitly regulates clauses of this type.
The Open Future Foundation (OFF) strives to achieve a level playing field for access to data. Google’s exclusive use means that these datasets are withheld from other parties, even though they could be of vital use to new AI developers building large language models (LLMs) and other forms of generative AI. This is why OFF approached the ILP Lab to examine the legal status of the datasets produced by the Google Books Project, with particular regard to their exclusivity clauses.
In the study, the authors provide a number of recommendations to strike a fair balance between the competing interests at play. These include increasing government funding for digitization and having the European Commission issue guidance on the interpretation of the relevant provision of the ODD. You can find the report here.