Translated by Vasiliki Katsouli, Forensic Psychology student
The biggest AI companies don’t want to pay to use copyrighted material as training data, and here’s why.
The US Copyright Office is accepting public comments on potential new rules for artificial intelligence (AI) use of copyrighted material, and the world’s biggest AI companies had a lot to say. Below are the arguments from Meta, Google, Microsoft, Adobe, Hugging Face, Stability AI and Anthropic, as well as a response from Apple, which focused on copyrighting AI-written code.
There are some differences in their approaches, but the overall message for most is the same: they don’t believe they should pay to train AI models on copyrighted works.
The Copyright Office opened a comment period on August 30, with a deadline of October 18 for written comments, on changes it is considering in three areas: the use of copyrighted data to train AI models, whether material generated by AI without human involvement can be copyrighted, and copyright liability involving AI.
There has been no shortage of copyright lawsuits over the past year, with artists, writers, developers and companies alleging infringement in different cases.
Here are some excerpts from each company’s response.
Meta: Copyright holders wouldn’t get much money anyway.
Imposing a first-of-its-kind licensing regime now, well after the fact, will cause chaos as developers try to track down millions and millions of copyright holders, with very little benefit, as any fair remuneration would be incredibly small given the insignificance of any single work in an AI training set.
Google: Training AI is like reading a book
If training could be achieved without making copies, there would be no copyright issues. Indeed, this act of “gathering knowledge,” like the act of reading a book and learning the facts and ideas in it, would not only be lawful, but would advance the very purpose of copyright law. The mere fact that, technologically speaking, copies must be made to extract these ideas and facts from the copyrighted works should not alter this result.
Microsoft: Copyright law change could hurt small AI developers
Any requirement to obtain consent to use accessible works for training will stifle AI innovation. It is not feasible to achieve the scale of data necessary to develop responsible AI models, even when the identity of a work and its owner are known. Such licensing systems will also stifle innovation by startups and new entrants who lack the resources to obtain licenses, leaving AI development to a small set of companies with the resources to conduct large-scale licensing programs, or to developers in countries that have decided that using copyrighted works to train AI models is not infringement.
Anthropic: The current law is fine, don’t change it
“Sound policy has always recognized the need for appropriate limits on copyright in order to support creativity, innovation and other values, and we believe that existing legislation and continued cooperation between all stakeholders can harmonize the diverse interests at stake, unlocking the benefits of artificial intelligence while addressing the concerns.”
Adobe: It’s fair use, like when Accolade copied Sega’s code
In Sega v. Accolade, it was held that the intermediate copying of Sega’s software was fair use. The defendant created copies, while reverse engineering, to discover the functional requirements – non-proprietary information – for making games compatible with Sega’s game console. This intermediate copying also benefited the public: it led to an increase in the number of independently designed video games (which contain a mix of functional and creative aspects) available for Sega’s console. This increase in creative expression was exactly what copyright law was intended to promote.
Anthropic: Copying is just an intermediate step
As mentioned above, the training process creates copies of the information for the purposes of performing a statistical analysis of the data. Copying is just an intermediate step, extracting unprotected data about the entire set of works in order to create new outputs. In this way, the use of the original copyrighted work is non-expressive – that is, the copyrighted expression is not reused to communicate it to users.
Andreessen Horowitz: Investors have spent ‘billions and billions’
Over the last decade or more, there has been a huge amount of investment – billions and billions of dollars – in the development of artificial intelligence technologies, on the understanding that, under current copyright law, any copying necessary to derive statistics is permitted. Changing this regime will significantly disrupt established expectations in this area. These expectations have been a critical factor in the massive investment of private capital in US-based AI companies, which in turn has made the US a world leader in AI. Undermining these expectations would jeopardize future investment, as well as US economic competitiveness and national security.
Hugging Face: Training on copyrighted material is fair use
The use of a specific work in training has a more broadly beneficial purpose: the creation of a distinctive and productive AI model. Rather than replacing the specific communicative expression of the original work, the model is capable of generating a wide variety of different kinds of results that are completely unrelated to the underlying copyrighted expression in question. For these and other reasons, training generative AI models on a large number of copyrighted works is generally fair use. However, the term “generally” is used deliberately, as one can imagine fact patterns that would lead to more difficult decisions.
Stability AI: Other countries treat the training of artificial intelligence models as fair use
A number of jurisdictions, including Singapore, Japan, the European Union, the Republic of Korea, Taiwan, Malaysia and Israel, have reformed their intellectual property laws to create safe harbors for AI training that achieve results similar to fair use. In the UK, the government’s chief scientific adviser recommended that “if the government’s aim is to promote an innovative AI industry in the UK, it should enable the mining of available data, text and images (the input) and leverage [sic] the existing protections of copyright and intellectual property rights law on the output of AI”.
Apple: Let us copyright the code we built with artificial intelligence
In cases where a human programmer controls the expressive elements of the output and the decisions to modify, add, improve, or even reject the proposed code, the final code resulting from the programmer’s interaction with the tools will have sufficient human authorship to be eligible for copyright protection.
The volunteer team of the CSI Institute, consisting of specialized scientists such as psychologists, criminologists and sociologists, as well as network and IT technicians, is here to provide you with information and knowledge through articles on a variety of topics.