AI is causing panic for authors. Now the courts are involved

‘Everybody’s realizing to what extent their data, their information, their creativity, has been absorbed.’

When novelist Douglas Preston first began tinkering with ChatGPT, he challenged the AI software to write an original poem based on a character from one of his books.

“It came out with this terrific poem written in iambic pentameter,” Preston said. The result was both impressive and unsettling. “What really surprised me was how much it knew about this character; way more than it possibly could have gleaned from the internet,” he added.

The adventure writer suspected that the chatbot had absorbed his work, most likely as part of the training process, in which an artificial intelligence model consumes a large amount of data before synthesizing it into seemingly original content.

“That was a very disturbing feeling,” Preston told me, “not unlike coming home and finding that someone’s been in your house and taken things.”

Preston’s concerns prompted him to join a proposed class action lawsuit accusing OpenAI, the developer of ChatGPT and a major player in the growing AI industry, of copyright infringement. (Recently, OpenAI sought a valuation of $80 billion to $90 billion.)

Preston is joined in the suit by a slew of other big-name authors, including John Grisham, Jonathan Franzen, Jodi Picoult, and George R.R. Martin — the notoriously slow-to-publish “Game of Thrones” author who, Preston claims, joined the suit out of frustration that fans were preemptively generating the final book in his series using ChatGPT.

OpenAI, for its part, has argued that training an AI system falls under fair use protections, particularly given how much AI transforms the underlying training data into something new. A spokesperson for OpenAI told The Times in an emailed statement that the company respects authors’ rights and believes they should “benefit from AI technology.”

“We’re having productive conversations with many creators around the world, including the Authors Guild, and have been working cooperatively to understand and discuss their concerns about AI,” the OpenAI spokesperson said. “We’re optimistic we will continue to find mutually beneficial ways to work together to help people utilize new technology in a rich content ecosystem.”

Nonetheless, the publishing industry is fighting back against a software boom that has given anyone with Wi-Fi the ability to automatically generate large amounts of text. In addition to Preston’s suit, several other groups of authors have proposed class action lawsuits against OpenAI.

“Everyone is realizing the extent to which their data, information, and creativity have been absorbed,” said Ed Nawotka, an editor at Publishers Weekly. The mood in the industry, he said, is one of “abject panic.”

Comedian and author Sarah Silverman accused OpenAI and Meta — Facebook’s parent company and a major AI developer in its own right — of copyright infringement in a recent pair of lawsuits; the two companies have since sought to have most of her claims dismissed.

In a separate suit, Paul Tremblay (“The Cabin at the End of the World”) and Mona Awad (“Bunny”) sued OpenAI for copyright violations — the company is also attempting to get that one mostly dismissed — and Michael Chabon (“The Yiddish Policemen’s Union”) is a plaintiff in two additional legal actions that target OpenAI and Meta, respectively.

In July, the Authors Guild — a professional trade organization, not a labor union — sent an open letter to several tech companies demanding consent, credit, and fair compensation when writers’ works are used to train AI models. Margaret Atwood, Dan Brown, James Patterson, Suzanne Collins, Roxane Gay, and Celeste Ng were among those who signed.

All of this comes on top of the nearly five-month strike by Hollywood screenwriters, which yielded, among other things, new contract rules on the use of AI in scriptwriting. (A separate, ongoing strike has rallied screen actors around their own AI concerns.)

The lawsuit in which Preston is involved, which also includes the Authors Guild, claims that OpenAI copied the authors’ works “without permission or consideration” in order to train AI programs that now compete for readers’ time and money.

The lawsuit also criticizes ChatGPT for creating derivative works, or “material that is based on, mimics, summarizes, or paraphrases [the] Plaintiffs’ works and harms the market for them.”

On behalf of American fiction authors whose copyrighted works were used to train OpenAI software, the plaintiffs are seeking damages for lost licensing opportunities and “market usurpation,” as well as an injunction against future such practices.

“They didn’t ask for our permission, and they aren’t compensating us,” Preston said of OpenAI. “They’ve created a very valuable commercial product that can reproduce our voices. … It’s essentially large-scale theft of our creative work.”

He added that because the plaintiffs’ books aren’t freely available on the open web, OpenAI “almost certainly” obtained them through alleged piracy sites like the file-sharing platform LibGen. (This suspicion is reiterated in the suit, attributed to “independent AI researchers.”)

OpenAI has declined to say whether the plaintiffs’ books were included in ChatGPT’s training data or obtained through file-sharing sites such as LibGen. But in a statement to the U.S. Patent and Trademark Office, cited in the Authors Guild suit, OpenAI said that modern AI systems are sometimes trained on publicly available data sets that include copyrighted works.

Meanwhile, The Atlantic reports that Meta trained its ChatGPT competitor LLaMA on a corpus of pirated ebooks known as “Books3.” According to a searchable version of that data set, LLaMA was trained on books by nearly all of the authors named as plaintiffs in the lawsuits described above.

The works of L.A. Times employees were also included. Meta did not respond to The Times’ request for comment on how LLaMA was trained.

Aside from the specific sources of the training data, many authors are concerned about where this technology will lead their industry.

Another plaintiff in the Authors Guild lawsuit, Michael Connelly, author of the Harry Bosch series of crime novels, framed those concerns as a matter of control: “control of your own work, control of your own property.”

Connelly said he never got to choose whether his books would be used to train an AI, but even if there had been money on the table, he would have declined. The idea of ChatGPT writing an unofficial Bosch sequel offends him; when Amazon adapted the series into a TV show, he said, he at least had some control over the scripts and casting.

“These characters belong to us,” Connelly explained. “They come from our minds. I even included language in my will about how no other author will be able to carry the Harry Bosch torch after I’m gone. I don’t want anyone else telling his story because he’s mine. I don’t want it told to me by a machine.”

The question is whether the law will allow the machines to do so.

The various lawsuits filed against OpenAI allege violations of intellectual property rights. However, copyright law — particularly fair use, which governs when copyrighted work can be incorporated into other endeavors, such as education or criticism — still does not provide a clear answer as to how these lawsuits will play out.

“We’ve got a kind of push and pull right now in the case law,” said intellectual property attorney Lance Koonce, a partner at the law firm Klaris, citing two recent cases that offer competing models of fair use.

In Authors Guild vs. Google, a federal appeals court ruled that Google was allowed to digitize millions of copyrighted books in order to make them searchable. In Andy Warhol Foundation for the Visual Arts Inc. vs. Goldsmith, the Supreme Court determined that the incorporation of a photographer’s work into the pop artist’s own art did not fall under fair use, because Warhol’s use was commercial and served the same basic purpose as the original photo.

“These AI cases — and especially the Authors Guild case (against OpenAI) — fall into that tension,” he said.

According to OpenAI’s patent office statement, training artificial intelligence software on copyrighted works “should not, by itself, harm the market for or value of copyrighted works” because the works are consumed by software rather than real people.

Stakeholders are already pitching solutions to this tension outside of legal channels.

Suman Kanuganti, CEO of AI messaging platform Personal.ai, believes the tech industry will adopt some sort of attribution standard that will allow people who contribute to an AI’s training data to be identified and compensated.

“Once you build the models with known, authenticated data units, then technologically, it’s not a challenge,” Kanuganti said. “And once you solve that problem … the economic association then becomes easier.”
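Kanuganti did not describe a specific implementation, and no such standard exists yet. As a rough, purely hypothetical sketch of what “known, authenticated data units” could enable, the snippet below (all names and figures invented for illustration) tracks which contributor supplied each unit of training data and splits a royalty pool in proportion to attributed usage:

```python
from dataclasses import dataclass
from collections import Counter

@dataclass(frozen=True)
class DataUnit:
    """A hypothetical authenticated unit of training data."""
    unit_id: str      # stable identifier for the text
    contributor: str  # verified author or rights holder

def attribute_usage(used_unit_ids, registry):
    """Count how often each contributor's units were drawn on."""
    counts = Counter()
    for uid in used_unit_ids:
        if uid in registry:
            counts[registry[uid].contributor] += 1
    return counts

def pro_rata_payouts(usage_counts, royalty_pool):
    """Split a fixed royalty pool in proportion to attributed usage."""
    total = sum(usage_counts.values())
    if total == 0:
        return {}
    return {author: royalty_pool * n / total for author, n in usage_counts.items()}

# Invented example: two authenticated units, one used twice, one used once.
registry = {
    "u1": DataUnit("u1", "Author A"),
    "u2": DataUnit("u2", "Author B"),
}
usage = attribute_usage(["u1", "u1", "u2"], registry)
print(pro_rata_payouts(usage, royalty_pool=300.0))  # {'Author A': 200.0, 'Author B': 100.0}
```

The point of the sketch is only that, once provenance is recorded per unit of data, the “economic association” Kanuganti describes becomes simple bookkeeping; the hard part is getting the industry to track and authenticate that provenance in the first place.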

Preston, the adventure novelist, agreed that there might still be hope.

He said that licensing books to software developers through a centralized clearinghouse could provide authors with a new revenue stream while also securing high-quality training data for AI companies. The Authors Guild tried to set up such an arrangement with OpenAI at one point, he said, but the two sides were unable to reach an agreement. (OpenAI declined to comment on those discussions.)

“We were trying to get them to sit down with us in good faith; we’re not opposed at all to AI,” Preston said. “It’s not a zero-sum game.”

