How generative AI’s uncomfortable relationship with copyright law will determine the future of the industry

  • Generative-AI startups rely heavily on data scraping to train their sophisticated models.
  • But artists and other rights holders have launched a slew of copyright lawsuits over the practice.
  • The outcome of these legal battles will determine the future of the burgeoning sector, investors say.

Eva Toorenent’s artwork has been stolen before, but using it to train an AI model felt like a “new kind of violating.”

As previously reported by Insider, the artist and illustrator, who has worked as a freelancer since 2019, discovered last year that another artist had used her work to enable the text-to-image program Midjourney to produce art in her style, some of which was sold to an art gallery.

Toorenent’s story has become increasingly common since the advent of generative AI, as artists, developers, and writers struggle to protect their work. Many artists are turning to the courts for assistance.

Rights holders argue that AI companies' use of their work without a license creates an “unauthorized derivative work” and thus infringes their copyright. AI startups, meanwhile, insist that their models fall under the fair-use doctrine, which permits limited use of copyrighted works without the owner's permission in certain circumstances.

Universal Music Group sued the AI startup Anthropic earlier this month for distributing copyrighted lyrics. In January, artists claimed that Midjourney and Stability AI, the startup behind the image generator Stable Diffusion, had scraped their work without permission. Getty Images, meanwhile, is in court with Stability AI over the use of its library to train Stable Diffusion.

The outcomes of these legal battles will almost certainly have a huge impact on generative-AI startups, which have been one of the few bright spots in a dismal year for tech, as venture-capital funding continues to fall from its 2021 high. According to Dealroom data, generative-AI startups have raised $18 billion in funding this year.

According to a Harvard Business Review article, if courts rule in favor of artists, startups will most likely face “substantial infringement penalties.” Investors believe that a data market will emerge in the long run.

The free-for-all of data scraping will come to an end.

AI models are trained on data scraped from the web, raising questions about whether the original sources should be credited, and whether their data should be used without consent in the first place.

MMC Ventures partner Simon Menashy describes the current model as a “Wild West with few licenses and little regulation.”

When ChatGPT was released in late 2022, the world was unprepared, and few systems or processes for the fair and ethical exchange of data had been put in place, he said.

“We’re going to see the shutters coming down” on data scraping, Menashy predicts. He believes that future regulations will explicitly prohibit scraping data to train AI models.

The worst-case scenario, according to Ekaterina Almasque, a partner at the VC fund OpenOcean, is that no rulings emerge from the ongoing legal battles and things continue as they are.

She noted that most AI models are built by very large corporations. “The same way they don’t pay tax in many places, they wouldn’t pay for using such a valuable resource as data,” she said.

Almasque hopes that the court cases will spark the development of a functioning data market in which data is bought, sold, and licensed in a fair and equitable manner.

Getty claims in its lawsuit against Stability AI that the AI startup’s Stable Diffusion program “copied 12 million images to train its AI model without permission.”

A win for Stability would set a “dangerous precedent” by signaling that “everything on the internet is up for grabs to train large language models,” according to Sunny Dhillon, managing partner at Kyber Knight Capital.

Getty, which recently announced a collaboration with Nvidia to develop its own generative-AI tool for photo generation, claims that “the explicit consent of rights holders is required to use their data to train learning models.”

“Generative-AI tools and services should be transparent as to the data that is used for training and the outputs of these models,” Getty said in a statement.

AI’s Specialized Future

Startups that build more specialized models with licensed data may be well positioned to thrive if data scraping restrictions are imposed. Models for the legal and healthcare industries have been developed by companies such as the Sequoia-backed Harvey and the Andreessen Horowitz portfolio startup Hippocratic AI.

According to Menashy of MMC Ventures, a distinction will emerge between AI companies that license their data and the rest of the field.

“That’s interesting to startups — there’s an opportunity for them to have a differentiated product,” he said. “They can train models on data that’s not universally available to customers, and tell them it’s licensed and compliant.”

Two investors said that in a regulated market, a plethora of data sources will emerge from non-AI companies that have collected data for their own operations and choose to license it to firms building specialized, industry-specific models.

Climate Aligned is one startup that is using publicly available data to develop a specialized generative-AI tool. The company, which recently raised $1.8 million in seed funding, uses artificial intelligence to highlight the environmental, social, and governance (ESG) credentials of financial products and issuers.

“We use public disclosure, which is available on the internet — it’s companies’ websites, their annual reports, and things like that,” said Climate Aligned cofounder and CEO Aleksi Tukiainen.

“Then, when we provide information via our platform, we point to the source documentation. We’re not training models with random data from somewhere or making it up on the fly.”

AI regulation may differ between continents.

“Europe prefers to regulate things first and has many rules. The United States is frequently the polar opposite, only regulating when there is a large size or issue,” Menashy explained.

According to Andre Retterath, a partner at Earlybird Venture Capital, AI regulation could follow the pattern set by data-privacy rules.

“Europe went first with GDPR, and the United States followed with CCPA, two independent regulatory frameworks that differ in details but pursue the same overarching goals. I anticipate something similar for next-generation AI,” he said.

According to Menashy, the industry is waiting for its “Taylor Swift moment.”

Despite not owning the master recordings of her first six albums, the megastar famously reclaimed control of her music. Because Swift wrote the songs on those albums, she controlled their synchronization licenses and could rerecord the albums without violating copyright law.

“Who’s going to be the Taylor Swift of generative AI?” Menashy asked.
