LexisNexis tries to keep the AI bots from plundering its valuable legal data
- The generative AI boom has sparked a race to collect data for model training.
- That’s put pressure on companies with high-quality, proprietary information.
- Relx, the owner of LexisNexis, recently warned customers about sharing its legal data with AI bots.
The generative AI boom has sparked a rush for high-quality data to train large language models to do clever things like code software and assist students in cheating on homework.
Bots are crawling the web, grabbing as much free data as they can before content creators catch on and block them. That approach can only take AI models so far. To truly differentiate generative AI services in the future, training data may need to be proprietary or otherwise unique.
This puts pressure on businesses that own high-quality bespoke information but have been wise enough not to give it away for free on the internet. Relx is one of these companies. It owns LexisNexis, a leading provider of legal information, and Elsevier, a publisher of medical, technical, and scientific journals and other content.
A pop-up warning from Sean Fitzpatrick, CEO, North America, UK and Ireland, LexisNexis Legal and Professional, and Julie Chapman, Head of Legal, North America, recently appeared on the LexisNexis service:
“LexisNexis content use restrictions in third-party applications, including artificial intelligence technologies such as large language models and generative AI. We’d like to take this opportunity to remind you that our agreements forbid you from using or uploading content obtained through LexisNexis’ services into external applications, bots, software, or websites, including those that use artificial intelligence technologies like large language models and generative AI. We must uphold our obligation to protect the content within our services in accordance with our terms of service, and we must acknowledge that our customers share these same obligations. We continue to keep our customers at the forefront of our AI capability development. If there is a specific use case you want to investigate, please contact your relationship partner to discuss.”
It’s unclear when this alert was added or what happened to LexisNexis’ data recently to cause the CEO and legal chief to warn customers about it.
I contacted Relx and a LexisNexis spokesperson to inquire whether LexisNexis’ legal data had been scraped by bots or otherwise collected by outside companies for use in AI model training.
“Given the importance of keeping our proprietary content secure, we wanted to proactively remind customers to avoid uploading or using our content with external systems, such as large language models,” a LexisNexis spokeswoman said in an email.
New legal applications of generative AI have emerged in recent months, with AI startups like Harvey receiving millions in funding from top-tier venture capital firms like Sequoia, as previously reported by Insider. Thomson Reuters, the owner of Westlaw, a well-established legal data service that competes with LexisNexis, recently acquired Casetext for $650 million.
Relx has entered this competition. Lexis+ AI was introduced earlier this year. According to LexisNexis, this offering makes use of AI models that have been trained on the company’s exclusive legal content.