Reddit Files Lawsuit Against AI Firm Perplexity and Data Scrapers Over Alleged Copyright Infringement

Major Copyright Lawsuit Filed Against AI Companies

Reddit has initiated legal proceedings against artificial intelligence search developer Perplexity and several data collection firms, according to court documents filed this week. The lawsuit, filed in the US District Court for the Southern District of New York, alleges systematic copyright infringement through unauthorized data scraping activities.

Major Copyright Lawsuit Filed Against AI Companies
Alleged “Bank Robber” Tactics Described
Connection to AI Training Data Market
Broader Legal Context for AI Copyright
Industry Implications

Sources indicate that three data firms—Oxylabs UAB, AWMProxy and SerpApi—are named as defendants alongside Perplexity. The legal action represents the latest escalation in growing tensions between content platforms and AI companies seeking training data.

Alleged “Bank Robber” Tactics Described

In what analysts suggest is particularly strong language for legal filings, Reddit compared the defendants’ alleged methods to “would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” The report states this metaphor describes how the data firms allegedly circumvented both Reddit and Google’s technological protections.

According to the lawsuit, the companies reportedly used techniques to mask their identities and locations while accessing nearly three billion search engine result pages during a concentrated two-week period in July. This scale of data extraction, the filing claims, represents a deliberate effort to avoid detection while harvesting valuable content.

Connection to AI Training Data Market

Reddit reportedly traced the scraped data back to Perplexity, leading to a prior cease-and-desist letter. Industry observers note that Perplexity remains listed as a customer of defendant SerpApi on its website, alongside other technology giants including Meta, Samsung and Nvidia.

The platform’s immense user-generated content library—reporting over 110 million daily active users and more than 22 billion posts and comments—has made it a particularly valuable resource for AI training. As the report states, this human-created conversation data is precisely what AI companies seek to refine their models.

Broader Legal Context for AI Copyright

This lawsuit joins a growing landscape of legal challenges surrounding AI and copyright. According to industry analysts, copyright represents one of the most contentious legal issues facing AI development, as companies require massive quantities of human-generated content for training purposes.

While some AI firms have negotiated multimillion-dollar licensing agreements with content providers—Reddit itself has reportedly struck deals with both OpenAI and Google—others maintain that their use of copyrighted material qualifies as fair use. Recent court decisions have seen mixed results, with sources indicating that both Meta and Anthropic secured fair use victories earlier this summer.

Perplexity faces additional legal challenges, having been recently sued for copyright infringement by Encyclopedia Britannica, which owns Merriam-Webster dictionary. The company did not immediately respond to requests for comment regarding the Reddit lawsuit, according to reports.

Industry Implications

The outcome of this case could have significant implications for how AI companies source training data moving forward. As the legal landscape evolves, analysts suggest we may see increased pressure on AI firms to establish transparent data sourcing practices and consider licensing agreements rather than relying on web scraping.

This lawsuit represents Reddit’s latest effort to control and monetize its vast content repository, following its previous legal action against Anthropic for alleged data misuse. The resolution of these cases will likely help define the boundaries of fair use in the age of artificial intelligence.

References

This article aggregates information from publicly available sources. All trademarks and copyrights belong to their respective owners.

Note: Featured image is for illustrative purposes only and does not represent any specific product, service, or entity mentioned in this article.

Oracle Corporation has revived its dual-CEO leadership structure as the technology giant accelerates its artificial intelligence transformation. This unconventional approach comes as the company expands its cloud computing infrastructure business, which reportedly includes OpenAI among its enterprise customers.

Oracle’s Return to Dual Leadership Structure

Technology giant Oracle Corporation has reportedly returned to a dual-chief executive officer system, defying conventional corporate governance wisdom, according to recent reports. This marks the second time in the company’s history that it has implemented this unconventional leadership approach as it navigates the rapidly evolving artificial intelligence landscape.