Reddit Alleges AI Startup Perplexity Circumvented Data Access Restrictions Using Scraped Information

The Digital “Mountweazel” Trap

In a legal filing that echoes copyright protection tactics from the print era, Reddit has accused artificial intelligence company Perplexity of accessing and using specially planted test content, according to reports covered by The New York Times. The forum-based social media platform reportedly created a “test post” that was configured to be accessible only through Google’s search engine and not otherwise available on the internet, sources indicate.

The Digital “Mountweazel” Trap
AI Data Scraping Under Legal Scrutiny
Expanded Legal Action Against Data Scrapers
Escalating Data Access Dispute

This digital trap bears striking resemblance to “mountweazels” – fabricated entries deliberately inserted into reference works to catch plagiarists. The technique dates back to at least 2001 when the New Oxford American Dictionary included the fake word “esquivalience,” defined as the “willful avoidance of one’s official responsibilities,” to identify competitors who might be copying its content without authorization., according to industry analysis

AI Data Scraping Under Legal Scrutiny

The lawsuit represents the latest legal challenge to the AI industry’s widespread practice of training models on scraped internet data, much of which is copyrighted material. According to the report, Reddit lawyers argued that “Perplexity’s business model is effectively to take Reddit’s content from Google search results,” then process it through AI models and “call it a new product.”

Analysts suggest this case highlights the growing tension between AI companies hungry for training data and content creators seeking to control and monetize their digital assets. Reddit itself has been positioning to capitalize on the AI data demand by restricting scraper access and pursuing premium data licensing agreements, with expectations of generating over $200 million in the coming years through such ventures.

Expanded Legal Action Against Data Scrapers

Beyond Perplexity, the legal action targets three additional data scraping firms: Texas-based SerpApi, Lithuanian startup Oxylabs, and Russia’s AWMProxy, which has reportedly been associated with the notorious Glupteba malware botnet. These companies originally built their businesses by scraping Google search data to provide search engine optimization services to corporate clients.

The New York Times explains that historically, this data scraping created a mutually beneficial ecosystem since search results typically directed traffic back to the original websites. However, the relationship became increasingly one-sided as AI chatbots trained on these datasets often fail to provide meaningful attribution or traffic to content sources.

Escalating Data Access Dispute

Reddit’s lawsuit claims that Perplexity circumvented earlier enforcement actions by purchasing scraped datasets from these third-party companies after receiving a cease and desist order for directly scraping Reddit content without payment. The legal filing notes that citations to Reddit data in Perplexity’s AI search results had increased “fortyfold,” according to the report.

This legal confrontation emerges as Reddit develops its own integrated AI capabilities while simultaneously seeking to monetize its vast repository of user-generated content. The case underscores the complex data ownership questions emerging throughout the AI industry as companies balance innovation against intellectual property rights in the digital age.