AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare.
On Monday, Cloudflare published research saying it observed the AI startup ignore blocks and hide its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.
AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the web, often without permission, to make their products work. More recently, websites have tried to fight back by using the web-standard robots.txt file, which tells search engines and AI companies which pages can be indexed and which cannot. Those efforts have seen mixed results so far.
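As a rough illustration of how a robots.txt rule is meant to work, the sketch below uses Python's standard `urllib.robotparser` module to check whether a crawler may fetch a page. The robots.txt contents, the `example.com` URL, the bot names, and the `is_allowed` helper are all hypothetical, chosen only for illustration; they are not taken from Cloudflare's research.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    # A crawler that identifies itself honestly is told to stay away.
    print(is_allowed("PerplexityBot", url))
    # A request presenting a generic browser user agent matches only the
    # wildcard rule, so the same page is reported as fetchable.
    print(is_allowed("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", url))
```

The check is voluntary: nothing in robots.txt enforces the rule, and it only applies if the crawler identifies itself truthfully, which is why a changed user agent matters in a dispute like this one.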
Perplexity appears to be willingly circumventing these blocks by changing its bots’ “user agent,” a signal that identifies a website visitor by device and software version, as well as changing its autonomous system number, or ASN, essentially a number that identifies a large network on the internet, according to Cloudflare.
“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.
Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”
Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites even after they had added rules to their robots.txt files and rules specifically blocking Perplexity’s known bots. Cloudflare said it then ran tests and confirmed that Perplexity was circumventing these blocks.
“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” according to Cloudflare.
The company also said that it has de-listed Perplexity’s bots from its verified list and added new techniques to block them.
Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace allowing website owners and publishers to charge AI scrapers that visit their sites. Cloudflare’s chief executive, Matthew Prince, sounded the alarm at the time, saying AI is breaking the business model of the internet, particularly for publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.
This isn’t the first time Perplexity has been accused of scraping without authorization.
Last year, news outlets such as Wired alleged Perplexity was plagiarizing their content. Weeks later, Perplexity’s CEO Aravind Srinivas was unable to immediately answer when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.

