Perplexity AI Is Lying About Their User Agent
Robb Knight:
I put up a post about blocking AI bots after the block was in
place, so assuming the user agents are sent, there’s no way
Perplexity should be able to access my site. So I asked:
What is this post about
https://rknight.me/blog/blocking-bots-with-nginx/
I got a perfect summary of the post including various details that
they couldn’t have just guessed. Read the full response
here. So what the fuck are they doing?
I checked a few sites and this is just Google Chrome running on
Windows 10. So they’re using headless browsers to scrape content,
ignoring robots.txt, and not sending their user agent string. I
can’t even block their IP ranges because it appears these headless
browsers are not on their IP ranges.
Terrific, succinct write-up documenting that Perplexity has clearly been reading and indexing web pages that it is forbidden, by site owner policy, from reading and indexing — all contrary to Perplexity’s own documentation and public statements.
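To see why a block like Knight's only works when the crawler identifies itself, here is a minimal Python sketch of user-agent filtering. It is not Knight's nginx config. The token list, the `is_blocked` helper, and both sample user agent strings are assumptions made up for illustration.

```python
# Hypothetical blocklist of lowercase user-agent substrings.
# These tokens are illustrative assumptions, not Knight's actual list.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "gptbot", "ccbot")


def is_blocked(user_agent: str) -> bool:
    """Return True when the User-Agent header contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


# A crawler that announces itself (illustrative string).
declared_bot = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"

# A generic desktop Chrome string of the kind a headless browser might send
# (illustrative string, not the one Knight logged).
generic_chrome = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(is_blocked(declared_bot))    # True
print(is_blocked(generic_chrome))  # False
```

The filter can only match on what the client chooses to send, so a request that presents itself as ordinary Chrome on Windows 10 passes straight through.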