Perplexity AI Is Lying About Their User Agent

Robb Knight:

I put up a post about blocking AI bots after the block was in
place, so assuming the user agents are sent, there’s no way
Perplexity should be able to access my site. So I asked:

What is this post about
https://rknight.me/blog/blocking-bots-with-nginx/

I got a perfect summary of the post including various details that
they couldn’t have just guessed. Read the full response
here. So what the fuck are they doing?

I checked a few sites and this is just Google Chrome running on
Windows 10. So they’re using headless browsers to scrape content,
ignoring robots.txt, and not sending their user agent string. I
can’t even block their IP ranges because it appears these headless
browsers are not on their IP ranges.

Terrific, succinct write-up documenting that Perplexity has clearly been reading and indexing web pages that it is forbidden, by site owner policy, from reading and indexing — all contrary to Perplexity’s own documentation and public statements.
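Both defenses Knight describes depend on information the client volunteers. The Python sketch below shows why a request that presents itself as Chrome on Windows 10 passes both checks. It assumes a hypothetical robots.txt that disallows `PerplexityBot`, and a hypothetical substring blocklist that stands in for his nginx rules. It does not reproduce his actual configuration. robots.txt compliance is voluntary, and the server only sees whatever `User-Agent` header the client chooses to send.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows PerplexityBot site-wide.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /
"""

# Hypothetical server-side blocklist: case-insensitive substrings matched
# against the User-Agent header (the same idea as an nginx user-agent block).
BLOCKED_AGENT_TOKENS = ["perplexitybot"]

# Illustrative Chrome-on-Windows-10 user agent string (version is made up).
CHROME_WIN10_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

URL = "https://rknight.me/blog/blocking-bots-with-nginx/"


def robots_allows(user_agent: str, url: str) -> bool:
    """Return what a *well-behaved* crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def server_blocks(user_agent: str) -> bool:
    """Return True if the User-Agent header matches a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    # An honest crawler identifying itself is refused by both checks...
    print(robots_allows("PerplexityBot", URL), server_blocks("PerplexityBot/1.0"))
    # ...but the same request labelled as desktop Chrome sails through both.
    print(robots_allows(CHROME_WIN10_UA, URL), server_blocks(CHROME_WIN10_UA))
```

Neither check can tell a headless browser from a person using Chrome, which is why Knight's fallback of blocking by IP range matters, and why it fails when the requests come from addresses outside the published ranges.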

