Does anyone know Arc Search's user agent so we can block it?
@cory I think it might be PerplexityBot. I’m not 100% certain though.
https://blog.perplexity.ai/blog/arc-x-perplexity
https://darkvisitors.com/agents/perplexitybot
@knowler thank you! Going to make sure it’s added on my end and keep an eye out for anything else they’re using.
@cory Ok, so maybe not PerplexityBot, this is the user-agent string: ArcMobile2/11 CFNetwork/1492.0.1 Darwin/23.3.0
@knowler got it! So we'd need to block `ArcMobile2`? I haven't dug into this myself but can't imagine the entire string is required.
@cory It also doesn’t seem like they’re respecting robots.txt
@knowler oh lord — I wish I were surprised by that
@cory Why? I feel like I’m out of the loop here.
@austincnunn it’s an AI-powered engine that aggregates information for you and provides suspect answers. I really don’t see any benefit to allowing it.
@cory That’s fair. Just confused when I saw that and thought I had missed a big hullabaloo.
@austincnunn ah yeah it's yet another misguided attempt to layer AI into absolutely everything
@austincnunn use AI to remove AI from your codebase
@cory AI companies: "Wait, no."
I have these in one of my vhost access logs:
"Arc/1.19.1 (Mac OS X Version 14.0 (Build 23A5337a))"
"Arc/1.26.2 (Mac OS X Version 14.2.1 (Build 23C71))"
"Arc/1.25.1 (Mac OS X Version 14.3 (Build 23D5051b))"
"ARC Reader (http://arc.semsol.org/)"
Hope this helps!
@bahua thank you! My site's on Netlify and I don't have any insight into access logs from any visitors.
@cory Try to make sure you don’t block Arc in its entirety. Many of us Arc users are not exactly fans of the LLM stuff they’ve added lately (which thankfully is opt-in on the Mac version; I never turned any of them on).
@torb I sure won’t! I’m not interested in blocking visitors or browsers, just robots and scrapers (provided they honor robots.txt).
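For anyone wanting to try this, here is a rough sketch of that distinction in Python. It assumes that matching on the `ArcMobile2` product token alone is enough, which nobody in this thread has confirmed, and the names `BLOCKED_TOKENS` and `is_blocked` are made up for illustration. Wiring it into a host such as Netlify is not shown.

```python
# Hypothetical blocklist of user-agent product tokens.
# "ArcMobile2" is taken from the string quoted earlier in this thread
# ("ArcMobile2/11 CFNetwork/1492.0.1 Darwin/23.3.0"); whether this token
# is stable across versions is an assumption.
BLOCKED_TOKENS = ("ArcMobile2",)


def is_blocked(user_agent: str) -> bool:
    """Return True if the user-agent contains any blocked token.

    Matching is a case-insensitive substring test, so the version
    suffix ("/11") and the rest of the string do not matter.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    samples = [
        "ArcMobile2/11 CFNetwork/1492.0.1 Darwin/23.3.0",
        "Arc/1.26.2 (Mac OS X Version 14.2.1 (Build 23C71))",
        "ARC Reader (http://arc.semsol.org/)",
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

Because the test is for the full `ArcMobile2` token rather than just `Arc`, the desktop `Arc/1.x` strings from the access logs above are left alone.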
@cory @torb another reason why the robots.txt solution for these LLMs is not so good; a meta tag would be better: https://noml.info/