What AI crawler readiness means
AI crawlers are not browsers. They request a URL, read the HTML that comes back, and parse the text. Each engine relies on one or more named bots: GPTBot feeds OpenAI and ChatGPT, ClaudeBot feeds Anthropic and Claude, PerplexityBot feeds Perplexity, and Google-Extended is the opt-out signal for Google's AI features. When one of those bots is blocked or handed a challenge page, the engine behind it indexes nothing from your site.
Readiness is the binary question underneath all of that: does your homepage return your actual content to each of these user agents, or does it return a 403, a CAPTCHA, or an empty shell?
Why it is the floor, not the ceiling
Crawlability is necessary but not sufficient. A perfectly readable site with no third-party mentions still gets passed over, because citations also depend on the sources an engine already trusts. That second layer is the subject of the AI search visibility pillar.
But the order matters: if the bot cannot read the page, nothing downstream helps. In our Invisible 10 study, one of the ten vendors blocked our discovery user agent at the homepage outright. The other nine each had detectable homepage fixes standing between them and being indexed at all. The technical floor is closer to a CDN-rule change than a content roadmap, which is why it is the cheapest place to start.
The defaults that block crawlers by accident
Almost none of the blocks we see are deliberate. They are defaults someone enabled for a different reason and forgot.
- Cloudflare Bot Fight Mode turned on without Verified Bots enabled. Cloudflare's Verified Bots list covers the major AI crawlers, but only when that setting is on; without it, the bots are challenged alongside scrapers.
- WAF rules from AWS, Akamai, or Imperva that block or challenge any non-browser user agent or any request without cookies. AI crawlers send a non-browser user agent and carry no session cookies, so they fail these rules by design.
- CMS security plugins (Wordfence and similar) shipping a "block known bots" list that includes AI crawlers by default.
- Geographic IP blocks that deny the cloud-provider ranges the crawlers run from.
- Single-page-app rendering with no server-side render, where the bot fetches the HTML and finds an empty container instead of your content.
How to check yours in a minute
Fetch your own homepage the way each crawler does and read the status code:
curl -sI -A "GPTBot/1.1 (+https://openai.com/gptbot)" https://your-domain.com/
curl -sI -A "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" https://your-domain.com/
curl -sI -A "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)" https://your-domain.com/
Each response should start with a 200 status line. A 403, 406, 429, or a redirect to a challenge page means the edge is blocking the bot. For which crawlers to allow per engine, and why the search crawlers (not the training ones) are the bots that matter for citation, see how to allow AI crawlers. Our AI crawler checklist walks through this and nine more checks, each testable in under a minute.
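If you run these checks often, the three commands can be folded into a pair of small shell functions. What follows is a minimal sketch, not a finished tool: `classify_status` and `check_bot` are hypothetical names we made up for illustration, the verdict labels are our own shorthand for the rule of thumb above, and https://your-domain.com/ remains a placeholder. It relies only on curl's `-s`, `-I`, `-A`, `-o`, and `-w '%{http_code}'` options.

```shell
#!/bin/sh
# Map an HTTP status code to a rough verdict (labels are illustrative).
classify_status() {
  case "$1" in
    200) echo "ok" ;;
    301|302|303|307|308) echo "redirect" ;;   # check whether it lands on a challenge page
    403|406|429) echo "blocked" ;;            # the edge refused or throttled the bot
    000) echo "no-response" ;;                # curl could not complete the request
    *) echo "unexpected" ;;
  esac
}

# Send a HEAD request with the given user agent and print code, verdict, and agent.
# $1 = user-agent string, $2 = URL
check_bot() {
  code=$(curl -s -o /dev/null -I -A "$1" -w '%{http_code}' "$2")
  printf '%s  %s  %s\n' "$code" "$(classify_status "$code")" "$1"
}

# Example (placeholder domain):
#   check_bot "GPTBot/1.1 (+https://openai.com/gptbot)" https://your-domain.com/
#   check_bot "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)" https://your-domain.com/
```

Some servers answer HEAD requests differently from GET requests, so if a verdict looks wrong, repeat the check without `-I` before changing any edge rules.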
Common questions
Should I block AI crawlers to protect my content?
It is a real trade-off. Blocking GPTBot stops OpenAI from training on your content, but it also stops ChatGPT from citing you in answers. If buyers are asking AI engines about your category, blocking the crawler removes you from those answers. Most B2B brands that want AI visibility should allow the major crawlers and control exposure other ways.
Is robots.txt enough to control crawler access?
No. robots.txt is the contract well-behaved crawlers respect, but a CDN or WAF block happens before robots.txt is ever read. You can allow GPTBot in robots.txt and still serve it a 403 at the edge. Check both the file and the live response.
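To compare the file against the live response, you can read your own robots.txt the way a crawler would. The sketch below is a deliberately simplified reader, assuming you only want to know whether a group that names the bot contains a blanket `Disallow: /`. The function name `robots_verdict` is hypothetical, and the parser ignores the `*` wildcard group, `Allow` overrides, and path-specific rules, so treat its answer as a first pass rather than a full implementation of the robots.txt rules.

```shell
#!/bin/sh
# Read a robots.txt on stdin; print "disallowed" if a group naming the given
# bot contains "Disallow: /", otherwise "not-disallowed".
# $1 = bot token, for example GPTBot
robots_verdict() {
  awk -v bot="$1" '
    BEGIN { bot = tolower(bot); in_group = 0; prev_ua = 0; blocked = 0 }
    {
      line = $0
      sub(/#.*/, "", line)                   # drop comments
      gsub(/^[ \t]+|[ \t\r]+$/, "", line)    # trim whitespace and trailing CR
      if (line == "") next
      colon = index(line, ":")
      if (colon == 0) next
      key = tolower(substr(line, 1, colon - 1))
      val = substr(line, colon + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "user-agent") {
        if (!prev_ua) in_group = 0           # first User-agent line opens a new group
        if (tolower(val) == bot) in_group = 1
        prev_ua = 1
      } else {
        prev_ua = 0
        if (in_group && key == "disallow" && val == "/") blocked = 1
      }
    }
    END { print (blocked ? "disallowed" : "not-disallowed") }
  '
}

# Example (placeholder domain):
#   curl -s https://your-domain.com/robots.txt | robots_verdict GPTBot
```

A "not-disallowed" result here paired with a 403 on the live request is the mismatch described above: the file permits the bot, but the edge turns it away first.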
Do AI crawlers run JavaScript?
Generally no. Most fetch the HTML and parse the text. If your headline, product description, and pricing only appear after JavaScript hydrates, the crawler sees an empty shell. Server-render the content that matters, or pre-render it at build time.
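A quick way to test this is to fetch the raw HTML and search it for a phrase you know appears on the rendered page. The following is a rough sketch under simple assumptions: `html_has_phrase` is a hypothetical helper name, the match is a case-insensitive literal search, and the domain and phrase in the example are placeholders you would replace with your own.

```shell
#!/bin/sh
# Read raw HTML on stdin and report whether the expected phrase is in it.
# No JavaScript runs here, which is the point: this is what a non-rendering
# crawler would see.
# $1 = phrase expected on the page
html_has_phrase() {
  if grep -qiF -- "$1"; then
    echo "present"
  else
    echo "missing"
  fi
}

# Example (placeholder domain and phrase):
#   curl -s -A "GPTBot/1.1 (+https://openai.com/gptbot)" https://your-domain.com/ \
#     | html_has_phrase "Your headline text"
```

If the phrase is visible in a browser but comes back as "missing" here, the content is most likely being injected after hydration.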
Check whether the crawlers can read you
Run the curl checks above, work through the full checklist, or have us audit it end to end.
By the Web Cited Editorial Research Team. Last updated 31 May 2026.