Do I need to allow GPTBot to appear in ChatGPT?

No. GPTBot is OpenAI's training crawler. The crawler that decides whether you appear in ChatGPT's search answers is OAI-SearchBot; in OpenAI's words, sites opted out of it "will not be shown in ChatGPT search answers." The two are independent settings, so allow OAI-SearchBot for visibility and treat GPTBot as a separate decision about whether your content is used in training.

Does blocking Google-Extended hurt my Google visibility?

No. Google-Extended is a robots.txt token that controls whether your content is used to train Gemini and related models. Google's documentation is explicit that blocking it has no effect on Search ranking, indexing, or inclusion in AI Overviews, because those use the regular Googlebot index. Only blocking Googlebot affects your search presence. So you can opt out of training and keep full Google visibility.

Will allowing the crawlers in robots.txt guarantee they can read me?

No. robots.txt is permission, not access. A CDN or web-application-firewall rule, a 403, a bot challenge, or rate limiting can block a crawler that robots.txt allows, and in practice that edge block is the more common reason a crawler cannot read a site. The only reliable check is to request your own page with the crawler's user-agent and confirm it returns a 200, not a block page.

What is the difference between a training crawler and a search crawler?

A training crawler (GPTBot, ClaudeBot, or the Google-Extended token) collects content that may be used to train a model; allowing or blocking it does not change whether you get cited in answers. A search crawler (OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot) builds the index the engine cites from at answer time, so that is the one you must allow to be visible. A third type, the user-initiated fetcher like ChatGPT-User or Claude-User, visits a page when a person's request sends it there, and those generally ignore robots.txt because a user triggered them.

Is it safe to allow AI crawlers, or will they take my content?

It depends on which crawler and what you object to. If your concern is your work being used to train models, you can block the training crawlers (GPTBot, ClaudeBot, Google-Extended) and still allow the search crawlers so you remain citable. If you want the traffic and citations AI answers can send, allowing the search crawlers is how you qualify for them. The user-initiated fetchers visit on a person's behalf and largely ignore robots.txt regardless, so blocking them is mostly symbolic.

How do I allow AI crawlers?

Allow each engine's search crawler, because those are the ones that decide whether you can be cited: OAI-SearchBot for ChatGPT, Claude-SearchBot for Claude, PerplexityBot for Perplexity, and Googlebot for Google's AI surfaces. The training crawlers - GPTBot, ClaudeBot, and the Google-Extended token - are a separate choice that does not change whether you get cited, so you can block training and keep citations. And the part most people miss: robots.txt is rarely the real blocker. More often, a CDN or firewall rule turns away crawlers that robots.txt happily allows.

Three jobs, three kinds of crawler

Every major engine runs more than one crawler, and they do not do the same thing. Sorting them is the whole point, because allowing the wrong one gets you nothing and blocking the wrong one can cost you citations you wanted.

A training crawler collects content that may feed a model's training. Allowing or blocking it is a stance on whether your work is used to train AI; it does not decide whether you get cited in answers. A search crawler builds the index the engine reads from when it answers a live question, so this is the one that must be able to reach you for a citation to be possible. A user-initiated fetcher visits a specific page when a person's request sends the engine there, and because a user triggered it, these generally ignore robots.txt.

So "allow AI crawlers" really means one thing for visibility: make sure the search crawlers can read you. The rest is a separate decision.

Which crawler to allow, by engine

These are the search crawlers - the user-agents that earn citations - confirmed against each provider's own documentation:

OpenAI / ChatGPT: allow OAI-SearchBot. It is "used to surface websites in search results in ChatGPT's search features," and opted-out sites "will not be shown in ChatGPT search answers." GPTBot is training-only; ChatGPT-User is the user fetch.
Anthropic / Claude: allow Claude-SearchBot, which Anthropic describes as navigating the web "to improve search result quality" and "the relevance and accuracy of search responses." ClaudeBot is the training crawler; Claude-User is the user fetch.
Perplexity: allow PerplexityBot, "designed to surface and link websites in search results on Perplexity," which Perplexity says respects robots.txt. Perplexity-User is the live, user-triggered fetch and generally ignores robots.txt.
Google: allow Googlebot, the regular search crawler. Google's AI Overviews and AI Mode draw from the same index, so there is no separate "AI" crawler to allow. The Google-Extended token only governs training use of your content and has no effect on Search or AI Overviews.

Microsoft Copilot leans on Bing's index, so if you already allow Bingbot for Bing search, you are covered there too. For newer engines the same shape holds: find the search crawler and allow it; the training crawler is optional.
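
To see which of these crawlers actually show up in your server logs, the roles above can be sketched as a small lookup table. This is an illustration only, not a vendor-supplied list: CRAWLER_ROLES and classify are hypothetical names, the tokens are the ones named in this article, and matching is a simple case-insensitive substring test on the User-Agent header. Google-Extended is left out because it is a robots.txt token rather than a user-agent that appears in requests.

```python
from typing import Optional, Tuple

# Hypothetical lookup table: user-agent token -> (provider, role).
# Roles follow the three kinds of crawler described in this article.
CRAWLER_ROLES = {
    "OAI-SearchBot": ("OpenAI", "search"),
    "GPTBot": ("OpenAI", "training"),
    "ChatGPT-User": ("OpenAI", "user"),
    "Claude-SearchBot": ("Anthropic", "search"),
    "ClaudeBot": ("Anthropic", "training"),
    "Claude-User": ("Anthropic", "user"),
    "PerplexityBot": ("Perplexity", "search"),
    "Perplexity-User": ("Perplexity", "user"),
    "Googlebot": ("Google", "search"),
    "Bingbot": ("Microsoft", "search"),
}


def classify(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (provider, role) for a User-Agent header, or None if unknown."""
    ua = user_agent.lower()
    for token, info in CRAWLER_ROLES.items():
        # Case-insensitive substring match on the crawler token.
        if token.lower() in ua:
            return info
    return None
```

Running classify over the User-Agent column of an access log gives a rough count of search, training, and user-initiated visits. A search crawler that never appears is a hint to look at the edge rather than at robots.txt.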

A robots.txt that allows the search crawlers

In robots.txt, anything not disallowed is already allowed, so most sites do not need to add anything - they need to make sure they are not blocking these bots. The usual culprit is a blanket rule like this, which turns everyone away, AI search crawlers included:

User-agent: *
Disallow: /

If you want to state intent explicitly, or you need to carve the search crawlers out of a broader block, name them and allow them:

# Allow the AI search crawlers - these decide whether you can be cited
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

If your goal is to allow citations while opting out of training, keep the allow rules above and add separate Disallow rules for the training crawlers (GPTBot, ClaudeBot) and for the Google-Extended token. The two choices do not interfere with each other.
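
The split policy described above can be sketched and sanity-checked with Python's standard-library robots.txt parser. The ROBOTS_TXT contents, the crawl_permissions helper, and the example.com URL are assumptions made for illustration. The check also reflects how urllib.robotparser reads the file, which may differ in edge cases from how each crawler's own parser reads it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: search crawlers allowed, training crawlers opted out.
ROBOTS_TXT = """\
# Allow the AI search crawlers - these decide whether you can be cited
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

# Opt out of training use
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def crawl_permissions(robots_txt, agents, url="https://example.com/"):
    """Map each agent token to True/False: may it fetch `url` under this policy?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

Passing the search and training tokens to crawl_permissions should report the search crawlers as allowed and the training tokens as disallowed. The same helper can be pointed at the text of your live robots.txt to catch a blanket Disallow that also covers the search crawlers.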

robots.txt is permission, not access

You can allow every search crawler in robots.txt and still be unreachable, because robots.txt only states a policy - it does not open the door. A CDN bot rule, a web-application firewall, a 403 or 406, an aggressive rate limit, or a JavaScript challenge can all turn a crawler away after robots.txt has said yes. In practice that edge block is the more common reason an AI engine cannot read a site, and it is invisible if you only read your robots.txt file.

So verify access, do not assume it. Request your own page with each crawler's user-agent and confirm a 200 comes back rather than a block page. The full method and the rest of the technical floor are in AI crawler readiness, and our AI crawler checklist walks the live-response test step by step.
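
A minimal sketch of that live-response check, using only Python's standard library, might look like the following. The function names and the SEARCH_CRAWLERS list are hypothetical, and the bare crawler token is sent as the User-Agent as a simplifying assumption, since real crawlers send longer strings that contain the token. The request also comes from your own IP address, so it exercises user-agent rules only. Edge rules keyed on a crawler's IP ranges may treat the real crawler differently.

```python
import urllib.error
import urllib.request

# Search-crawler tokens named in this article.
SEARCH_CRAWLERS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Googlebot"]


def build_request(url, agent_token):
    """Build a GET request whose User-Agent is the bare crawler token (assumption)."""
    return urllib.request.Request(url, headers={"User-Agent": agent_token})


def fetch_status(url, agent_token, timeout=10.0):
    """Return the final HTTP status code, or None if no response arrived."""
    try:
        with urllib.request.urlopen(build_request(url, agent_token), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (for example a 403 from a firewall) land here.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_site(url, agents=SEARCH_CRAWLERS):
    """Request `url` once per crawler token and collect the status codes."""
    return {agent: fetch_status(url, agent) for agent in agents}


def blocked_agents(statuses):
    """List the agents whose status is anything other than 200."""
    return [agent for agent, status in statuses.items() if status != 200]
```

Calling blocked_agents(check_site(url)) on one of your own pages lists the crawler tokens that did not get a 200. A 200 is necessary but not sufficient, because a bot challenge can also return 200 with a block page as the body, so it is worth reading the response body as well.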

Allowing them is the floor, not the finish

Allowing the crawlers makes a citation possible; it does not make it happen. Once an engine can read you, it still only cites you if your page hands it a clean answer and trusted sources already name you. Being perfectly crawlable and still uncited is common - it is the whole finding of our Invisible 10 study, where ten funded vendors with readable sites drew zero citations across 600 responses on the four largest engines. Crawlability is the precondition; what turns it into citations is covered in AI search visibility, and the ChatGPT-specific version is in how to get cited by ChatGPT.

How Web Cited helps

Allowing the right crawlers is a five-minute fix once you know which ones matter and where the real block usually hides. Our AI crawler checklist covers the robots.txt and live-response checks, and the free 10-minute AI search audit shows you where you stand right now. To see whether the engines actually reach and cite you once access is open, the Free Snapshot gives a current read, and the SXO Audit runs 25 buyer prompts across six engines with three trials each over time so you can watch your citation share move.

By the Web Cited Editorial Research Team. Last updated 1 June 2026.