Most people think “AI bot” is one thing. It is not. It is at least six.
OpenAI runs three separate crawlers. Anthropic runs three more. Each one does a different job, accepts different instructions, and produces a different outcome when blocked. A blanket robots.txt rule targeting “AI bots” will hit some of them and miss others. The ones it misses are often the ones that matter most for citations.
This is not a niche technical problem. It is one of the most common silent visibility errors in AI search right now: sites block the training crawler, misunderstand the search crawler, and never realize the two decisions are separate.
⸻
Key Takeaways
- OpenAI and Anthropic each run three separate bots: training, search indexing, and user-triggered retrieval
- Blocking a training bot has zero effect on search citation bots from the same company
- Both companies updated their official crawler documentation in late 2025 and early 2026 to make this explicit
- A single robots.txt entry like “User-agent: GPTBot Disallow: /” does not block ChatGPT search citations
⸻
What the Three Bots Actually Are
OpenAI documented its three-crawler structure in late 2024 and updated it again in December 2025. According to OpenAI’s official publisher documentation, the three bots are:
GPTBot crawls public web content to train OpenAI’s generative AI models. Blocking it means your content does not feed future training data. It does not affect whether ChatGPT cites you in search answers.
OAI-SearchBot crawls content to power ChatGPT’s live search feature. OpenAI states directly: sites opted out of OAI-SearchBot will not appear in ChatGPT search answers, though navigational links may still show up in some cases. This is the citation bot. Blocking it removes you from ChatGPT search results for substantive answers.
ChatGPT-User fetches a specific page when a user asks ChatGPT to read or retrieve it. This is not automated crawling. It is triggered by a real person asking a real question. Blocking it means ChatGPT cannot access your pages during user requests.
Three bots. Three separate robots.txt entries. Three different outcomes.
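One way to see the separation is to parse a GPTBot-only robots.txt and ask what each bot may fetch. The sketch below uses Python's standard `urllib.robotparser`. The `allowed` helper and the example.com URL are placeholders for illustration, and the result shows how this parser reads the file, not a guarantee of how each crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that names only the training crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    print(f"{agent}: {'allowed' if allowed(ROBOTS_TXT, agent) else 'blocked'}")
```

Only the bot named in the file is turned away. The other two match no entry, so they fall through to the default, which is allow.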
⸻
Anthropic Mirrored the Same Structure in February 2026
Anthropic updated its official crawler documentation on February 20, 2026. Search Engine Journal covered the update and confirmed the three-bot framework now mirrors OpenAI’s approach exactly.
ClaudeBot collects web content for Claude model training. Blocking it tells Anthropic to exclude your future content from training datasets.
Claude-SearchBot crawls content to improve the quality and relevance of Claude’s search results. Anthropic’s documentation states that blocking it “may reduce your site’s visibility and accuracy in user search results.”
Claude-User retrieves pages when a Claude user asks a question that requires accessing a webpage. Blocking it means Claude cannot fetch your content in response to user queries.
Same three-tier structure. Same independent controls. Same invisible consequences if you conflate them.
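The six bots are easier to reason about as data than as a list of names. A minimal sketch, assuming you want to group them by role: the `Crawler` class, the role labels, and the `tokens_for` helper are illustrative names, not anything either company publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crawler:
    token: str   # the name used on the User-agent line in robots.txt
    vendor: str
    role: str    # "training", "search", or "user" (labels chosen for this sketch)


CRAWLERS = (
    Crawler("GPTBot", "OpenAI", "training"),
    Crawler("OAI-SearchBot", "OpenAI", "search"),
    Crawler("ChatGPT-User", "OpenAI", "user"),
    Crawler("ClaudeBot", "Anthropic", "training"),
    Crawler("Claude-SearchBot", "Anthropic", "search"),
    Crawler("Claude-User", "Anthropic", "user"),
)


def tokens_for(role: str) -> list[str]:
    """Return the robots.txt tokens of every crawler with the given role."""
    return [crawler.token for crawler in CRAWLERS if crawler.role == role]


print("training:", tokens_for("training"))
print("search:  ", tokens_for("search"))
print("user:    ", tokens_for("user"))
```

Grouping by role rather than by company matches the decision you are actually making: whether to feed training, whether to be indexed for search, and whether to allow user-triggered fetches.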
⸻
What Most robots.txt Files Currently Get Wrong
The most common mistake is blocking GPTBot without a separate entry for OAI-SearchBot. This became widespread in 2023 and 2024, when publishers wanted to opt out of AI training. The robots.txt looked reasonable. The problem is that it only addressed training.
A Hostinger analysis of 66.7 billion bot requests across more than five million websites found that OpenAI’s search crawler coverage grew from 4.7% to over 55% of sites over the same period in which its training crawler coverage dropped from 84% to 12%. Sites were blocking training bots while search bots crawled freely. The reverse error also exists: blocking search bots while allowing training, which means your content feeds model development but never gets cited in answers.
Among major publisher sites studied, blocking rates for the search bots remain high. OAI-SearchBot is blocked by roughly 49% of those sites and ChatGPT-User by roughly 40%, according to ALM Corp’s crawler analysis of publisher and news sites. Many of those blocks appear to be accidental, inherited from firewall rules or CDN configurations rather than made deliberately.
The robots.txt entries that block all three OpenAI bots look like this, one entry per bot:
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
Most sites that intended to block training have only the first entry: the two GPTBot lines. The other two bots are never mentioned.
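A quick way to spot this gap is to list which bots a robots.txt actually names. The sketch below is a deliberately simple line scanner, not a full robots.txt parser. The function names and the sample file are hypothetical.

```python
OPENAI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def named_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent value in the file, lowercased."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and padding
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def unaddressed(robots_txt: str, bots=OPENAI_BOTS) -> list[str]:
    """Return the bots that the file never names explicitly."""
    named = named_agents(robots_txt)
    return [bot for bot in bots if bot.lower() not in named]


# Hypothetical file from a site that only opted out of training.
sample = "User-agent: GPTBot\nDisallow: /\n"
print(unaddressed(sample))
```

A bot that shows up in this list has no rule of its own. It is governed by a wildcard entry if the file has one, and allowed by default if it does not.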
⸻
The Configuration That Maximizes Citation Visibility
If the goal is to appear in AI search answers while keeping training data protected, the robots.txt needs to be explicit about each bot and each company. This is the framework most aligned with AI search visibility:
# Block training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
# Allow search and retrieval bots
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: Claude-User
Allow: /
User-agent: PerplexityBot
Allow: /
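This configuration tells training crawlers to stay out while leaving the search citation and user retrieval bots free to access your content. It also covers two names from outside OpenAI and Anthropic: Google-Extended, the token Google provides for opting out of AI training use, and PerplexityBot, Perplexity’s search crawler.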
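Before deploying a file like this, it can be checked against the policy it is meant to express. A sketch using Python's standard `urllib.robotparser`, with the configuration above pasted in as a string. The `EXPECTED` table and the `policy_mismatches` helper are assumptions made for illustration, and example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Block training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
# Allow search and retrieval bots
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: Claude-User
Allow: /
User-agent: PerplexityBot
Allow: /
"""

# Intended policy: True means the bot should be able to fetch pages.
EXPECTED = {
    "GPTBot": False,
    "ClaudeBot": False,
    "Google-Extended": False,
    "OAI-SearchBot": True,
    "ChatGPT-User": True,
    "Claude-SearchBot": True,
    "Claude-User": True,
    "PerplexityBot": True,
}


def policy_mismatches(robots_txt: str, expected: dict[str, bool],
                      url: str = "https://example.com/") -> dict[str, bool]:
    """Return {bot: actual_result} for every bot whose result differs from the intent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    actual = {agent: parser.can_fetch(agent, url) for agent in expected}
    return {agent: got for agent, got in actual.items() if got != expected[agent]}


print(policy_mismatches(ROBOTS_TXT, EXPECTED) or "robots.txt matches the intended policy")
```

An empty result means the file says what you meant it to say, at least as this parser reads it. It says nothing about what happens upstream of robots.txt.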
One important note: robots.txt is a voluntary protocol with no technical enforcement mechanism. OpenAI and Anthropic both state in their published documentation that they honor these directives. That commitment is meaningful, though it is not the same as a guarantee.
⸻
Where Your Hosting Stack Can Override robots.txt
robots.txt is not always the last word. CDN and firewall configurations can block bots before they ever reach your server or read your directives.
Cloudflare is the most common example. If Cloudflare’s AI Scraper blocking is enabled at the CDN level, it may be blocking OAI-SearchBot and Claude-SearchBot before your robots.txt is even read. The block operates at the edge. The bot receives a 403 error, never reaches your page, and never indexes your content for search citations. Your robots.txt could be perfectly configured and it would not matter.
This can apply to sites running behind Cloudflare, including some Kajabi sites, Squarespace custom domain setups, and WordPress installations using Cloudflare. It also applies to any hosting platform that bundles WAF or bot management by default.
Checking your CDN and firewall bot settings is a necessary step before assuming your robots.txt configuration is the only thing controlling crawler access.
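A rough first check is to request a page while presenting a bot's name as the User-Agent and look at the status code. The sketch below uses only Python's standard library. It sends the bare token as the User-Agent, which is a stand-in: real crawlers send longer strings, and a CDN may also verify source IP ranges, so a 200 here does not prove the real crawler gets through. The function names are hypothetical, and the demo uses a fake fetcher so it runs offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code seen when requesting `url` with this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is still what we want


def edge_report(url: str, tokens: Iterable[str],
                fetch: Callable[[str, str], int] = fetch_status) -> dict[str, str]:
    """Classify the response each bot token receives at `url`."""
    report = {}
    for token in tokens:
        status = fetch(url, token)
        if status == 403:
            report[token] = "403: likely blocked before robots.txt applies"
        elif 200 <= status < 400:
            report[token] = f"{status}: reachable"
        else:
            report[token] = f"{status}: inconclusive"
    return report


def fake_fetch(url: str, user_agent: str) -> int:
    """Offline stand-in that pretends the edge blocks one search bot."""
    return 403 if user_agent == "OAI-SearchBot" else 200


for token, verdict in edge_report("https://example.com/", ["GPTBot", "OAI-SearchBot"],
                                  fetch=fake_fetch).items():
    print(f"{token}: {verdict}")
```

Dropping the `fetch=fake_fetch` argument makes the same call perform live requests against your own URL. Treat a 403 as a prompt to open the CDN or firewall dashboard, not as a final diagnosis.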
⸻
Frequently Asked Questions
Q: Does blocking GPTBot stop ChatGPT from citing my site?
A: No. GPTBot handles training data collection. OAI-SearchBot handles ChatGPT’s live search citations. They are separate bots with separate robots.txt entries. Blocking GPTBot has no effect on whether OAI-SearchBot indexes your content for ChatGPT search answers. OpenAI confirms this explicitly in its publisher documentation.
Q: What is the difference between OAI-SearchBot and ChatGPT-User?
A: OAI-SearchBot crawls your site automatically to build and maintain an index for ChatGPT’s search feature. ChatGPT-User fetches a specific page when a real user asks ChatGPT to retrieve or read it. Both affect whether your content appears in ChatGPT responses, but through different mechanisms. Blocking OAI-SearchBot removes you from proactive search indexing. Blocking ChatGPT-User prevents live page retrieval during user conversations.
Q: How many bots does Anthropic run and what do they do?
A: Anthropic officially documents three bots. ClaudeBot collects content for model training. Claude-SearchBot indexes content to improve Claude’s search result quality. Claude-User retrieves pages when a Claude user asks a question that requires accessing a webpage. Anthropic updated this documentation in February 2026, separating what had previously been described as a single crawler into three distinct entries with separate robots.txt controls.
Q: What happens if I have a blanket “block all bots” rule in my robots.txt?
A: A wildcard disallow rule (“User-agent: * Disallow: /”) blocks every bot that follows robots.txt, including search citation bots and user retrieval bots. Your content will not appear in ChatGPT search answers, Claude search results, or Perplexity citations. You also block Googlebot from crawling, which in practice removes your pages from useful Google Search results. Most sites with this configuration did not intend to block AI search citations specifically.
Q: Does Cloudflare affect which AI bots can access my site?
A: Yes. Cloudflare’s bot management and AI Scraper controls operate at the CDN level, before your server or robots.txt is reached. If Cloudflare is blocking AI crawlers via firewall rules, those bots receive a 403 error and never read your robots.txt at all. This can apply to sites running behind Cloudflare, including some Kajabi sites, Squarespace custom domain setups, and WordPress installations using Cloudflare. It is one of the more common sources of accidental AI search invisibility regardless of platform.
Q: Can I block training crawlers while still appearing in AI search results?
A: Yes. Training and search citation bots from the same company operate independently. You can block GPTBot (training) while allowing OAI-SearchBot (search) with separate robots.txt entries for each. The same applies to Anthropic: blocking ClaudeBot does not block Claude-SearchBot. This is the intended design, and both companies documented it explicitly.
⸻
Most people who set up their robots.txt in 2023 or 2024 made one decision: allow or block “AI bots.” That was a reasonable call at the time. It is no longer the right frame. The bots multiplied. The decisions did not.
⸻
AI Visibility Studio helps websites audit and configure their AI crawler settings so the right bots get in and the wrong ones stay out.
Originally published on Medium