
Is ChatGPT-User Allowed in Your Robots.txt?

Subia Peerzada

Founder, Cite Solutions · May 16, 2026

A B2B SaaS site can rank on page one of Google, publish a clean answer block, and still get zero ChatGPT citations for a category query. We see it weekly. Nine times out of ten the cause is one line in robots.txt.

The site is letting OAI-SearchBot through and blocking ChatGPT-User. Or it is blocking both without realizing they do different jobs. Or it is allowing one of them and rate-limiting the other into oblivion at the CDN layer.

The two crawlers look almost identical from the outside. They are not. One builds an index. The other is the bot that actually fetches your page when a buyer asks ChatGPT a live question. If the second one cannot reach you, you do not appear in the answer.

Olivier de Segonzac at RESONEO documented the distinction in Search Engine Land on May 14, 2026, reverse-engineering ChatGPT Search's retrieval flow. The finding has been sitting in OpenAI's own docs for over a year, but most B2B AI visibility audits we run still miss it.

This post walks through what each of the three crawlers does, the five ways teams get robots.txt wrong, and a seven-step fix you can ship today.

The three OpenAI crawlers most teams confuse

OpenAI runs three named crawlers. Each one does a different job. Most robots.txt files we audit treat them as one bot. They are not.

OAI-SearchBot builds the ChatGPT Search index

OAI-SearchBot is the offline indexer. It crawls the open web on a schedule, builds the corpus that ChatGPT Search uses to surface candidate pages, and tags pages for retrieval-time selection. It is the bot you allow if you want ChatGPT to know your page exists.

Per OpenAI's overview of bots and crawlers, OAI-SearchBot does not fetch content for live conversations. It builds the corpus that live conversations draw candidates from.

ChatGPT-User fetches the page during a live conversation

ChatGPT-User is the runtime fetcher. When a buyer asks ChatGPT a question and the model decides to browse the web, ChatGPT-User is the user agent that issues the actual HTTP request to your URL. It pulls the page, parses it, extracts the passages, and feeds them to the model for citation.

If OAI-SearchBot is allowed and ChatGPT-User is blocked, ChatGPT knows your page exists but cannot read it during a live answer. You become invisible at the moment of citation.

GPTBot trains the next model

GPTBot is the training-data crawler. It pulls content for the next ChatGPT model checkpoint. Blocking GPTBot does not affect ChatGPT Search citations today. Allowing it influences whether your brand gets mentioned by ChatGPT for queries that do not trigger live browsing, which is most of them.

Three crawlers. Three jobs. One robots.txt file. If you only know one rule, you only control one third of your ChatGPT visibility.

Why this distinction decides whether ChatGPT cites you

There are five reasons teams keep getting this wrong. Each one is fixable in under an hour once you see it.

Reason #1: Most security stacks block any user agent with "bot" in the name

If your site sits behind Cloudflare, Akamai, Fastly, or AWS WAF with default bot-management rules turned on, both OAI-SearchBot and GPTBot are likely blocked by category, not by name. ChatGPT-User often slips through because it does not have "bot" in the user-agent string.

The result is a mismatch. The runtime fetcher reaches you. The indexer cannot. ChatGPT cannot list your page as a candidate, so the runtime fetcher never gets asked to read you. The citation never happens.
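You can see the mismatch from the outside with two requests. A rough sketch: yourdomain.com is a placeholder, and some WAF rules match the full user-agent string rather than the bare token, so treat the result as indicative rather than conclusive.

curl -s -o /dev/null -w "OAI-SearchBot: %{http_code}\n" -A "OAI-SearchBot" https://yourdomain.com/
curl -s -o /dev/null -w "ChatGPT-User: %{http_code}\n" -A "ChatGPT-User" https://yourdomain.com/

A 403 on the first line and a 200 on the second is the category-block signature.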

Reason #2: Most robots.txt files were written before ChatGPT Search existed

We audit a lot of B2B SaaS sites with a 2022-era robots.txt. The file allows Googlebot, blocks scrapers, and never mentions any OpenAI user agent. In the absence of an explicit allow rule, the default is "allow." That sounds safe.

It is not, because the CDN is running its own bot rules at the edge. The robots.txt says one thing. The Cloudflare AI Audit ruleset says another. The edge wins. Your team usually has no idea which one is in effect.

Reason #3: The two crawlers fetch at different rates and times

OAI-SearchBot crawls in slow, predictable cycles. ChatGPT-User fetches on demand, often in bursts when a popular query lands. CDN rate-limit rules tuned for normal crawler patterns will flag ChatGPT-User traffic as anomalous and start returning 429s.

A 429 to ChatGPT-User is a missed citation. The model does not retry politely. It moves to the next candidate page on the list, which is your competitor.
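You can approximate the burst pattern with a short loop of requests under the ChatGPT-User token. A sketch, not a load test, and note that real ChatGPT-User traffic arrives from OpenAI's published IP ranges, so your CDN may score it differently:

for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" -A "ChatGPT-User" https://yourdomain.com/your-key-page
done

A run of 200s that degrades into 429s after a few requests means the rate-limit rule, not robots.txt, is the blocker.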

Reason #4: A single Disallow line on a key path silently kills citations

We have seen a Disallow on /blog/*?utm_* strip out 60 percent of ChatGPT citation candidates, because the runtime fetcher requests URLs exactly as it finds them, tracking parameters still attached, rather than the clean canonical. The robots.txt was right in spirit and wrong in fetch reality.

Same pattern with Disallow on /docs/, /help/, and /learn/. Those are exactly the paths ChatGPT prefers for buyer-stage queries. If they are walled off, the citation goes somewhere else.
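The pattern looks harmless in isolation. A simplified before-and-after using the utm example above; the fix shown is one option, and it assumes your canonical tags already consolidate parameterized URLs:

# Kills citations: the live fetcher requests shared URLs with tracking parameters intact
User-agent: *
Disallow: /blog/*?utm_*

# Safer: drop the wildcard rule entirely and rely on rel=canonical to consolidate duplicates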

Reason #5: Sites mirror Cloudflare's "block all AI" default and then forget

Cloudflare introduced a one-click "block AI crawlers" feature in mid-2024 and a renewed AI Audit toolkit in 2025. A site admin who clicked it in a panic during the training-data backlash is, eighteen months later, still blocking ChatGPT-User. They have no idea.

The fix is two clicks in the Cloudflare dashboard. The damage is months of zero ChatGPT citations while the team wonders why GEO is not working.

If your team has not audited the AI crawler rules at the CDN layer in the last 90 days, your ChatGPT citation rate is probably lower than it should be.

We run AI crawler server log audits for B2B SaaS portfolios. ChatGPT-User, OAI-SearchBot, GPTBot, ClaudeBot, PerplexityBot, and Google-Extended tracked against robots.txt, CDN rules, and citation outcomes. Two-week ramp.

Book a Discovery Call

How to fix your robots.txt in seven steps

The fix is short. The discipline is making sure every layer agrees with every other layer.

Step 1: Pull your current robots.txt and grep for every OpenAI user agent

Open https://yourdomain.com/robots.txt in a browser. Search for OAI, ChatGPT, GPTBot, and OpenAI. If you find none of them, your file is in the 2022 era. If you find some but not all, your file is partially current. Note which agents are explicitly allowed, explicitly disallowed, or unmentioned.
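If you prefer the terminal, one line runs the same scan. A sketch assuming curl and grep are available; the match is case-insensitive so it catches any spelling:

curl -s https://yourdomain.com/robots.txt | grep -inE 'oai-searchbot|chatgpt-user|gptbot|openai'

No output means no mention, which puts the file in the 2022 era.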

Step 2: Add an explicit Allow block for ChatGPT-User and OAI-SearchBot

Replace silence with an explicit allow. The simplest correct block looks like this:

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Allow: /

If you want to exclude private areas, add a Disallow line under each agent for the specific paths. Do not blanket-disallow. Disallow only the routes you would not want a buyer to see anyway.
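A scoped carve-out looks like this (the paths are illustrative):

User-agent: ChatGPT-User
Disallow: /admin/
Disallow: /api/internal/
Allow: /

Rules are grouped per user agent, so these Disallow lines constrain only ChatGPT-User, and the more specific path rule wins over the blanket Allow.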

Step 3: Check your CDN bot management rules in the same session

Log in to Cloudflare, Akamai, Fastly, or AWS WAF. Find the bot management or WAF section. Look for any rule that blocks "AI bots" or "scrapers" by category. If one is on, identify which user agents it covers. Cloudflare names them explicitly in the AI Audit section.

Add ChatGPT-User, OAI-SearchBot, and GPTBot to the allowlist. Re-test the rule with a curl request impersonating each user agent and confirm a 200 response.

Step 4: Test each crawler with a curl impersonation

Run these from your terminal:

curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)" -I https://yourdomain.com/your-key-page

curl -A "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)" -I https://yourdomain.com/your-key-page

curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)" -I https://yourdomain.com/your-key-page

Each should return a 200. A 403 means the CDN or origin is still blocking. A 429 means rate limits are too aggressive. A 200 means the path is clear.

Step 5: Pull server logs and confirm real ChatGPT-User hits

Filter your last 30 days of server logs for ChatGPT-User. If you see hits, you are being fetched live. If you see zero hits and your category has any meaningful ChatGPT query volume, your site is probably blocked upstream. Most B2B SaaS sites we audit show 50 to 200 OAI-SearchBot hits for every ChatGPT-User hit; when the live fetcher is far more under-represented than that, suspect a configuration problem.
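One way to pull the counts, assuming nginx-style access logs at the default path (adjust the glob and path for your stack):

zgrep -hoE 'OAI-SearchBot|ChatGPT-User|GPTBot' /var/log/nginx/access.log* | sort | uniq -c

Zero ChatGPT-User lines next to healthy OAI-SearchBot counts is the upstream-block signature from Reason #1.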

Step 6: Raise rate-limit thresholds for verified AI user agents

Most rate-limit rules treat AI crawlers like adversarial scrapers. They are not. They are citation pipelines. Raise the per-IP and per-user-agent thresholds for ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, and cohere-ai to at least 60 requests per minute and verify against your own log baseline.
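To set the threshold from data rather than a guess, pull your observed peak first. A sketch assuming combined log format, where the timestamp is the fourth field:

grep 'ChatGPT-User' /var/log/nginx/access.log | awk '{print substr($4, 2, 17)}' | sort | uniq -c | sort -rn | head -5

The top count is your observed peak requests per minute from ChatGPT-User; set the limit comfortably above it.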

Step 7: Repeat the same audit for Claude, Perplexity, and Gemini

Anthropic runs ClaudeBot for indexing and Claude-User for live retrieval. Perplexity runs PerplexityBot for indexing and Perplexity-User for live fetch. Google uses Google-Extended as a robots.txt control token for Gemini, separate from Googlebot, which still does the crawling.

The same robots.txt logic applies. Indexer plus live fetcher equals citation. If you only allow one, you only get half the citation surface.

What a complete AI crawler robots.txt looks like

For a B2B SaaS site that wants maximum AI citation surface across the four major engines, the working baseline looks like this:

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: Applebot-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Add a sitemap reference. Skip wildcard Disallow lines. Add path-specific Disallow only for legitimately private paths like /admin/, /api/internal/, or staging hosts.

The audit in practice

Two recent examples from our own work, both B2B SaaS, both in regulated categories.

Site A (cyber-security SaaS)

  • Pre-audit robots.txt: allowed GPTBot, no mention of ChatGPT-User
  • Pre-audit CDN rules: Cloudflare AI Audit on, blocking ChatGPT-User
  • Pre-audit ChatGPT citations on 20 buyer queries: 1 of 20
  • Fix: added Allow blocks, switched off the AI Audit block, raised the rate-limit threshold
  • ChatGPT citations 30 days post-fix: 8 of 20

Site B (HR SaaS)

  • Pre-audit robots.txt: disallowed all unknown user agents
  • Pre-audit CDN rules: Akamai bot manager flagged ChatGPT-User as anomalous
  • Pre-audit ChatGPT citations on 20 buyer queries: 0 of 20
  • Fix: explicit Allow for all named AI agents, allowlisted at Akamai
  • ChatGPT citations 30 days post-fix: 6 of 20

Both sites had the answer-block content, the schema, and the internal linking already in place. The robots.txt and CDN rules were the single blocker. Once fixed, retrieval started in 18 to 22 days, and citation share lifted in 25 to 35 days.

We cover the broader workflow in how to run a GEO crawlability audit and in how to run an AI visibility audit. The crawler-rules layer is the cheapest, fastest fix in either workflow. Most teams skip it because the file feels boring.

Allowing the indexer without allowing the live fetcher is the most common silent AI citation killer in B2B SaaS.

What to check this week

Three actions to take in the next five working days:

  • Pull robots.txt and confirm ChatGPT-User, OAI-SearchBot, and GPTBot are each explicitly allowed. Do the same for the Anthropic, Perplexity, and Google-Extended pairs.
  • Log in to your CDN, find any "block AI bots" or "AI Audit" rule, and either turn it off or allowlist the runtime fetchers, then re-test (a script sketch follows this list).
  • Pull 30 days of server logs, count hits per AI user agent, and confirm ChatGPT-User, Claude-User, and Perplexity-User are showing real traffic.
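A minimal re-test script, if you want the curl checks automated (the URL is a placeholder; extend the agent list as new fetchers launch):

#!/bin/sh
# Every indexer and runtime fetcher should return a 200 for a key page
URL="https://yourdomain.com/your-key-page"
for ua in OAI-SearchBot ChatGPT-User GPTBot ClaudeBot Claude-User PerplexityBot Perplexity-User; do
  code=$(curl -s -o /dev/null -w "%{http_code}" -A "$ua" "$URL")
  [ "$code" = "200" ] || echo "FAIL: $ua returned $code"
done

Run it quarterly, or wire it into CI so a WAF change cannot silently reintroduce a block.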

Pair this with how to optimize for ChatGPT search and why your brand might not be showing in ChatGPT for the content-side fix. Pair it with HTML parity for AI retrieval if your site renders content client-side. The point is to make sure every layer that touches the request, from edge to origin to robots.txt to schema, agrees on the same answer.

FAQ

What is the difference between ChatGPT-User and OAI-SearchBot?

OAI-SearchBot is OpenAI's offline indexer. It builds the ChatGPT Search corpus on a schedule. ChatGPT-User is the live runtime fetcher that issues the actual HTTP request when ChatGPT decides to browse the web mid-conversation. Both need to reach your site for ChatGPT to cite you.

Will blocking GPTBot stop ChatGPT from citing my site?

No. GPTBot is the training-data crawler. Blocking it stops OpenAI from training future models on your content but does not stop ChatGPT Search from citing your live pages. The crawlers that affect live citations are OAI-SearchBot and ChatGPT-User.

How do I know if Cloudflare is blocking AI crawlers on my site?

Log in to the Cloudflare dashboard, open the Security section, and look for the AI Audit or Bot Fight Mode toggles. If either is on with default settings, AI crawlers are likely being challenged or blocked. Switch to allowlist mode for verified AI user agents and re-test with a curl impersonation.

Should I block AI crawlers if I have not built a GEO strategy yet?

Default to allowing the runtime fetchers (ChatGPT-User, Claude-User, Perplexity-User) and the indexers (OAI-SearchBot, ClaudeBot, PerplexityBot). You can decide later whether to allow training crawlers like GPTBot. Blocking runtime fetchers eliminates citation potential entirely.

How often should I re-audit my AI crawler robots.txt?

Quarterly at a minimum. New AI user agents launch every few months (Perplexity added Perplexity-User mid-2024, Anthropic split ClaudeBot and Claude-User later that year). A robots.txt that was complete six months ago is probably stale today.

The bottom line

Most B2B SaaS sites that complain about low ChatGPT citation share are not failing on content. They are failing on the single line that decides whether the live fetcher can read the page. Fix the crawler layer first. Then keep optimizing the content. The order matters because content cannot rescue a 403.

Most ChatGPT citation problems are robots.txt problems. Most teams find out 90 days too late.

Cite Solutions runs an AI Crawler Server Log Audit as the first deliverable of every GEO engagement. ChatGPT-User, Claude-User, Perplexity-User traffic measured against robots.txt and CDN rules, with a fix plan inside two weeks.

Book a Discovery Call

Ready to become the answer AI gives?

Book a 30-minute discovery call. We'll show you what AI says about your brand today. No pitch. Just data.