
Do AI Crawlers Actually Read llms.txt?

Subia Peerzada

Founder, Cite Solutions · June 7, 2026

Mostly no. In a 90-day audit of more than 500 million AI bot visits, only 408 requests fetched /llms.txt. The major AI search crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended) skip the file and read your HTML directly. The file is still worth publishing for a narrow reason, just not the one most teams assume.

That gap between how often llms.txt is published and how often it is read is the whole story. Adoption is climbing. Usage by AI search engines is close to zero. If you have been told the file is a citation lever, the crawler logs say otherwise.

Figure: Adoption is not usage. What AI crawlers actually do with llms.txt

Three independent 2026 datasets reach the same conclusion: sites publish the file, AI search crawlers almost never read it.

  • Limy.AI (500M+ bot visits): only 408 requests fetched /llms.txt across a 90-day window, a real share of 0.00008%
  • SE Ranking (300,000 domains): 10.13% publish llms.txt; no correlation with how often a domain is cited
  • Trakkr (37,894 domains): 6.8 citations with the file vs 6.7 without (p=0.85, not significant)

Crawlers that skip the file and read your HTML directly: GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended

Sources: Limy.AI 90-day bot-log audit (2026); SE Ranking study of 300,000 domains (2026); Trakkr llms.txt-effect research, 37,894 domains (2026)

This post splits in two. First, the diagnosis: what three independent 2026 datasets found when they checked whether AI engines actually use llms.txt. Then the prescription: what to publish, what to skip, and where to spend the effort the file does not earn back.

What the crawler logs and citation studies actually found

Three separate research efforts in 2026 looked at llms.txt from different angles: server logs, domain-level adoption, and citation outcomes. They differ in method and agree on the conclusion. AI search crawlers do not meaningfully read the file, and publishing it does not move citations.

Reason 1: Only 408 of 500 million AI bot visits touched the file

The sharpest number comes from a 90-day bot-log audit by Limy.AI. Across more than 500 million AI bot traffic events, filtered to the user agents that actually drive citations, only 408 requests targeted /llms.txt directly. That is roughly 0.00008% of AI bot activity, pointed at the file the whole strategy depends on.
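You can run the same kind of check on your own access logs. The sketch below is a minimal version, not Limy.AI's method: it assumes logs in the Combined Log Format and matches bots by user-agent substring. The function name `llms_txt_share` and the token list are illustrative choices, and some tokens may never appear as user-agent strings in your logs.

```python
import re
from collections import Counter

# Tokens for the crawlers named in this post (assumption: matched as
# case-insensitive substrings of the user-agent field). Adjust for your logs.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "Google-Extended")

# Combined Log Format: ip - - [time] "METHOD path HTTP/x" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def llms_txt_share(lines):
    """Return (ai_bot_requests, llms_txt_requests, share_percent) for log lines."""
    per_bot = Counter()
    llms_hits = 0
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        bot = next((t for t in AI_BOT_TOKENS if t.lower() in agent), None)
        if bot is None:
            continue  # not one of the AI crawlers we are counting
        per_bot[bot] += 1
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path == "/llms.txt":
            llms_hits += 1
    total = sum(per_bot.values())
    share = 100.0 * llms_hits / total if total else 0.0
    return total, llms_hits, share
```

To use it, pass an open log file, for example `llms_txt_share(open("access.log"))`, where the file name is a placeholder. Plugging the audit's own figures into the same formula, 100 * 408 / 500,000,000 comes to about 0.00008.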

Publishing llms.txt and assuming AI reads it is a bet contradicted by the server logs.

Reason 2: The major crawlers read your HTML, not your map

The same audit names the bots that overwhelmingly skip llms.txt: GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Google-Extended. These are the crawlers behind ChatGPT, Claude, Perplexity, and Google's AI surfaces. They crawl rendered pages, chunk them, and extract passages. They do not orient themselves through a hand-written directory file first.

This matters because it tells you where the work belongs. The page an engine quotes is your HTML, so that is the surface that has to be clean and extractable. We covered the mechanics in why passages beat pages.

Reason 3: Two large studies found no citation advantage

Adoption studies reach the same place from the outcome side. SE Ranking analyzed roughly 300,000 domains and found a 10.13% adoption rate with no relationship between having llms.txt and how often a domain is cited in AI answers. Neither the statistical analysis nor the machine-learning models showed an effect.

Trakkr scanned 37,894 cited domains and found sites with llms.txt averaged 6.8 citations against 6.7 for sites without it. A Mann-Whitney U test returned p=0.85. The difference is indistinguishable from noise.
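The Mann-Whitney U test fits this comparison because it compares two independent groups by rank and does not assume the counts are normally distributed. Below is a minimal pure-Python sketch of the test, using the normal approximation with a tie correction and no continuity correction. It is not Trakkr's code, and any numbers you feed it here are your own: the study's raw data is not reproduced in this post.

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (u_statistic_for_a, p_value). Intended as an illustration for
    reasonably large samples; prefer a statistics library for real analysis.
    """
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b], key=lambda p: p[0])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t  # accumulates the tie correction
        i = j
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # every value is identical, so there is no evidence of a difference
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_a, p_value
```

Call it with two lists of per-domain citation counts, one for domains that publish the file and one for domains that do not. A p-value far above the usual 0.05 threshold, like the 0.85 reported here, means the data gives no basis for saying the two groups differ.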

Reason 4: The sites that adopt it are the ones chasing citations, not earning them

The Trakkr data has a tell. Among the 50 most-cited domains, only 6% had llms.txt. Adoption was higher among less-cited sites. The file is being installed by teams hoping to improve visibility, not by the sites that already own it. That is the signature of a tactic sold as a shortcut rather than a tactic that works.

The brands AI already cites mostly do not bother with llms.txt. That should tell you something.

Why the file exists if crawlers ignore it

llms.txt is not useless. It is misfiled. The file was designed for a different consumer than the AI search engines most marketers care about, and once you see who actually reads it, the right level of investment becomes obvious.

Here is the mismatch in plain terms.

What teams assume llms.txt does:

  • Tells ChatGPT and Perplexity which pages to cite
  • Acts like a ranking signal for AI search
  • Compensates for messy site structure

What llms.txt actually does:

  • Gives coding agents and IDEs a compact site map to fetch on demand
  • Provides a curated orientation layer, read mostly during agent workflows
  • Reinforces a hierarchy your HTML should already make clear

It is an agent and developer-tooling convention, not a search signal

The real traction for llms.txt is on the developer-to-agent side. Coding assistants like Cursor, Claude Code, and GitHub Copilot routinely fetch a project's llms.txt when a developer points them at documentation. That is a genuine use case. It is also not the consumer AI search surface where brand citations are won. The original spec at llmstxt.org leans into documentation and agent navigation, which is exactly where the file pays off.

Google audits the file while telling you it is not needed

Here is the contradiction worth knowing. Google's Chrome team added an llms.txt check to Lighthouse, so the file now shows up in a standard audit tool. Yet Google's own AI-optimization guide lists machine-readable files like llms.txt in its mythbusting section and states you do not need to create them to appear in generative search. One Google surface flags the file. Another tells you to ignore it. Both can be true: Lighthouse audits the agentic web broadly, while Search does not use the file for AI Overviews or AI Mode.

Adoption is rising even though usage is not

About 1 in 10 sites now publish llms.txt, 18 months after the idea appeared. Almost none of the AI search bots read it. Those are two different curves, and conflating them is how the file gets oversold. Publishing is cheap, so adoption grows on hope. Usage stays flat because the crawlers never changed their behavior.

So should you publish llms.txt?

Yes, if it costs you an hour and you treat it as a developer-surface hedge. No, if you are publishing it to win AI search citations or as a substitute for fixing your pages. The honest framing is that llms.txt is a low-cost, low-return file. Keep your expectations matched to the data.

What to do instead of betting on llms.txt

The effort that goes into a perfect llms.txt is better spent on the surfaces AI crawlers actually read and the signals that actually predict citations. Work these in order.

Move 1: Make your HTML the clean version

Because crawlers read rendered HTML directly, that is where extraction happens. Ensure your important content exists in the server-rendered HTML, not only behind client-side JavaScript, and that headings, lists, and tables are real markup. This is the highest-return fix and the one llms.txt was wrongly sold as replacing. Our AI crawler log audit shows how to confirm what bots see.
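A quick way to test this is to look at the raw HTML response, before any JavaScript runs, and check whether your key phrases are in it. The sketch below assumes you have already fetched the page source as a string; the class and function names are illustrative, and the phrases you check are up to you.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text outside <script> and <style>, without executing JavaScript."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def missing_from_raw_html(html, phrases):
    """Return the phrases that do not appear in the text of the raw HTML."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [p for p in phrases if " ".join(p.split()).lower() not in text]
```

Any phrase this returns is probably injected client-side, which means a crawler that does not render JavaScript may never see it. To get the raw source, something like `urllib.request.urlopen(url).read().decode("utf-8")` works for simple cases.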

Move 2: Lead every page with an extractable answer

Open each page and major section with a 40-to-60-word direct answer to the question the heading implies. Extraction favors answers that lead. A buried answer never enters the candidate set, no matter how good your llms.txt is. The structure research behind this is covered in does content structure affect AI citations.
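If you want to enforce the 40-to-60-word target across many pages, a small check helps. This is a rough sketch that assumes each section's body text is passed in without its heading and that paragraphs are separated by blank lines; the function name and defaults are illustrative.

```python
import re

def lead_answer_check(section_text, low=40, high=60):
    """Count words in a section's first paragraph and flag whether it is in range."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", section_text) if p.strip()]
    first = paragraphs[0] if paragraphs else ""
    words = len(first.split())
    return words, low <= words <= high
```

It only measures length. Whether the paragraph actually answers the question the heading implies still needs a human read.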

Move 3: Publish llms.txt as a cheap dev hedge, not a citation play

If your buyers or their engineers use coding agents against your docs, ship a curated llms.txt for them. Keep it short, point to canonical pages, and update it when priorities change. Treat it as documentation hygiene. The full implementation pattern is in our companion guide on what llms.txt is and whether your site needs one.
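For reference, the format described at llmstxt.org is plain Markdown: a title heading, a short blockquote summary, then sections of links with optional notes. The helper below is a small sketch that renders that shape from a dictionary. The function name, site name, and URLs in the usage are placeholders, not a recommended structure for your site.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 sections of links.

    `sections` maps a section name to a list of (title, url, note) tuples;
    pass an empty string as the note to omit it.
    """
    lines = [f"# {site_name}", "", f"> {summary}"]
    for section, links in sections.items():
        lines += ["", f"## {section}", ""]
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
    return "\n".join(lines) + "\n"


# Hypothetical usage with placeholder pages.
print(build_llms_txt(
    "Example Docs",
    "API documentation for the Example platform.",
    {"Docs": [("Quickstart", "https://example.com/docs/quickstart", "Install and first request")]},
))
```

Generating the file from a short list like this keeps it curated and makes updates a one-line change when priorities shift.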

Move 4: Spend the saved effort on off-site authority

Structure and clean HTML decide whether your trusted content gets quoted. Authority decides whether you are in the candidate pool at all. If your brand is absent from the third-party sources an engine reads for your category, no on-site file helps. The trust side is broken down in how AI platforms choose which sources to cite.

Move 5: Measure citations, not file adoption

Track whether your pages appear in AI answers, not whether you shipped a file. A green Lighthouse check for llms.txt tells you the file exists. It tells you nothing about whether ChatGPT cites you. Instrument the outcome that matters, using the basics in how to optimize for ChatGPT search.
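As a starting point, the outcome metric can be as simple as the share of tracked prompts whose answers cite your domain. The sketch below assumes you already record, by hand or with a tracking tool, each prompt and the URLs the answer cited. The record shape and function name are hypothetical.

```python
from urllib.parse import urlparse

def citation_rate(answers, domain):
    """Share of recorded AI answers citing at least one URL on `domain` or a subdomain.

    `answers` is a list of dicts with a "cited_urls" list (assumed record shape).
    """
    def on_domain(url):
        host = (urlparse(url).hostname or "").lower()
        return host == domain or host.endswith("." + domain)

    if not answers:
        return 0.0
    cited = sum(1 for a in answers if any(on_domain(u) for u in a["cited_urls"]))
    return cited / len(answers)
```

Run it on the same prompt set at regular intervals so a change in the rate reflects a change in citations, not a change in what you asked.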

| File or surface | Who reads it | Effect on AI citations |
| --- | --- | --- |
| llms.txt | Coding agents and IDEs, rarely AI search bots | No measurable effect on citations |
| Rendered HTML | Every AI search crawler | High. This is the surface that gets chunked and quoted |
| robots.txt | Crawlers checking access rules | Indirect. Blocking a bot removes you from its answers |
| Off-site references | The engines building the candidate source pool | High. Decides whether you are eligible to be cited |
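The robots.txt row is worth verifying, since an accidental block is easy to miss. Python's standard library includes a robots.txt parser, so a minimal check might look like the sketch below. The function name and the default bot list are assumptions, and the parser follows its own matching rules, which may differ in edge cases from how a given crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

def blocked_bots(robots_txt, url,
                 bots=("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "Google-Extended")):
    """Return the bots that the given robots.txt text disallows from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]
```

Pass in the text of your live robots.txt and a URL you care about. An empty list means none of the listed bots is blocked for that URL under these rules.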

Optimize the surface the crawler reads. That is your HTML, not a directory file it skips.

FAQ

Do AI search engines like ChatGPT and Perplexity read llms.txt?

Almost never. A 90-day audit of more than 500 million AI bot visits found only 408 requests fetched llms.txt. The crawlers behind ChatGPT, Claude, and Perplexity (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) skip the file and crawl your rendered HTML directly. The file is read mainly by coding agents, not consumer AI search.

Does having llms.txt improve my AI citation rate?

No, on the available evidence. SE Ranking found no correlation across 300,000 domains, and Trakkr measured 6.8 citations with the file versus 6.7 without across 37,894 domains, a difference that was not statistically significant. Citations are driven by clean HTML, extractable structure, and off-site authority, not by publishing llms.txt.

Then why does Google's Lighthouse audit llms.txt?

Google's Chrome team added the check as part of auditing the broader agentic web, where coding agents do fetch the file. Separately, Google's AI-optimization guide says you do not need llms.txt to appear in AI Overviews or AI Mode. The Lighthouse check is not a signal that Search uses the file for citations.

Is llms.txt worth creating at all?

It is worth an hour if your audience uses coding agents against your documentation, since IDEs like Cursor and Claude Code do fetch it. Keep it short and curated. Just do not treat it as a citation lever or a replacement for fixing the pages AI crawlers actually read.

What should I prioritize over llms.txt for AI visibility?

Make your important content exist in server-rendered HTML, lead every page with a direct 40-to-60-word answer, give data its own tables, and build off-site references in the sources your category's AI engines already read. Those moves act on the surfaces crawlers read and the signals that predict citations.

Bottom line

llms.txt is a real file with a real but narrow job: orienting coding agents that fetch it on demand. It is not a citation strategy, and the 2026 data is blunt about it. Only 408 of more than 500 million AI bot visits touched the file, two large studies found no measurable citation lift, and the brands AI already cites mostly skip it.

Publish a short one if your developers benefit. Then go back to the work that moves citations: clean HTML, a leading answer on every page, and authority in the sources engines actually read. The crawler reads your page, not your map.
