Technical GEO · 8 min read

FAQ Schema Boosts AI Citations by 350%: What Otterly's 1 Million Citation Study Found


Cite Solutions

Research · April 11, 2026

Otterly analyzed over 1 million AI citations across ChatGPT, Perplexity, and Google AI Overviews and found that FAQ schema markup produces a 350% citation increase versus unstructured content. That is the headline finding. But the study's more actionable discovery is that 73% of websites have crawlability issues that block AI systems from reading their content in the first place. If your pages fall in that 73%, no amount of schema optimization will change your citation rate. Crawlability is the foundation. Schema is what you build on top.

Citation rate by content format

FAQ schema produces the largest single-format citation boost. Based on Otterly's analysis of 1 million+ AI citations across ChatGPT, Perplexity, and Google AI Overviews (2026):

  - Raw Markdown (unrendered): 0%, no AI visits. AI crawlers receive formatting characters instead of semantic HTML.
  - Plain HTML, no schema: 1x baseline. Readable, but AI must infer structure from prose.
  - HTML + listicle format: ~2.5x. 74.2% of cited content uses list or numbered structure.
  - FAQ schema (JSON-LD): 4.5x (+350%). The highest single-format effect measured in Otterly's 1M citation study.

73% of websites in Otterly's study had crawlability issues that prevented AI systems from reading their content. Schema optimization has zero impact until this is fixed.

Citation share by source: community content 52.5% of AI citations, brand-owned content 47.5%.

Sources: Otterly 1M+ Citation Report (2026); Conductor 2026 AEO/GEO Benchmarks Report

The Crawlability Problem Most GEO Strategies Skip

The 73% figure is the right starting point because it reframes the conversation.

Most GEO advice focuses on content format, keyword targeting, and citation strategies. It assumes AI systems can already access your pages. Otterly's citation study suggests that assumption is wrong for nearly three-quarters of websites.

Crawlability for AI is not identical to crawlability for Google. A page can rank on page one of Google while remaining invisible to AI citation systems. The specific issues Otterly's study surfaced fall into three categories.

JavaScript rendering gaps. Pages where core content only loads after JavaScript executes may appear blank to AI crawlers that do not run JavaScript. The page looks fine in a browser. The AI sees an empty HTML shell.
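The JavaScript gap can be caught without a browser: if the visible word count of the raw server HTML is near zero, the page is almost certainly a client-rendered shell. A minimal sketch using Python's standard html.parser (the function and class names are illustrative, not from Otterly's tooling):

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping script and style contents."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

def static_text_length(raw_html: str) -> int:
    """Word count of the content an AI crawler sees without running JavaScript."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    return len(" ".join(parser.chunks).split())
```

A near-zero result on a page that looks full in the browser is the signature of a JavaScript rendering gap.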

Robots.txt blocks. As AI platforms deployed their own crawlers (GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot), many websites inherited robots.txt configurations that block them. Some blocks were intentional during early debates about LLM training consent. Many were accidental, from over-broad wildcard rules that block any bot not on an explicit allowlist.

Content buried in the DOM. Pages that place key information behind accordions, tabs, or far down in the page structure have lower citation rates. Conductor's 2026 AEO/GEO Benchmarks Report found that 44.2% of AI citations come from content in the first 30% of a page. Information that appears late in the document gets cited far less.

The practical audit starts with view-source: not the browser-rendered version, but the raw HTML the server returns. Is your main content present in the static HTML? Do your robots.txt rules block known AI user agents? Does the most important information appear in the top third of the DOM?
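The robots.txt half of that audit can be scripted with the standard library. A sketch using urllib.robotparser; the user-agent list matches the crawlers named above:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"]

def blocked_ai_crawlers(robots_txt: str, path: str = "/") -> list:
    """Return the AI user agents this robots.txt blocks for the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # mark rules as read so can_fetch trusts them
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]
```

An over-broad wildcard block (`User-agent: *` / `Disallow: /` with no allowlist entries for these agents) will show all four crawlers blocked, which is the accidental configuration the study describes.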

Fixing these three issues before touching schema markup will have a larger impact for most sites than any content optimization.

For a complete walkthrough of how AI systems decide which sources to read and cite, see how AI platforms choose which sources to cite.

FAQ Schema and the 350% Citation Increase

Once your content is readable, format determines whether AI systems cite it.

Otterly's analysis of 1 million AI citations found that FAQ schema markup produces a 350% citation increase compared to pages without structured data. This is the largest single-format effect the study measured, and the mechanism is direct.

When an LLM generates a response to a user query, it looks for content that answers a discrete question in a short, self-contained passage that it can extract and attribute. FAQ schema provides this structure explicitly. Instead of requiring the AI to infer what is a question and what is an answer from prose, structured JSON-LD declares it:

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How does FAQ schema affect AI citation rates?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "FAQ schema produces a 350% citation increase compared to unstructured content, based on Otterly's analysis of 1 million AI citations published in 2026."
      }
    }
  ]
}

The AI system can parse this without guesswork. The question, the answer, and the source are all labeled. That clarity is what drives the citation increase.
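If your pages are generated rather than hand-written, the same JSON-LD can be emitted from a list of question-and-answer pairs. A minimal sketch in Python (the function name is illustrative); the output belongs inside a script tag with type="application/ld+json":

```python
import json

def faq_jsonld(pairs) -> str:
    """Build a schema.org FAQPage JSON-LD string from (question, answer) pairs."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }, indent=2)
```

Generating the markup from the same data source as the visible FAQ keeps the schema and the on-page text in sync, which matters because mismatched schema is ignored or penalized by validators.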

What Makes FAQ Schema Entries Actually Get Cited

Not every FAQ entry performs equally. From the patterns in citation research, high-citation FAQ entries share these characteristics.

The question matches language users type into AI search, not internal terminology your team uses. An entry asking "How does our proprietary citation methodology work?" will not appear for any real user query. An entry asking "What is FAQ schema?" or "Does structured data affect AI citations?" will.

Answers should run 40-80 words: specific enough to be useful, concise enough to extract as a passage. Answers that are one sentence are too thin. Answers that run five paragraphs are too dense.

Each answer should contain at least one verifiable data point or named source. AI systems are more likely to cite content with traceable claims than unattributed assertions.
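These guidelines can be checked mechanically before publishing. A rough sketch: the thresholds mirror the 40-80 word guidance above, and the digit check is only a crude proxy for a verifiable data point:

```python
def answer_issues(text: str) -> list:
    """Flag FAQ answers that fall outside the patterns cited content tends to share."""
    issues = []
    words = len(text.split())
    if words < 40:
        issues.append("too thin (under 40 words)")
    elif words > 80:
        issues.append("too dense (over 80 words)")
    # Crude heuristic: a citable answer usually contains at least one number
    # (a percentage, a year, a count) anchoring a verifiable claim.
    if not any(char.isdigit() for char in text):
        issues.append("no verifiable data point")
    return issues
```

Run it across every acceptedAnswer before the schema ships; an empty list means the entry at least matches the structural profile of high-citation answers.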

For B2B SaaS companies, the highest-value pages to prioritize are comparison pages, product feature pages with common user questions, and pricing FAQ sections. These match the questions buyers ask AI systems when evaluating software.

The technical implementation of structured passages for AI extraction goes deeper than FAQ schema alone. FAQ schema is the structural layer. Passage quality determines whether the extracted content is specific enough to be useful.

Do you know how much of your site is actually readable by AI systems?

We run a full crawlability audit and FAQ schema assessment, then implement the changes that produce measurable citation improvement across ChatGPT, Perplexity, and Google AI Overviews.

Get Your AI Visibility Audit

Community vs. Brand Citations: A Closer Race Than Expected

One of the less-discussed findings from Otterly's study is how close community and brand content are in citation share.

Where AI citations come from (Otterly 1M+ citation analysis)

  Content source type  | Share of citations | Key examples
  Community content    | 52.5%              | Reddit, LinkedIn posts, forums, review platforms, third-party comparisons
  Brand-owned content  | 47.5%              | Company blog, product pages, documentation, landing pages

Source: Otterly 1M+ Citation Report (2026)

The gap is 5 percentage points. For a study of 1 million citations, that is a meaningful but narrow difference. The intuitive assumption in most GEO programs is that owning your content domain gives you control over citation outcomes. The data says something more complex: AI systems draw from community sources because those sources reflect user experience and third-party perspective rather than brand claims.

The brands that appear most consistently in AI citations tend to do both simultaneously. Their own well-structured content gets cited directly. And the community discussions about them, on LinkedIn, in industry forums, in Reddit threads, in comparison articles on third-party sites, provide the surrounding context AI systems use when forming answers.

For the 89% of B2B SaaS brands with no systematic program for community presence, this is where brand authority building intersects with technical GEO. The crawlability and schema work gets you into the 47.5%. Community strategy gets you into the 52.5%.

LinkedIn alone appears in roughly 11% of AI responses, making it the second most-cited domain across major AI platforms. A community strategy that excludes LinkedIn as a citation channel is incomplete.

Content Position and Format Beyond Schema

FAQ schema is the biggest single lever. The broader content format findings from Otterly and Conductor reinforce the same underlying pattern.

74.2% of cited content uses listicle format: lists, numbered steps, or bullet structures rather than prose paragraphs covering the same information. This figure, from Conductor's benchmark data, matches how AI responses themselves are structured. AI systems tend to generate answers as lists and direct passages, so content already in that shape requires less transformation when it gets extracted for a response.

The position finding connects directly: 44.2% of AI citations come from content in the first 30% of an article. Pages that answer the core question early, before background sections, navigation elements, and supplementary content, get cited at higher rates than pages that make AI systems search for the answer.
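Whether a given passage sits inside that citable zone is a one-line check once you have the page's extracted text. A sketch using Conductor's 30% threshold as the default cutoff (the function name is illustrative):

```python
def in_citable_zone(page_text: str, passage: str, cutoff: float = 0.30) -> bool:
    """True if the passage starts within the first `cutoff` fraction of the page text."""
    index = page_text.find(passage)
    if index == -1:
        return False  # passage not on the page at all
    return index <= len(page_text) * cutoff
```

Running this against your key answers identifies which ones would need to move up the page to land in the position where 44.2% of citations originate.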

These two findings interact with FAQ schema. A page that opens with a structured FAQ section, answers questions early in the DOM, and presents information in lists is hitting all three citation signals simultaneously.

This is not about gaming AI systems. It is about matching the format of your content to the format of AI responses. Content already structured the way AI answers are presented requires less work for AI systems to extract and cite.

For the complete framework on structuring content for passage extraction, see the complete guide to answer engine optimization.

HTML vs. Markdown: The Format That Gets Zero Citations

Otterly ran a direct comparison testing whether HTML or Markdown content delivery produces more AI citations. The results were unambiguous: HTML content received consistent citation traffic. Markdown-formatted content received zero AI search visits in the test period.

The specific issue is rendering, not syntax. When Markdown files are served as raw .md text, AI crawlers receive content decorated with asterisks, brackets, hash symbols, and pipe characters rather than semantic HTML. The result is content where structure is implied by formatting conventions rather than declared by tags. AI systems have a harder time extracting clean, attributable passages from raw Markdown because the content structure is ambiguous.

This matters most for B2B SaaS companies that host technical documentation, developer guides, or knowledge bases in Markdown format. If that documentation lives in GitHub repositories, raw GitHub pages, or static file hosts serving .md files directly, those pages may not receive citations even when the content is relevant and authoritative.

The fix is straightforward: ensure your content management system renders Markdown to HTML before serving it. Next.js, Gatsby, and most modern documentation platforms do this automatically. The issue tends to appear in edge cases: legacy documentation sites, direct file hosting, and developer tools built before AI citation was a consideration.
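A quick way to catch those edge cases is to fetch a page and test whether the body looks like unrendered Markdown. A heuristic sketch; the marker patterns are illustrative, not exhaustive:

```python
import re

def looks_like_raw_markdown(body: str) -> bool:
    """Heuristic: does a response body look like unrendered Markdown rather than HTML?"""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        return False  # tag-structured, presumably rendered HTML
    markdown_markers = (
        r"(?m)^#{1,6} ",         # ATX headings
        r"(?m)^[-*] ",           # bullet lists
        r"\[[^\]]+\]\([^)]+\)",  # inline links
        r"```",                  # fenced code blocks
    )
    return any(re.search(pattern, stripped) for pattern in markdown_markers)
```

Pointing this at your documentation URLs flags any endpoint still serving formatting characters instead of semantic HTML, which is the zero-citation condition Otterly's experiment measured.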

If you use an llms.txt file to signal content priorities to AI systems, pairing it with proper HTML rendering closes both the visibility and accessibility gaps at the same time.

The Sequence That Produces the Fastest Citation Improvement

Working from Otterly's findings, here is the order of operations that produces results most efficiently.

  1. Audit crawlability: 73% of sites have at least one issue that blocks AI systems from reading their content.
  2. Implement FAQ schema on pages with natural Q&A structure, including product pages, comparison pages, and pricing sections.
  3. Move key content earlier in the page so it appears before the 30% mark in the document.
  4. Convert comparison and how-to content to explicit listicle format.
  5. Ensure Markdown content renders as HTML before reaching any crawler.
  6. Build community presence on LinkedIn and platforms where buyers in your category are active.

Each step builds on the one before it. FAQ schema on a page AI systems cannot read produces no improvement. Correct listicle formatting on a page that is already getting cited is incremental. The sequence matters because it determines which problems get solved first.

For B2B brands that are not currently visible in AI search, the crawlability audit is almost always where the gap is. The content may be solid. The format may even be reasonable. But if the page is not readable, nothing else matters.

FAQ

What is FAQ schema and how does it affect AI citations?

FAQ schema is structured data markup in JSON-LD format that explicitly labels question-and-answer pairs on a page. AI systems can extract questions and answers directly from the schema rather than inferring structure from prose, which reduces ambiguity and increases the likelihood the content appears in AI responses. Otterly's analysis of 1 million AI citations found that FAQ schema produces a 350% citation increase versus unstructured content. No other single format change in the study produced a comparable effect.

What crawlability issues block AI citations most often?

The three most common issues in Otterly's study are JavaScript rendering gaps (core content loads only after JavaScript runs, invisible to AI crawlers that process static HTML), robots.txt blocks (over-broad rules accidentally blocking AI-specific crawlers like GPTBot, ClaudeBot, or PerplexityBot), and content buried late in the page DOM. Start a crawlability audit by viewing the raw HTML source of key pages, not the browser-rendered version. If your main content is missing from the raw HTML, that is the problem.

What is the split between community and brand content in AI citations?

Otterly's 1 million citation study found community content accounts for 52.5% of citations and brand-owned content accounts for 47.5%. The near-even split means neither channel dominates. Brands that appear most consistently in AI responses maintain a presence in both: well-structured owned content for the 47.5% and active community presence on platforms like LinkedIn, Reddit, and third-party review sites for the 52.5%.

Does Markdown content get cited by AI search engines?

Based on Otterly's HTML versus Markdown experiment, raw Markdown content received zero AI search visits while HTML content received citations consistently. AI crawlers receive raw .md files as text with formatting characters rather than semantic HTML, which makes it difficult to extract clean, attributable passages. Rendered MDX or HTML-converted Markdown does not have this problem. The issue is the rendering pipeline, not the authoring format.

How does content position on a page affect AI citation rates?

Conductor's 2026 AEO/GEO Benchmarks Report found that 44.2% of AI citations come from content in the first 30% of an article. Pages that answer the core question immediately, before background context and supplementary information, get cited at higher rates than pages that require the AI to search for the answer. The practical implication: place FAQ sections and direct answers near the top of the page, not at the bottom.

The Sequence Is Straightforward. Most Sites Are Not Following It.

The Otterly 1 million citation study produces cleaner guidance than most GEO research available today. Fix crawlability first because 73% of sites have issues that make optimization irrelevant until they are resolved. Implement FAQ schema second because no other single format change produces a comparable citation improvement. Content position and community presence fill out the picture.

The brands appearing consistently in AI citations are the ones that have solved the technical layer and built a presence in the channels AI systems draw from. Neither part alone is enough.

Most sites fail the crawlability test before getting to schema or format.

We audit your site's AI readability, implement FAQ schema across key pages, and build the citation program that puts your content in front of AI systems across ChatGPT, Perplexity, and Google AI Overviews.

Book a Discovery Call

Ready to become the answer AI gives?

Book a 30-minute discovery call. We'll show you what AI says about your brand today. No pitch. Just data.