A study published on April 8, 2026, by AI startup Oumi in collaboration with The New York Times tested 4,326 Google searches using the SimpleQA benchmark. The researchers ran two rounds: one in October 2025 under Gemini 2, one in February 2026 after Google upgraded to Gemini 3.
The headline number is good news for Google: accuracy improved from 85% to 91%.
The number buried a paragraph later is not. Under Gemini 3, 56% of correct answers are "ungrounded": the cited sources don't actually contain information that supports the answer given. That figure was 37% under Gemini 2.
Accuracy went up six points. Ungrounded citations went up nineteen.
What "ungrounded" means in practice
An ungrounded AI Overview is not necessarily wrong. It means the AI reached the right answer through its own reasoning, but the sources it shows users as "proof" do not actually prove it.
Click the citation link, and you find a page that is adjacent to the topic but does not contain the specific claim the AI stated. You cannot verify the answer using the sources provided because those sources never made that claim.
Futurism's coverage of the study included a striking detail: a journalist published a fake blog post about a fictional South Dakota International Hot Dog Eating Championship. Within one day, Google's AI Overview cited the fabricated post as fact. The AI did not verify the source's content. It treated the document as a reference and pulled it in.
This is what an ungrounded citation looks like at its most visible. The overwhelming majority of cases are subtler. A health AI Overview cites a general wellness page for a specific clinical statistic. A business AI Overview cites a press release for a quantitative claim the press release never made.
The Accuracy Paradox
[Chart] Google AI Overviews: accuracy vs. source grounding across model versions

                     Gemini 2 (Oct 2025)   Gemini 3 (Feb 2026)   Change
Accuracy                    85%                   91%            +6 pts (Gemini 3 is more accurate)
Ungrounded answers          37%                   56%            +19 pts (more correct but unsupported)
The scale problem
Google processes approximately 5 trillion searches per year. At a 9% error rate, that is tens of millions of wrong answers every hour, approaching 1 million per minute according to Technology.org's analysis of the Oumi study.
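The arithmetic behind those figures is easy to check. A quick sketch using the article's own numbers (5 trillion searches per year, a 9% error rate; both are estimates from the coverage, not measurements):

```python
# Back-of-envelope check of the scale claim. Inputs are the article's
# figures, not measured data.

SEARCHES_PER_YEAR = 5_000_000_000_000  # ~5 trillion annual Google searches
ERROR_RATE = 0.09                      # Oumi study's Gemini 3 error rate

errors_per_year = SEARCHES_PER_YEAR * ERROR_RATE
errors_per_hour = errors_per_year / (365 * 24)
errors_per_minute = errors_per_hour / 60

print(f"{errors_per_hour:,.0f} wrong answers per hour")     # ~51 million
print(f"{errors_per_minute:,.0f} wrong answers per minute") # ~856,000
```

The per-minute figure lands just under one million, matching the "approaching 1 million per minute" characterization.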
Google spokesperson Ned Adriance pushed back, calling the study "flawed." His objections have merit: SimpleQA is an OpenAI-built benchmark that focuses on particularly difficult questions where models already failed pre-screening, which means it is not representative of what most people actually search for. Google also uses its own internal evaluation benchmarks.
But even if the real-world error rate is half what Oumi found, the scale of AI Overviews means the absolute number of errors remains large. AI Overviews now appear on approximately 48% of all Google searches, up from around 16% previously, roughly a threefold increase in coverage. The reach is now too large for quality problems to stay contained.
The ungrounding finding is separate from accuracy and harder to dispute. When correct answers cite sources that don't support them, the independent verification that citations are supposed to enable becomes impossible for everyday users. They see the citation link and assume it means the claim is verified.
What this data tells us about the Google AI Overviews system
The Gemini 3 citation shift we analyzed in April 2026 showed that sources per AI Overview increased from around 11.5 to 15 or more: Gemini 3 simply cites more sources per answer.
The Oumi study adds a quality dimension to that volume increase: more citations are being drawn from a wider pool, and that pool includes pages that are topically related to the answer but do not actually contain the specific evidence being cited.
What emerges is a system where the AI's reasoning has gotten sharper, but its source selection has not kept pace. The model knows the right answer. It pulls sources that seem relevant based on topic overlap. But it does not always verify that those sources contain the specific evidence for the claim at hand.
This is a retrieval problem, not a reasoning problem. And it has direct implications for how AI citations actually work and what makes content citation-ready.
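To make the retrieval-versus-reasoning distinction concrete, here is a toy grounding check. This is not Oumi's methodology, just a naive illustrative heuristic: a claim counts as grounded only if the cited page contains most of the claim's specific evidence tokens (its numbers and distinctive words). The function names and the 0.8 threshold are assumptions for the sketch.

```python
import re

def key_tokens(claim: str) -> set[str]:
    """Numbers and longer words carry a claim's specific evidence."""
    return set(re.findall(r"[A-Za-z]{5,}|\d[\d.,%]*", claim.lower()))

def is_grounded(claim: str, source_text: str, threshold: float = 0.8) -> bool:
    """Grounded if most of the claim's key tokens appear in the source."""
    tokens = key_tokens(claim)
    if not tokens:
        return False
    source = source_text.lower()
    hits = sum(1 for t in tokens if t in source)
    return hits / len(tokens) >= threshold

claim = "Accuracy improved from 85% to 91% under Gemini 3"
supporting = "Under Gemini 3, accuracy improved from 85% to 91%."
adjacent = "Gemini 3 is Google's latest model and powers AI Overviews."

print(is_grounded(claim, supporting))  # True: evidence is present
print(is_grounded(claim, adjacent))    # False: topically related only
```

A production system would need entailment checking rather than token overlap, but even this crude filter separates the supporting page from the merely adjacent one.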
Among the study's most cited domains, Facebook ranked second and Reddit fourth. Neither platform is known for rigorous sourcing. They are known for volume and topical coverage. The implication: Google's retrieval system is pulling in pages with broad thematic relevance, not specifically pages that contain the cited claim in a verifiable form.
Want to know if your content is actually supporting AI answers?
We audit citation quality across Google AI Overviews, ChatGPT, Perplexity, and Yahoo Scout. Most brands find their content is cited less often than it should be, and that some citations are working harder than others.
Get Your AI Visibility Audit
The screen real estate context
The accuracy and grounding problem lands differently when you consider what AI Overviews do to search result pages.
The average AI Overview now exceeds 1,200 pixels in height, up from around 1,050 pixels a year ago. In December 2025, the average peaked at 1,340 pixels. A standard desktop monitor viewport is approximately 900 pixels tall. That means AI Overviews have grown to exceed the entire visible screen on most monitors. Organic results are pushed entirely below the fold for any query that triggers an Overview.
On mobile, AI Overviews occupy roughly 48% of the screen.
The organic click-through consequences are significant. Seer Interactive found that queries with AI Overviews showed a 61% drop in organic CTR, from 1.76% to 0.61%. Ahrefs found that AI Overviews reduce click-through rates for the top organic position by 58%.
This makes the source-selection question directly commercial. If an AI Overview appears on nearly half of all searches, takes up the entire visible viewport, and drives a 60% CTR reduction for organic results, then appearing in the AI Overview matters more than it did when Overviews were narrow and rare. Getting cited as a source in an AI Overview is now one of the primary visibility positions in Google search for many queries.
That is the context in which 56% of correct answers cite unsupported sources. Many brands and publishers are appearing in AI Overviews not because their content specifically contains the claimed evidence, but because their pages are topically adjacent. That is a fragile position.
Why this is a GEO opportunity, not just a problem
Here is where generative engine optimization strategy connects to what the Oumi study found.
If 56% of AI Overview citations don't actually support the answers they appear in, then the bar for being a genuinely citation-ready source is not as high as it might seem. Much of the competition for citation positions is not content that rigorously supports AI claims. It is content that is nearby.
Being the source that actually contains and clearly states the specific evidence an AI system needs creates a real differentiation. A page that directly contains a named statistic, discloses methodology, and states conclusions in unambiguous terms is a better citation candidate than a page that discusses the same general topic without anchoring the claim.
This is what we call citation-ready content, and it connects to the passage-level structure principles that drive AI retrieval across platforms. AI systems extract specific passages, not whole pages. A passage that reads: "In a 14-day study by Otterly.ai, HTML pages received 4.64% and 2.76% of total AI crawler visits. Markdown pages received 0%." is a more citable unit than a paragraph that says AI systems prefer structured formats.
The difference is not sophistication. It is specificity. Named source, methodology, numbers, direct conclusion.
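Those four signals can be checked mechanically. The sketch below is a hypothetical heuristic scorer, not any real retrieval model: it awards a point each for concrete figures, precise percentages, a named-source attribution ("by Someone"), and a methodology cue.

```python
import re

def citability_score(passage: str) -> int:
    """Award one point per specificity signal; higher = more citable."""
    score = 0
    if re.search(r"\d", passage):                      # any concrete figure
        score += 1
    if re.search(r"\d+(?:\.\d+)?%", passage):          # precise percentage
        score += 1
    if re.search(r"\bby\s+[A-Z]", passage):            # named-source attribution
        score += 1
    if re.search(r"\b(study|analysis|survey|benchmark)\b", passage, re.I):
        score += 1                                     # methodology cue
    return score

vague = "AI systems generally prefer structured formats for crawling."
specific = ("In a 14-day study by Otterly.ai, HTML pages received 4.64% and "
            "2.76% of total AI crawler visits. Markdown pages received 0%.")

print(citability_score(vague), citability_score(specific))  # 0 4
```

The vague paragraph scores zero on every signal; the Otterly.ai passage hits all four, which is exactly why it works as a standalone citable unit.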
What citation-ready content looks like for AI Overviews
The Oumi study's ungrounding finding points to what AI Overviews are actually doing when they retrieve sources. They are matching on topic and proximity to the answer, not on whether the cited page explicitly contains the evidence.
A page that wants to win the citation and hold it has to do more than be topically relevant. It has to contain the claim in a form the system can verify.
Practically, this means:
Every factual claim should be attributed to a named, specific source. Not "research shows" or "industry experts say." Something like: "According to a 2026 Oumi analysis of 4,326 Google searches..."
Data should be stated precisely. Round numbers and vague ranges signal approximation. Specific figures signal authority.
Conclusions should match the evidence shown. If your page cites a 3,000-person study to support a claim, the claim should be exactly what that study found, not a broader generalization.
Headings should directly match the sub-queries the AI is answering. Citation drift data from Scrunch and Stacker shows that AI citation positions are not stable week to week. Content that fits a retrieval sub-query precisely is better positioned to hold its citation than content that fits the query vaguely.
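The first item on that list can even be linted. A minimal sketch, assuming an illustrative (not exhaustive) list of vague-attribution phrases:

```python
import re

# Phrases that signal an unattributed claim. Illustrative list only.
VAGUE_PATTERNS = [
    r"\bresearch shows\b",
    r"\bstudies suggest\b",
    r"\bindustry experts? say\b",
    r"\bit is widely known\b",
    r"\bmany believe\b",
]

def vague_attributions(text: str) -> list[str]:
    """Return every vague-attribution phrase found in the text."""
    found = []
    for pattern in VAGUE_PATTERNS:
        found.extend(m.group(0) for m in re.finditer(pattern, text, re.I))
    return found

draft = ("Research shows AI Overviews are growing. According to a 2026 "
         "Oumi analysis of 4,326 Google searches, accuracy reached 91%.")

print(vague_attributions(draft))  # ['Research shows']
```

The first sentence gets flagged; the second, with its named source and specific figures, passes clean.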
The AI Overviews accuracy paradox points to a gap in the current ecosystem. Much of the content now earning citations is only citation-adjacent: it lacks the specific supporting evidence that would make those citations genuinely grounded. Getting into the top tier of truly citation-ready content is not a massive content overhaul. It is precision applied to what you already have.
Ready to turn your content into citation-ready assets?
Cite Solutions audits your AI visibility, identifies which pages are citation-adjacent versus citation-ready, and engineers the content changes that close the gap. Most clients see measurable movement within 60 days.
Book a Discovery Call
FAQ
What did the Oumi and New York Times AI Overviews study find?
The joint study tested 4,326 Google searches using the SimpleQA benchmark, comparing AI Overviews under Gemini 2 (October 2025) and Gemini 3 (February 2026). Factual accuracy improved from 85% to 91%. But the percentage of correct answers citing unsupported sources rose from 37% to 56%. The study was published April 8, 2026, and covered by Search Engine Land, Futurism, The Decoder, and others.
What does "ungrounded" mean in the context of AI Overviews?
An ungrounded answer is one where the AI Overview gives a factually correct response but cites sources that don't actually contain evidence supporting that answer. Users who click the citation link will not be able to verify the claim because the cited page never made it. The AI reached the right conclusion independently but selected sources based on topical proximity rather than specific evidence.
How often does Google AI show wrong information?
The Oumi study found a 9% error rate across 4,326 queries under Gemini 3. Google disputes the study's methodology, noting that SimpleQA focuses on deliberately difficult questions. Real-world error rates for typical queries may be lower. At 5 trillion annual searches, even a 9% error rate means tens of millions of wrong answers per hour at peak volume.
How does the AI Overviews accuracy problem affect SEO and content strategy?
AI Overviews now appear on approximately 48% of Google searches and push organic results below the visible viewport on most screens. Seer Interactive found a 61% CTR drop for organic results on AI Overview queries. Because many current citations are topically adjacent rather than specifically supportive, content that directly contains the evidence AI systems need has a structural advantage over content that is merely nearby the topic.
How can my brand get cited more reliably in Google AI Overviews?
Content that earns and holds AI Overview citations tends to attribute claims to named, specific sources; present data with precision rather than approximations; and structure headings as direct answers to the sub-queries AI systems generate. Passage-level structure matters more than overall page authority under Gemini 3, because the model now draws from 15 or more sources per answer and matches passages to specific retrieval needs.
What the grounding problem means for serious publishers
The Oumi study did not set out to embarrass Google. It set out to measure a system at scale. What it found is that a search product used by billions of people is returning accurate answers with unreliable source attribution at a rate that has grown significantly in four months.
For brands and publishers who care about being cited accurately, this is the clearest argument for making sure your content is not just nearby the topic but actually contains the evidence. Topical adjacency gets you into the citation pool. Specific, grounded, directly supportive content is what makes the citation mean something and hold up over time.
The accuracy is improving. The sourcing infrastructure has not kept pace. That gap is exactly where citation-ready content finds its edge.