Two stories about the same site
A documentation site is usually introduced as a place humans go to read.
You land on a page. You skim the titles. You follow some links. You find an answer. Then you close the tab and try something in your terminal.
The site is a collection of pages, and the pages are for people.
An AI agent sees the same site differently.
The agent has a question. It needs context. It has to decide which sources are worth fetching, which page is canonical, which version should be cited, and whether the answer can be grounded in a source that actually supports the claim.
That process is usually called retrieval.
The name sounds technical, but the idea is plain. Before the agent can produce a grounded answer, it has to select sources. If the answer is good, those sources are not random pages. They are the right pages, at the right canonical URLs, with enough surrounding metadata to keep the citation stable.
So there are two stories about the same site.
The reader story says:
- a human opens the overview,
- follows the deployment guide,
- reads the account abstraction page,
- and tries the command.
The agent story says:
- a crawler finds a manifest,
- resolves a canonical URL,
- indexes a clean page,
- and later surfaces that page as a citation.
The content can be identical in both stories.
The access pattern is not.
That is the core idea of this essay:
The short version
Documentation becomes easier for agents when its boundaries are explicit: what to start with, what to crawl deeply, what to cite, and what not to confuse.
This matters more in April 2026 than it did two years ago.
Developers are not only typing questions into search engines. They are asking ChatGPT, Claude, Perplexity, Cursor, Codex, Claude Code, and other tools to explain APIs, generate code, fix errors, and choose canonical docs.
If the official documentation is not structurally legible, those tools still answer.
They just answer from somewhere else.
That is the failure mode.
Not because the docs are bad.
Because the docs were not the easiest authoritative source to retrieve at the right moment.
Start with the wrong picture
The wrong picture is this:
Add llms.txt and agents will find everything.
That picture is too blunt.
A curated llms.txt is useful. It gives agents and crawlers a short, human-maintained map of the pages that matter most. The llms.txt proposal describes a Markdown file with an H1, short context, and organized links. Documentation platforms such as Mintlify now support llms.txt and llms-full.txt directly.
But llms.txt is not access control.
It is not a guarantee of ranking.
It is not a substitute for readable pages.
It is not a substitute for robots.txt, sitemap coverage, canonical URLs, page titles, descriptions, structured data, or a CDN that lets legitimate crawlers reach the content.
The better picture is layered.
At the top, a curated manifest tells agents what matters first.
Below that, an exhaustive context file gives deep sessions a way to load the site more completely.
Below that, sitemaps and markdown routes expose broad coverage.
Below that, each page carries title, description, canonical URL, Open Graph data, and structured data.
Below that, the page body has to actually answer the question.
Around the whole thing, robots.txt and WAF/CDN policy decide whether crawlers can reach the surface at all.
So the question is not:
Do we have llms.txt?
The better question is:
Can an agent move from a query to a canonical cited source without hitting a broken boundary?
That is the production question.
What retrieval actually means
Let us start with the most important word in this essay:
retrieval.
In this setting, retrieval means the query-by-query process by which an agent selects sources to build an answer.
Suppose a developer asks:
How do I deploy a Cairo contract on Starknet?
The agent now has to decide what to cite.
It might use a search index. It might use a built-in web tool. It might use a local docs index. It might use a hosted documentation manifest. The implementation changes from product to product.
The shape is similar anyway:
- receive the query,
- search or fetch candidate sources,
- prefer canonical pages,
- load enough context,
- answer with citations.
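That shape can be sketched in a few lines. Everything below is illustrative: the toy index, the naive scoring, and the docs.example URLs are assumptions for demonstration, not any product's real retrieval pipeline.

```python
# A minimal sketch of the retrieval shape: receive a query, score candidate
# sources, prefer canonical pages, and return deduplicated citations.
from dataclasses import dataclass

@dataclass
class Page:
    url: str        # URL the page was fetched from
    canonical: str  # canonical URL the page declares
    text: str

def retrieve(query: str, index: list[Page], k: int = 1) -> list[str]:
    """Return up to k canonical URLs worth citing for the query."""
    terms = query.lower().split()
    scored = []
    for page in index:
        # Naive relevance: count query terms present in the page body.
        score = sum(t in page.text.lower() for t in terms)
        # Prefer pages served at their own canonical URL: stable citations.
        if score and page.url == page.canonical:
            score += 1
        if score:
            scored.append((score, page.canonical))
    scored.sort(key=lambda s: -s[0])
    # Deduplicate by canonical URL so mirrors collapse into one citation.
    seen, citations = set(), []
    for _, canonical in scored:
        if canonical not in seen:
            seen.add(canonical)
            citations.append(canonical)
    return citations[:k]

index = [
    Page("https://docs.example/deploy?ref=x", "https://docs.example/deploy",
         "How to deploy a Cairo contract on Starknet"),
    Page("https://docs.example/deploy", "https://docs.example/deploy",
         "How to deploy a Cairo contract on Starknet"),
    Page("https://docs.example/fees", "https://docs.example/fees",
         "Fee mechanics overview"),
]
```

Note how the query URL and the canonical URL collapse into one citation: that is the practical payoff of declaring canonicals.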
If that process works, the answer is grounded.
If it fails, the answer may still sound grounded.
That is why documentation structure matters.
The agent needs to discover the source before it can cite it.
A useful mental model is a row.
One row takes a query and resolves it to one source.
Then the next row starts with a new or refined query and resolves to one more source.
Across many users, those rows become a citation pattern.
This is why a curated manifest helps.
It does not replace search.
It gives search and retrieval a clean starting surface.
A plain alphabetical list of every page is better than nothing. But it does not say what matters first. It does not tell a coding agent which pages are canonical for onboarding, deployment, account abstraction, fee mechanics, node operation, or security.
A curated manifest does.
It says:
- start here,
- these are the primary concepts,
- these are the build paths,
- these are the protocol references,
- and this is where to go when you need exhaustive context.
For a human, that looks like a table of contents.
For an agent, it is a routing layer.
What canonical state means here
When people hear “canonical,” they often think of SEO.
That is partly right, but too narrow.
In this essay, canonical state means:
the information that must be present at a page boundary so the next retrieval is well-defined.
A page boundary is where one unit of retrieval begins and ends.
At that boundary, an agent should be able to answer basic questions:
- What is this page about?
- Is this page indexable?
- Is this the canonical URL?
- Is there a concise description?
- Is there structured data describing the page type?
- Are there stable internal links to related pages?
- Does the content answer the query it appears to answer?
If those questions are easy to answer, the page is legible.
If they are hard to answer, the agent has to infer too much.
That is where errors enter.
A page can be beautifully written and still be weak as a retrieval target if it has a missing title, conflicting canonical URL, accidental noindex, thin description, broken metadata, or a WAF challenge page served to crawlers.
Those are not writing problems.
They are boundary problems.
Boundary problems are exactly the kind that should be checked automatically.
The manifest is the small bridge
A manifest is a compact statement of what matters.
For agent-facing documentation, the useful manifest is not an exhaustive dump. It is the small bridge from intent to source.
That is why llms.txt should be curated.
A good llms.txt is not trying to list every page. The exhaustive layer is llms-full.txt, sitemap, and markdown routes.
The curated layer should be smaller and opinionated.
It should say:
- this is the official documentation,
- these are the first pages to read,
- these are the primary developer paths,
- these are the protocol references,
- these are the security and operations pages,
- and this is where to load more context.
For docs.starknet.io, PR #1751 made that exact move. It added a curated llms.txt with validation CI, while leaving the full generated context available through llms-full.txt.
That split is important.
The curated file is for orientation.
The full file is for depth.
The sitemap is for coverage.
The pages are for truth.
If those roles get mixed up, the system becomes noisy.
If each role stays narrow, the system becomes easier to reason about.
The April 2026 stack
The practical stack in April 2026 is not exotic.
It is mostly boring web hygiene made explicit for agents.
The first layer is crawl access.
robots.txt should be reachable, valid, and not accidentally blocking the surfaces you want cited. OpenAI documents separate crawler identities for OAI-SearchBot, GPTBot, and ChatGPT-User. Anthropic documents separate Claude-related bots, including ClaudeBot, Claude-SearchBot, and Claude-User. Perplexity documents PerplexityBot. Google documents crawler tokens such as Googlebot and Google-Extended.
The important detail is that these bots do not all mean the same thing.
Some are for search indexing.
Some are for training opt-out.
Some are user-triggered fetchers.
So the correct move is not blindly “allow all AI bots” or blindly “block all AI bots.” The correct move is to decide which surfaces should be discoverable, then verify that robots.txt and the WAF/CDN behave that way for the relevant crawler classes.
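As a sketch of what such a decision can look like, here is an illustrative robots.txt that allows search and citation crawlers while opting out of training crawlers. The bot tokens come from the vendor docs cited above; the policy split and the docs.example domain are placeholders to adapt, not a recommendation.

```text
# Illustrative only: verify each token against current vendor docs.
# Example policy: allow search/citation crawlers, opt out of training.

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://docs.example/sitemap.xml
```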
The second layer is the curated manifest.
That is llms.txt.
It should be short enough to be useful and structured enough to be parsed: one H1, a short summary, clear H2 sections, links with descriptions, no random query URLs, no duplicate canonicals, and no generated noise in the top layer.
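A curated file with that shape might look like the sketch below. The structure (one H1, a short blockquote summary, H2 sections, links with descriptions) follows the llms.txt proposal; the pages and docs.example URLs are invented placeholders.

```markdown
# Example Protocol Docs

> Official documentation for Example Protocol: onboarding, deployment,
> protocol references, and operations.

## Start here
- [Overview](https://docs.example/overview): what the protocol is and how the docs are organized
- [Quickstart](https://docs.example/quickstart): deploy a first contract end to end

## Protocol references
- [Fee mechanics](https://docs.example/fees): how fees are computed and paid

## Optional
- [Full context](https://docs.example/llms-full.txt): exhaustive content for deep sessions
```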
The third layer is exhaustive context.
That is llms-full.txt, markdown page routes, and sitemap coverage. Mintlify’s docs describe this split clearly: llms.txt is the structured map, while llms-full.txt combines broader site content for LLM context.
The fourth layer is page metadata.
Each important page should have a title, meta description, canonical URL, indexable robots directive, Open Graph and Twitter metadata, and structured data where appropriate. Google’s structured data docs are still the safest source for how Google Search consumes structured data, and they explicitly recommend relying on Search Central docs for Google behavior.
The fifth layer is content density.
Not SEO keyword stuffing.
Actual answer density.
Definitions, examples, commands, constraints, current version notes, and links to canonical references.
Agents cite pages that answer questions.
A thin marketing page with beautiful design can be a weak retrieval target if it does not contain the factual anchors a developer asks for.
The stack is useful because each layer has a job.
If the crawler cannot reach the page, metadata does not matter.
If the page has no canonical URL, citations drift.
If the manifest is generic, agents start from noisy routes.
If the page body does not answer the question, even perfect metadata cannot save it.
The audit is where this becomes real
The most useful part of this work is not the theory.
It is the audit loop.
A good audit asks boring questions repeatedly:
- Does robots.txt return 200?
- Does llms.txt return 200?
- Does llms.txt have exactly one H1?
- Are the links canonical?
- Does llms-full.txt exist?
- Does the sitemap exist?
- Do key pages return 200?
- Are key pages indexable?
- Do key pages have canonical URLs?
- Do key pages have titles and descriptions?
- Do crawler user-agent tests reveal WAF risk?
- Does /ai exist if humans are using that URL in conversation?
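The manifest checks can be expressed as a small validator. This is a sketch of the idea, not a copy of any repository's actual validator; the exact rules enforced here are assumptions.

```python
# Validate the text of an llms.txt file: one H1, clean canonical links.
import re

def validate_llms_txt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    h1_count = len(re.findall(r"(?m)^# ", text))
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    links = re.findall(r"\[[^\]]+\]\(([^)]+)\)", text)
    if not links:
        problems.append("no links found")
    for url in links:
        if "?" in url:
            problems.append(f"query string in link: {url}")
        if not url.startswith("https://"):
            problems.append(f"non-https link: {url}")
    if len(set(links)) != len(links):
        problems.append("duplicate link targets")
    return problems

good = "# Docs\n\n## Start here\n- [Overview](https://docs.example/overview)\n"
bad = "# Docs\n\n# Again\n- [X](https://docs.example/p?ref=1)\n"
```

The point is not the regexes; it is that each rule in the audit list becomes one cheap, automatable assertion.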
This is not expensive.
It should not slow down every docs contributor.
The right pattern is a lightweight scheduled or manual health check, plus targeted CI only when the discovery files change.
That was the reason for separating the checks in the Starknet docs work.
The llms.txt validator runs when the manifest changes.
The metadata health check can run weekly or manually.
A normal contributor fixing a typo in a docs page should not pay for a full internet audit on every commit.
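A minimal sketch of that split, assuming GitHub Actions and hypothetical script paths (these are not the actual Starknet workflows): the validator triggers only when the manifest changes, while the health check runs weekly or on demand. In practice these would be two files under .github/workflows/; they are shown together for brevity.

```yaml
# File 1 (illustrative): runs only when the curated manifest changes.
name: llms-txt-validate
on:
  pull_request:
    paths:
      - "llms.txt"
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/validate_llms_txt.py  # hypothetical script path

---
# File 2 (illustrative): weekly or manual metadata health check.
name: docs-health-check
on:
  schedule:
    - cron: "0 6 * * 1"  # Mondays, 06:00 UTC
  workflow_dispatch: {}
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/metadata_health_check.py  # hypothetical script path
```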
That is production-grade because it respects maintainers.
It catches drift without making every small docs change feel heavy.
The audit should also be honest about what it proves.
A spoofed user-agent request from your laptop is not proof that Googlebot or ClaudeBot is blocked in production.
It is a signal.
The real answer lives in CDN and WAF logs, verified bot settings, and search console tooling.
That distinction matters.
Good agent discoverability work should reduce confusion, not create more of it.
The Starknet case study
The Starknet docs work was a clean first step.
docs.starknet.io already had useful generated surfaces through Mintlify. The problem was not that the generated llms.txt concept was wrong.
The problem was that the top-level manifest was too generic for agent retrieval.
So PR #1751 added a curated root llms.txt and validation around it.
That gives coding agents a better first map of the docs.
Then the second step was a metadata health check for representative pages.
That check is intentionally narrow. It does not grade writing quality. It does not claim to measure AI ranking. It checks the boring structural pieces:
- status code,
- content type,
- title,
- canonical URL,
- robots indexing signal,
- Open Graph title,
- Twitter title,
- JSON-LD presence.
That is the right level for CI.
You do not want a workflow that tries to decide whether a page is semantically great.
You want a workflow that catches obvious machine-readability regressions before they ship.
The third step moved outside the docs repo.
The main starknet.io site is a different surface. It is WordPress and Yoast, not Mintlify docs. So the fixes are different.
The audit found a good baseline: important pages return 200, are indexable, and carry useful metadata.
It also found three practical gaps:
- starknet.io/ai returned 404 while the real AI page was /verifiable-ai-agents/,
- starknet.io/llms.txt existed but was generic and Yoast-generated,
- and crawler user-agent checks suggested the WAF/CDN should be verified for legitimate search and AI crawlers.
Those are not reasons to panic.
They are exactly the kind of gaps an audit is supposed to reveal.
The fix is not complicated:
- make /ai resolve intentionally,
- override or customize the generated llms.txt,
- link the main site, docs site, llms-full.txt, and StarkSkills together,
- and verify crawler behavior in the web stack.
That is the real story.
Not “we added one file.”
The real story is:
we are turning a collection of web pages into an explicit agent-readable source graph.
What I would do now
If I were setting this up for a serious developer ecosystem in April 2026, I would use this order.
First, fix the docs root.
Add or curate llms.txt. Keep llms-full.txt. Validate the curated file in CI. Make sure every link resolves to an intentional canonical page.
Second, add a metadata health check.
Do not run it on every docs edit. Run it weekly, manually, and when the check changes. Keep it representative and cheap.
Third, audit the main site.
This is where many teams miss the boundary. The docs can be perfect while the main site has a broken AI landing URL, generic generated manifests, or WAF rules that serve challenge pages to useful crawlers.
Fourth, connect the surfaces.
The main site should point to the docs.
The docs manifest should point to exhaustive context.
The AI page should point to builder resources.
The learning site should be linked from the right places.
The sitemap should expose the important public pages.
Fifth, measure citations manually before buying tooling.
Pick 20 real questions developers ask.
Ask them across ChatGPT, Claude, Perplexity, Gemini, and coding agents.
Record whether the answer cites official sources, stale blogs, random forum posts, or nothing.
Only after that baseline exists does a vendor dashboard become useful.
Otherwise you are buying charts before knowing the failure mode.
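The baseline can be as simple as a list of records and a tally per tool. The field names and source classes below are assumptions for illustration, not a prescribed schema.

```python
# Tally, per tool, which class of source each answer cited.
from collections import Counter

def tally(rows: list[dict]) -> dict[str, Counter]:
    """rows: dicts with 'tool', 'question', and a 'cited' class such as
    'official', 'blog', 'forum', or 'none'."""
    by_tool: dict[str, Counter] = {}
    for row in rows:
        by_tool.setdefault(row["tool"], Counter())[row["cited"]] += 1
    return by_tool

rows = [
    {"tool": "chatgpt", "question": "deploy a Cairo contract", "cited": "official"},
    {"tool": "chatgpt", "question": "fee mechanics", "cited": "blog"},
    {"tool": "claude", "question": "deploy a Cairo contract", "cited": "official"},
]
```

Twenty questions across five tools is a hundred rows: small enough to collect by hand, large enough to show where official docs lose to stale blogs.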
Sixth, keep the scope honest.
llms.txt is a map.
robots.txt is a crawler policy surface.
Structured data is a machine-readable description layer.
A sitemap is coverage.
A WAF is a gate.
Content is still the source of truth.
Confusing those roles leads to bad strategy.
What this does not claim
This essay does not claim that every major AI company uses llms.txt as a ranking signal.
There is no public evidence strong enough to say that.
It does not claim that a curated manifest guarantees citations.
It does not claim that agent discoverability is solved by SEO tricks.
It does not claim that crawler user-agent tests from one machine prove how verified bots are handled globally.
The claim is narrower and more useful:
retrieval systems work better when authoritative content is easy to identify, fetch, parse, and cite.
That claim is enough.
It points to practical work:
- curate the entry map,
- keep full context available,
- maintain canonical metadata,
- avoid accidental crawler blocks,
- monitor the public surfaces,
- and write pages that answer real questions.
That is not magic.
It is infrastructure.
The takeaway
A good documentation site already has most of what agents need.
It has pages.
It has sections.
It has concepts.
It has links.
It has examples.
But agents do not experience the site the same way humans do.
They need a discoverable route into the site, a canonical way to identify the right pages, enough metadata to preserve context, and enough content density to answer the question.
That is why documentation structure fits agent workloads.
The site is already a graph.
The work is to make the graph explicit.
The modern practice is not to chase one acronym.
It is to make each boundary legible:
- robots.txt for access intent,
- llms.txt for curated orientation,
- llms-full.txt for depth,
- sitemap and markdown routes for coverage,
- metadata for canonical citation,
- monitoring for drift,
- and content for truth.
Once those pieces line up, an agent does not have to guess where authority lives.
It can find it.