Part 3
A proof can be valid and still fail to certify the claim you care about.
Part 1 explained the trace shape.
Part 2 showed where proof pressure appears.
This final essay is about the boundary around the proof.
From structure to artifacts
So far, we have stayed at the level of computational shape.
That was necessary.
If we had jumped too early into proof files, artifact names, or implementation details, the story would have sounded like a product pitch.
The first job was to make one thing clear:
the shape of transformer decode already gives us a plausible proof surface.
Now we can ask the harder question:
What does that become in an actual system?
This is where implementation matters.
A visual argument is useful only if it eventually produces a real object:
- something that can be executed,
- something whose boundary can be described,
- and something that can later be checked.
That is the standard.
Not: does the analogy sound clever?
But:
does the structural view tell us what the claimable object should be?
That is the real test.
A picture becomes useful when it creates an object
It is easy to make a model sound machine-like.
That alone is not valuable.
What matters is whether that machine-like reading helps define a useful boundary.
Once decode is described as:
- repeated local work,
- carried state,
- explicit continuity,
- and a moving frontier,
a natural next question appears:
what is one reusable unit of that execution?
Not the whole model.
Not the whole generation.
One unit.
That is where the idea of a step-level proof object comes from.
If one decode step is a meaningful transition, then one decode step can also become a meaningful artifact.
That is the bridge from intuition to system design.
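To make that bridge concrete, here is a minimal Python sketch of one decode step as an explicit state transition. The names (`DecodeState`, `decode_step`) are illustrative assumptions, and the frontier here is a toy tuple rather than a real KV cache.

```python
from dataclasses import dataclass

# Hedged sketch: one decode step as an explicit state transition.
# `DecodeState` and `decode_step` are illustrative names; the frontier is a
# toy tuple, not a real KV cache.

@dataclass(frozen=True)
class DecodeState:
    position: int       # current decode position
    kv_frontier: tuple  # carried context, one entry per past step
    last_token: int     # most recent visible token

def decode_step(state: DecodeState, next_token: int) -> DecodeState:
    """One unit of execution: state in -> state out, continuity explicit."""
    return DecodeState(
        position=state.position + 1,
        kv_frontier=state.kv_frontier + (state.last_token,),
        last_token=next_token,
    )

# One reusable unit: nothing hidden between the incoming and outgoing state.
s0 = DecodeState(position=0, kv_frontier=(), last_token=101)
s1 = decode_step(s0, next_token=7)
```

The point of the frozen dataclass is that the step cannot mutate its input in place: the incoming state survives as a boundary the artifact can refer to.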
What a step-level object has to say
A useful step-level object is not just “a proof happened.”
It has to say something specific.
At minimum, it has to answer four questions:
- What state entered the step?
- What transition family was applied?
- What state left the step?
- What part of that outgoing state is the next step allowed to inherit?
That is the real shape of the object.
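As a hedged sketch, the four answers can become four fields on a small record; `StepArtifact` and its field names are hypothetical, not a real artifact format, and the SHA-256 digest stands in for a real commitment.

```python
import hashlib
from dataclasses import dataclass

# Hedged sketch: one field per question. `StepArtifact` and its fields are
# hypothetical, not a real artifact format.

def digest(data: bytes) -> str:
    """Stand-in commitment: a plain SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()

@dataclass(frozen=True)
class StepArtifact:
    state_in: str         # what state entered the step (commitment)
    transition_kind: str  # what transition family was applied
    state_out: str        # what state left the step (commitment)
    inheritable: str      # what the next step is allowed to inherit

artifact = StepArtifact(
    state_in=digest(b"state-0"),
    transition_kind="decode_step_v1",
    state_out=digest(b"state-1"),
    inheritable=digest(b"kv-frontier-1"),
)
```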
Notice how much of the essay was preparing for exactly that:
- state in,
- state out,
- carried frontier,
- repeated local work,
- explicit continuity.
Those were not only teaching conveniences.
They are the ingredients a later artifact has to preserve.
In the current prototype work, those ingredients become a deterministic transformer-shaped runtime that records explicit execution boundaries and turns decode steps into reusable proof artifacts.
If those ingredients are muddy, the artifact is muddy.
If those ingredients are sharp, the artifact can remain meaningful even as it is packaged into something larger.
That is why boundary discipline matters so much.
Why the boundary matters more than the token
The visible token is still not the whole object.
The token is what a user sees.
The boundary is what later computation depends on.
If the only thing you preserve is the visible token, later packaging layers become shallow.
They can say what was generated, but not why one step legitimately followed another.
If you preserve the carried boundary, later layers inherit something stronger:
- they know where the step began,
- they know where the step ended,
- and they know what continuity the next step is entitled to assume.
That is why explicit carried-state boundaries matter.
The idea is not to add ceremony for its own sake.
The idea is to keep the execution semantics alive long enough that later proof objects still mean something.
From one step to many
Once one step has a meaningful boundary, larger structures stop being mysterious.
You can begin to imagine:
- a chain of steps,
- a segment built from several steps,
- a larger package built from several segments,
- and eventually richer layers that preserve the same start-state to end-state meaning without flattening it away.
That does not mean recursive compression is solved.
It does not mean every larger package is already a cryptographic accumulator.
It means something simpler:
the base object is now strong enough that later composition has a real foundation.
That is the right order of work.
First make one step meaningful.
Then preserve that meaning across a chain.
Then preserve it across larger packages.
Only then does it make sense to ask for stronger compression, stronger accumulation, or recursive closure.
The prototype forces the issue
A write-up can make an argument about structure.
A prototype has to decide what the object actually is.
That is a harder standard.
The current prototype does not prove all of transformer inference.
It does not claim recursive compression.
It does not claim full standard-softmax closure.
It does not claim a final accumulated proof system.
What it supports is narrower and more useful:
- a deterministic transformer-shaped runtime,
- explicit execution boundaries,
- execution traces,
- step-level proof artifacts,
- shared lookup-table identity,
- and carried-state packaging layers that preserve one decode relation across richer objects.
That is enough to make the structural argument real.
Not complete.
But real.
What changed in the statement-bound checkpoint
The implementation checkpoint sharpened the lesson of the previous section.
The lesson is not only:
the artifact should preserve carried state and shared lookup identity.
The sharper lesson is:
a proof object is not enough unless the statement around it is also bound.
That distinction matters.
A raw proof can answer a mathematical question:
did this witness satisfy this relation?
But a system usually needs a more semantic question:
did this model, on this input, under this policy, produce this output, with these carried-state and lookup boundaries?
Those are not the same question.
Here is the picture to keep in mind: the failure mode is not broken cryptography.
It is an under-specified claim.
If the proof verifies but the surrounding statement can be relabeled, the verifier may accept the right computation under the wrong meaning.
That is exactly the kind of mistake a verifiable-AI system has to avoid.
This is also where full-system zkML results matter.
Lagrange DeepProve-1 reports full GPT-2 inference proving, which is a useful counterweight to any lazy claim that transformers are somehow impossible to prove with modern SNARK-style systems.
And on the STARK-native side, LuminAIR and Giza x S-two point toward graph-to-AIR paths.
Those systems are not doing the same thing as this essay.
They are evidence that the field is moving from slogans toward real proof surfaces.
So the current checkpoint is best described as a statement-bound proof-surface checkpoint.
It is not the final recursive prover.
It is not full standard-softmax inference on S-two.
It is not production-scale zkML deployment.
It is a narrower object:
a verifier-visible boundary that makes the claimed statement harder to fake.
For this checkpoint, the useful result is that the public object can carry enough structure for tests and verifiers to reject several kinds of drift:
- lookup-table drift,
- stale carried-state handles,
- stale tensor or step sources,
- relation-kind relabeling,
- and false flags that would pretend the artifact proves more than it does.
That is the difference between a diagram and an artifact.
A diagram says what should be true.
A statement-bound verifier surface says what must be carried in the public object for the check to pass.
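One way to picture such a rejection surface is a verifier-side acceptance check with one test per drift kind. This is a hedged sketch with entirely hypothetical field names; a real verifier would compare commitments, not plain strings.

```python
# Hedged sketch: a verifier-side acceptance check that rejects drift between
# the claimed statement and the public artifact. All field names are
# hypothetical; real systems would check commitments, not plain strings.

EXPECTED_RELATION = "decode_step_v1"  # illustrative relation kind

def accept(artifact: dict, expected: dict) -> bool:
    """One check per drift kind; any failure rejects the artifact."""
    checks = [
        artifact.get("lookup_table") == expected["lookup_table"],    # lookup-table drift
        artifact.get("carried_state") == expected["carried_state"],  # stale carried-state handle
        artifact.get("step_source") == expected["step_source"],      # stale tensor or step source
        artifact.get("relation_kind") == EXPECTED_RELATION,          # relation-kind relabeling
        not artifact.get("claims_full_inference", False),            # false over-claim flag
    ]
    return all(checks)

expected = {"lookup_table": "t-1", "carried_state": "s-1", "step_source": "src-1"}
good = {"lookup_table": "t-1", "carried_state": "s-1",
        "step_source": "src-1", "relation_kind": "decode_step_v1"}
relabeled = dict(good, relation_kind="full_inference")
```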
Shared lookup identity
The first surface is shared lookup identity.
Why does that matter?
Because lookup tables are not only local implementation details in this story.
The whole argument says that transformer proving pressure is concentrated in repeated non-arithmetic machinery:
- normalization tables,
- activation tables,
- softmax-like tables,
- quantization and range-like tables.
If every step silently used a different table, then the phrase “shared lookup machinery” would be weak.
It would be a story we tell ourselves.
So the artifact needs a public object for shared lookup identity.
The important idea is simple:
if several proof-carrying step envelopes are supposed to use the same table identity, the verifier should be able to check that they really do.
Regression tests that reject cross-step shared-lookup drift and per-step table drift are not cosmetic.
They are anti-self-deception checks.
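A minimal sketch of that check, assuming each envelope carries a hypothetical `lookup_table_digest` field:

```python
# Hedged sketch: check that several step envelopes commit to one shared
# lookup-table identity. `lookup_table_digest` is a hypothetical field name.

def shared_lookup_ok(envelopes: list) -> bool:
    """True only when every envelope carries the same table digest."""
    table_ids = {env["lookup_table_digest"] for env in envelopes}
    return len(table_ids) == 1

same = [{"lookup_table_digest": "abc"}, {"lookup_table_digest": "abc"}]
drifted = [{"lookup_table_digest": "abc"}, {"lookup_table_digest": "xyz"}]
```

The check is trivial on purpose: if "shared lookup machinery" is a real property, it should reduce to an equality the verifier can run.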
Typed carried state
The second surface is typed carried state.
Earlier in the essay, state was described in human terms:
- current position,
- carried context,
- KV-cache frontier,
- commitments or summaries that define the next boundary.
That is useful for intuition.
But a proof artifact needs something sharper.
It needs to know what kind of thing is being carried.
Small typed handles for the carried boundary can distinguish:
- state handles,
- lookup and tensor handles,
- KV-cache handles,
- token handles.
Again, this is not glamorous.
But it is important.
If carried state is untyped, later composition can become muddy.
If carried state is typed, then the next layer has a clearer object to inherit.
That is the whole point of the boundary discipline in this essay.
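A hedged sketch of what typed handles could look like; `HandleKind`, `Handle`, and `inherit` are illustrative names, not a real interface.

```python
from dataclasses import dataclass
from enum import Enum

# Hedged sketch: typed handles for the carried boundary. `HandleKind` and
# `Handle` are illustrative names, not a real interface.

class HandleKind(Enum):
    STATE = "state"
    LOOKUP = "lookup"
    TENSOR = "tensor"
    KV_CACHE = "kv_cache"
    TOKEN = "token"

@dataclass(frozen=True)
class Handle:
    kind: HandleKind
    commitment: str  # digest of the carried object

def inherit(handle: Handle, expected: HandleKind) -> Handle:
    """The next layer refuses a handle of the wrong kind instead of guessing."""
    if handle.kind is not expected:
        raise TypeError(f"expected {expected.value}, got {handle.kind.value}")
    return handle

kv = Handle(HandleKind.KV_CACHE, "c0ffee")
```

The design choice is small but deliberate: a mistyped handle fails loudly at the boundary rather than being silently reinterpreted downstream.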
A transformer-shaped statement envelope
The third surface is the statement envelope around the transition.
This is where the bridge becomes closest to the essay’s central picture.
The object is not just:
some proof happened.
It says what kind of transition it claims to represent, what sources it is bound to, and what carried state is allowed to pass through it.
This is also where the broader proof-stack lesson enters.
The same issue appears outside a STARK prototype.
A raw proof from another proof system can be valid while still not carrying enough application-level meaning by itself.
If model identity, input identity, output identity, policy identity, verifier-domain identity, or source commitments live outside the checked statement, then relabeling risk moves from cryptography into application glue.
That is a bad place for it to live.
The clean fix is not to say “use this proof system instead of that one.”
The clean fix is to bind the semantic statement explicitly:
- what model is being claimed,
- what input was used,
- what output is certified,
- what relation was checked,
- what public policy or execution domain applies,
- and what artifact or proof bytes are being accepted.
That envelope is the reusable unit we wanted.
Not the whole model.
Not the whole generation.
One meaningful transition surface.
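As a sketch under illustrative field names, the envelope can bind all six answers into one digest, so that relabeling any field changes the binding. The `"|"`-joined SHA-256 digest is a stand-in, not a wire format.

```python
import hashlib
from dataclasses import dataclass

# Hedged sketch: a statement envelope that binds the six answers together.
# Field names and the "|"-joined digest are illustrative, not a wire format.

@dataclass(frozen=True)
class StatementEnvelope:
    model_id: str       # what model is being claimed
    input_id: str       # what input was used
    output_id: str      # what output is certified
    relation_kind: str  # what relation was checked
    policy_id: str      # what public policy or execution domain applies
    proof_digest: str   # what artifact or proof bytes are being accepted

    def binding(self) -> str:
        """One digest over the whole statement: relabel any field, and it changes."""
        fields = (self.model_id, self.input_id, self.output_id,
                  self.relation_kind, self.policy_id, self.proof_digest)
        return hashlib.sha256("|".join(fields).encode()).hexdigest()

env_a = StatementEnvelope("m1", "in1", "out1", "decode_step_v1", "p1", "d1")
env_b = StatementEnvelope("m1", "in1", "out1", "relabeled", "p1", "d1")
```

A verifier that accepts only against this binding accepts the proof and the statement together, which is the discipline the next section names.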
Where Tablero fits
This is where Tablero fits in the research program.
Tablero is not the reason transformer decode looks trace-shaped.
It is what comes after that observation.
If a trace-shaped computation is going to be packaged, composed, or verified somewhere else, the boundary has to preserve what the trace means.
That is the Tablero idea in plain language:
do not pass around a proof as if it were self-explanatory; pass around a typed, statement-bound object that says what the proof is allowed to mean.
So Tablero is the living example of this essay’s claim.
It turns “a proof happened” into a narrower object:
- this model,
- this input,
- this output,
- this transition relation,
- this carried state,
- this verifier domain,
- and this proof artifact.
The name matters less than the discipline.
The verifier should accept the proof and the statement together, not as two pieces of glue code that can drift apart.
What this means for the breakthrough claim
The breakthrough is not:
we can now cheaply prove arbitrary frontier LLM inference end to end.
That would be too strong.
The better statement is:
the structural proof surface is now concrete enough to be tested, including tests that try to relabel what the proof means.
That is a real step.
The argument has three layers:
- The shape layer: decode is repeated state transition over carried context.
- The model layer: non-arithmetic transformer machinery creates symbolic proving pressure.
- The statement layer: proof artifacts need verifier-visible boundaries that bind the computation to the claim being made.
Those three layers support each other.
The shape explains why STARK traces are natural.
The model explains where the proving pressure concentrates.
The statement layer explains why “the proof verified” is not the same as “the claimed AI event was certified.”
That is enough for the current structural claim.
It is not enough for the final zkML system.
And that is fine.
The right research program is staged:
- first make the proof surface clear,
- then make the carried boundary verifier-visible,
- then bind the statement around the proof,
- then make real chained proof-carrying decoding,
- then ask for recursive compression,
- then benchmark production-scale proving honestly.
Skipping those layers would make the story weaker, not stronger.
Why later composition depends on present discipline
It is tempting to jump ahead.
Once you can see a step object, it is natural to immediately want:
- recursion,
- compression,
- accumulation,
- one proof for many steps,
- one proof for a whole generation.
Those are good ambitions.
But they only become meaningful if the base object is right.
If the step boundary is poorly chosen, later layers do not solve the problem.
They hide it.
If the step boundary is chosen well, later layers inherit something worth preserving.
That is why the base artifact line should stay disciplined.
The job is not to claim every future layer at once.
The job is to establish that the base execution object has the right shape:
- meaningful state in,
- meaningful state out,
- repeated local transition logic,
- and continuity that can be made explicit.
Once that object exists, later work stops feeling arbitrary.
So what?
If this structural picture is right, the next move is not to make a larger slogan.
It is to make the proof surface harder to fool.
The situation is:
- decode is a repeated stateful computation,
- each step has a meaningful incoming and outgoing boundary,
- attention supplies repeated local work over carried context,
- and a trace can expose the continuity that a proof system needs to check.
Why this matters:
- if the boundary is vague, later proofs may certify the wrong object,
- if continuity is implicit, composition can hide mistakes,
- if the trace is only decorative, the numbers can look meaningful while proving little,
- and if the artifact does not preserve state in and state out, recursion or aggregation has no solid base.
So the next work should be concrete:
- keep the runtime deterministic,
- make boundaries explicit,
- write the trace rows down,
- preserve carried-state commitments,
- compare symbolic costs against other proving shapes,
- and only then ask for larger recursive or accumulated proof objects.
What this does and does not claim
By now the narrowness of the claim should be easier to appreciate.
This essay is not claiming:
- every aspect of model behavior is suddenly proved,
- all transformer implementations become elegant proof systems by default,
- recursion and accumulation come for free,
- STARKs beat SNARKs in every setting,
- a proof of execution is the same as a proof that the model’s answer is true,
- or one visual argument settles the engineering question.
It is claiming something more basic and more useful:
- transformer decode has repeated state-transition structure,
- attention can be understood as repeated local work over explicit carried context,
- that structure is naturally recordable as a trace,
- and trace-based proof reasoning becomes plausible because the computation itself already cares about continuity, boundaries, and repeated local transitions.
That is not a slogan.
It is a structural claim.
And it is strong enough to motivate real system design.
How this relates to the field
This is not a claim that nobody has proven transformers before.
That would be false.
DeepProve-1, NANOZK, zkLLM, Jolt Atlas, LuminAIR, and Giza x S-two all point at serious work around transformer and ML proving.
The point here is narrower:
once a proof exists, the verifier still needs to know what statement the proof is certifying.
That is the boundary problem this essay isolates.
The takeaway
If you remember only one thing from the series, let it be this:
verifiable AI needs more than valid proof bytes. It needs statement-bound proof surfaces.
Part 1 showed why transformer decode has trace shape.
Part 2 showed where the proof pressure appears.
This part showed why the accepted object must bind the proof to its meaning.
That is the path from transformer structure to verifiable AI.