2026-04-29 · MANTRA · BY RICO ALLEN

Test what we fly. Fly what we test.

The phrase comes from aerospace flight test. It means the airplane that goes through certification is the airplane that goes into service: same configuration, same software, same instrumentation, same sensors. Anything different is a different airplane. We adopted it as Hardseal's operating mantra, and it shapes most of the decisions the company makes, above all the shortcuts it refuses to take.

The version most software teams run

Most teams ship a build. Then they ship a "production" build that is subtly different — different config flags, different feature toggles, different telemetry endpoints, different mock services in the test harness vs. the live system. The differences are usually small. Sometimes the differences are the entire reason the customer is buying the product.

The differences are also where bugs live. A test passes against a mock database that returns rows in insertion order. Production hits a real database where row order is undefined. The test never caught it. The customer does.
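
A minimal sketch of the alternative, in Python, with the standard library's in-memory sqlite3 standing in for "a real database". This is an assumption: the post does not say which database is involved, and the events table is hypothetical. The query states its ordering explicitly. The test runs that exact query against a real engine, with rows inserted out of order so that insertion order and the required order disagree.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a real (in-memory) SQLite database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, name TEXT)"
    )
    return conn


def recent_event_names(conn: sqlite3.Connection) -> list[str]:
    # The ordering contract lives in the query itself. Without ORDER BY,
    # SQL leaves row order unspecified, even if one engine (or one mock)
    # happens to return insertion order today.
    rows = conn.execute(
        "SELECT name FROM events ORDER BY created_at DESC, id DESC"
    ).fetchall()
    return [name for (name,) in rows]


def test_recent_event_names_orders_by_time() -> None:
    conn = open_db()
    # Inserted deliberately out of chronological order.
    conn.executemany(
        "INSERT INTO events (created_at, name) VALUES (?, ?)",
        [("2026-01-02", "b"), ("2026-01-03", "c"), ("2026-01-01", "a")],
    )
    assert recent_event_names(conn) == ["c", "b", "a"]
```

A mock that replays rows in insertion order would have let a version of the query without ORDER BY pass. The real engine makes no such promise, and the fixture is arranged so that code relying on insertion order fails the test.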

This pattern is so common it is invisible. Engineers do not talk about it because it is normal. We refuse to treat it as normal.

What the rule actually demands

"Test what we fly. Fly what we test" forces four things, and once you start enforcing them most of the convenient shortcuts in software disappear:

No test-only mocks for behavior the customer depends on. If the production code path hits a real database, the test hits a real database. If production hashes a binary, the test hashes the same binary. Mocks are allowed for I/O speed in unit-level tests, but never for the behavior the customer is buying.
No production-only paths. The code that runs in production is the code that has tests. If a "fast path" exists only in production because it is "too expensive to test," that path has no tests. We delete the fast path or we test it. We do not ship untested code paths.
No demo scripts that bypass guardrails. If a demo script needs HARDSEAL_DISABLE_FCA_FIREWALL=1 to run, the demo is showing a system that does not exist. We refuse to demo systems that do not exist. The demo runs through the same guardrails the production system runs through, or we change the demo.
Doc examples that are exercised, not narrated. A README that shows hardseal verify packet.json had better invoke the same binary, with the same arguments, that a real customer invokes. If the README diverges from the CLI, the README is showing how to run a system that does not exist. We test our own docs.
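
One way to exercise doc examples rather than narrate them is to pull every command line out of the README and parse it with the CLI's own argument parser, so a renamed flag fails the build instead of failing the customer. The Python sketch below is hypothetical throughout. The build_parser function, its --format flag, and the "$ hardseal" prompt convention are illustrative stand-ins, not Hardseal's actual CLI.

```python
import argparse
import shlex


class DocCheckParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of printing usage and exiting."""

    def error(self, message):
        raise ValueError(message)


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in. A real doc test should import the parser from
    # the shipped CLI module rather than re-declare it, or it tests a copy.
    parser = DocCheckParser(prog="hardseal")
    sub = parser.add_subparsers(dest="command", required=True)
    verify = sub.add_parser("verify")  # subparsers inherit DocCheckParser
    verify.add_argument("packet")
    verify.add_argument("--format", choices=["text", "json"], default="text")  # hypothetical flag
    return parser


def readme_commands(readme_text: str) -> list[list[str]]:
    """Return the argv (minus program name) of every '$ hardseal ...' line."""
    commands = []
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("$ hardseal "):
            argv = shlex.split(stripped[2:])  # drop the "$ " prompt
            commands.append(argv[1:])  # drop "hardseal" itself
    return commands


def check_readme(readme_text: str) -> list[str]:
    """Parse each README command with the CLI's parser; return the failures."""
    parser = build_parser()
    failures = []
    for argv in readme_commands(readme_text):
        try:
            parser.parse_args(argv)
        except ValueError as exc:
            failures.append(f"{shlex.join(argv)}: {exc}")
    return failures
```

Run check_readme over the README in CI and fail on any non-empty result. A README line such as "$ hardseal verify packet.json --output json" comes back as a failure because the parser does not know --output. Against a real parser that exits on error, catching SystemExit does the same job.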

What this costs

A lot of convenience. Tests get slower because they hit real systems. Demos get harder to set up because they cannot bypass guardrails. Refactors take longer because every change has to roll through the full path, not just the fast path. Onboarding a new engineer is harder because there is no "magic mode" that skips the friction the production system has.

Most companies do the math, decide the cost is too high, and ship the convenience version. They are not wrong about the cost. They are wrong about what they are buying with the cost.

What this pays

For a company shipping cryptographic evidence to defense customers, the cost is the price of credibility. When a NIAP-class assessor or a defense prime asks whether the verifier the customer runs matches the verifier the test suite runs against, the answer had better be "byte-identical." If the answer is "mostly identical," the conversation is over. Mostly is not a thing in this niche.

The Hardseal browser verifier on the public site is a working example of this discipline. The JavaScript implementation reproduces the canonical-JSON serialization, the seed-hash construction, the per-section chaining, and the banned-phrase scan that the Python verifier in the repo does. Same packet, same chain root, same pass/fail, three different surfaces. We did the work because we shipped the claim. The claim and the work cannot drift.
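
A minimal Python sketch of what canonical-JSON serialization plus per-section chaining can look like. It is an illustration under stated assumptions, not Hardseal's construction. The canonical form here is just sorted keys, no insignificant whitespace, and UTF-8; RFC 8785 (the JSON Canonicalization Scheme) also pins down number formatting, among other details. The seed and the chaining rule are hypothetical. What the sketch shows is why every surface has to reproduce the same bytes: change one byte of one section's serialization and the chain root changes.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Sorted keys, no whitespace between tokens, UTF-8 output.
    # A simplification: RFC 8785 (JCS) specifies more than this.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def chain_root(seed: bytes, sections: list[dict]) -> str:
    """Fold each section into a running SHA-256 hash; return the hex root.

    Hypothetical construction:
      h_0 = SHA-256(seed)
      h_i = SHA-256(h_{i-1} || SHA-256(canonical_json(section_i)))
    """
    h = hashlib.sha256(seed).digest()
    for section in sections:
        section_digest = hashlib.sha256(canonical_json(section)).digest()
        h = hashlib.sha256(h + section_digest).digest()
    return h.hex()
```

With this construction, reordering keys inside a section leaves the root unchanged, while reordering the sections themselves changes it. A JavaScript port has to sort keys itself, because JSON.stringify emits string keys in insertion order rather than sorted order.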

// THE TEST Pick the headline claim of any product you use. Trace it back through their codebase. Is the test that exercises that claim hitting the same path the customer hits, with the same data shapes, the same guardrails, and the same dependencies? Most of the time the answer is no. That gap is where the next failure lives.

Failure modes the rule prevents

A non-exhaustive list of bugs we have caught (or seen other teams ship) because the test diverged from production:

Mocked auth that always returns "valid," masking a bug in the real auth path. Customer hits the real path the day the cert rotates and discovers the rotation handler was never tested.
Mocked database that returns rows in insertion order. Production hits a real query planner. Tests pass for a year. A schema change rewrites the query plan. The contract that depended on insertion order silently breaks. Customer-side data corruption.
Demo flag that disables rate limits. Customer reproduces the demo workflow at scale. The rate limiter was never tested under that load. System falls over.
Pretty-printed JSON in tests, compact JSON in production. Hash chain anchored to canonical JSON. Tests produce the wrong hash and still pass, because their expected values were computed from the same pretty-printed bytes. Customer's verifier fails. The bug is in the test, not the code, and the bug shipped.
Doc examples that drifted. Customer pastes a command from the README. The flag was renamed three releases ago. The doc was never re-run.
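
The pretty-printed JSON failure takes a few lines to reproduce. In this Python sketch (standard library only; the packet and the production serializer are hypothetical stand-ins), identical data produces different bytes and therefore different SHA-256 digests. The test pins its expected hash to the exact compact bytes a verifier would hash and routes the code under test through the single production serializer, instead of building a parallel pretty-printed fixture.

```python
import hashlib
import json


def production_serialize(obj) -> bytes:
    # Stand-in for the one serializer production uses (an assumption:
    # sorted keys, compact separators, UTF-8).
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def production_hash(obj) -> str:
    return hashlib.sha256(production_serialize(obj)).hexdigest()


packet = {"section": "intro", "seq": 1}

# What a test might do if it builds its own fixture: pretty-print, then hash.
pretty_hash = hashlib.sha256(json.dumps(packet, indent=2).encode("utf-8")).hexdigest()

# What the customer's verifier does: hash the production serialization.
shipped_hash = production_hash(packet)

# Same data, different bytes, different hash.
assert pretty_hash != shipped_hash


def test_hash_uses_production_serializer() -> None:
    # The expected value is pinned to the exact bytes a verifier hashes,
    # not to whatever a parallel test-only serializer would emit.
    expected = hashlib.sha256(b'{"section":"intro","seq":1}').hexdigest()
    assert production_hash(packet) == expected
```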

How we enforce it

Pre-commit hooks scan for banned-phrase strings and FCA-flank language, and enforce a single-author commit-email rule. The same scanner runs in CI on every PR. The same scanner runs at packet-construction time in the live verifier. Three surfaces, one implementation. If the implementation is wrong, all three surfaces are wrong, and we find out at commit time, not at the customer site.
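
Structurally, "three surfaces, one implementation" can be as plain as one function that every entry point imports. The Python sketch below is hypothetical throughout. The phrase list is a placeholder, and the function names and the way each surface calls the scanner are illustrative, not Hardseal's actual hooks.

```python
import sys
from dataclasses import dataclass

# Hypothetical placeholder phrases; the real list is not given in this post.
BANNED_PHRASES = ("guaranteed compliant", "certified secure")


@dataclass(frozen=True)
class Finding:
    source: str
    line_no: int
    phrase: str


def scan_text(text: str, source: str) -> list[Finding]:
    """The single scanner. Every surface calls this, and only this."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()  # case-insensitive match
        for phrase in BANNED_PHRASES:
            if phrase in lowered:
                findings.append(Finding(source, line_no, phrase))
    return findings


# Surfaces 1 and 2: the pre-commit hook and the CI job both call this
# entry point with file paths; a non-zero return blocks the commit or PR.
def scan_files_main(paths: list[str]) -> int:
    findings = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            findings.extend(scan_text(f.read(), path))
    for fnd in findings:
        print(f"{fnd.source}:{fnd.line_no}: banned phrase {fnd.phrase!r}", file=sys.stderr)
    return 1 if findings else 0


# Surface 3: packet construction refuses to emit a packet that fails the scan.
def build_packet(sections: dict[str, str]) -> dict[str, str]:
    for name, body in sections.items():
        findings = scan_text(body, name)
        if findings:
            raise ValueError(
                f"section {name!r} contains banned phrase {findings[0].phrase!r}"
            )
    return dict(sections)
```

None of the three call sites carries its own copy of the phrase list or the matching logic, so a fix to scan_text reaches all of them at once.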

Every customer-facing artifact — the trophy case bundle, the standalone verifier, the QUICKSTART, the sample packets — is generated by the same scripts that the team uses internally. There is no "release build" that is different from the internal build. There is no "marketing-mode" verifier. There is one verifier. It is the one we fly. It is the one we test.
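
The "one verifier" claim can be checked mechanically. A minimal Python sketch, assuming (hypothetically) that CI writes the SHA-256 digest of the exact artifact its tests ran against to a record file, and that the release step refuses to publish any artifact whose digest differs. The function and file names are illustrative, not Hardseal's pipeline.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream a file through SHA-256 so large artifacts are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_tested_artifact(artifact: Path, record: Path) -> None:
    # Run in CI after the test suite passes against `artifact`.
    record.write_text(file_sha256(artifact) + "\n", encoding="utf-8")


def assert_shipping_tested_artifact(artifact: Path, record: Path) -> None:
    # Run in the release step, immediately before publishing `artifact`.
    tested = record.read_text(encoding="utf-8").strip()
    shipping = file_sha256(artifact)
    if shipping != tested:
        raise RuntimeError(
            f"refusing to ship {artifact}: digest {shipping} "
            f"does not match tested digest {tested}"
        )
```

Because the comparison is over raw bytes, it catches quiet divergences, such as a rebuilt binary or a different flag baked in at release time, that a version-number check would miss.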

The deeper bet

This rule is not actually about testing. It is about whether the company can be trusted by a buyer who cannot watch every line of code. The buyer's bet (and they are betting real money against real liability) is that what we showed them is what they will receive. The rule is our structural commitment that the bet pays off.

A team that runs tests against mocks and ships against real systems is asking the buyer to trust that the gap between mock and real is small enough not to matter. For a CMMC artifact under FAR 52.204-21 retention, or an AI runtime evidence packet that might end up in a defense prime's audit, that gap is not a question of taste. It is a question of fraud exposure. The rule is how we keep that exposure at zero.

Adopt this even if you do not ship cryptographic evidence

Every team that depends on the alignment between what was tested and what was shipped benefits from this rule. The alignment is also rare, because it is expensive. Most software is shipped on the bet that the divergence is manageable. Sometimes the bet pays. Sometimes the bet is the entire reason a customer leaves.

The version of this rule that any team can adopt today: delete one mock that abstracts customer-visible behavior. Replace it with a real implementation. Watch what tests break. Fix them properly. Repeat next month with a different mock. Twelve months in, the codebase is a different shape.
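
As a first target, consider the mocked auth that always returns "valid" from the failure list. Below is a hedged Python sketch of a replacement: a small real verifier built on the standard library's hmac module. It accepts signatures made with the current key or the one before it, so the rotation path runs in every test rather than for the first time on rotation day. HMAC-signed tokens and a one-step rotation window are assumptions chosen for illustration. The failure in the list involves certificates, and real certificate rotation involves more than this.

```python
import hashlib
import hmac
from typing import Optional


class TokenVerifier:
    """Real signature check with a rotation window of current + previous key."""

    def __init__(self, current_key: bytes, previous_key: Optional[bytes] = None) -> None:
        self.current_key = current_key
        self.previous_key = previous_key

    def rotate(self, new_key: bytes) -> None:
        # The outgoing key stays valid for exactly one rotation window.
        self.previous_key = self.current_key
        self.current_key = new_key

    @staticmethod
    def sign(key: bytes, payload: bytes) -> bytes:
        return hmac.new(key, payload, hashlib.sha256).digest()

    def is_valid(self, payload: bytes, signature: bytes) -> bool:
        keys = [self.current_key]
        if self.previous_key is not None:
            keys.append(self.previous_key)
        # compare_digest avoids timing leaks about how much of the MAC matched.
        return any(
            hmac.compare_digest(self.sign(key, payload), signature) for key in keys
        )


def test_rotation_keeps_old_tokens_for_one_window() -> None:
    verifier = TokenVerifier(b"key-1")
    old_sig = TokenVerifier.sign(b"key-1", b"user=alice")

    verifier.rotate(b"key-2")
    assert verifier.is_valid(b"user=alice", old_sig)  # still inside the window
    assert verifier.is_valid(b"user=alice", TokenVerifier.sign(b"key-2", b"user=alice"))

    verifier.rotate(b"key-3")
    assert not verifier.is_valid(b"user=alice", old_sig)  # window has closed
    assert not verifier.is_valid(b"user=bob", TokenVerifier.sign(b"wrong", b"user=bob"))
```

Deleting the always-valid mock and pointing the existing tests at this class is the kind of swap that surfaces breakage the mock was hiding.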

// THE MANTRA

Test what we fly. Fly what we test. The artifact that goes through CI is the artifact that goes to the customer. The path that has tests is the path that runs in production. The verifier that is exercised is the verifier that is shipped. No exceptions, no demo flags, no fast paths, no mocks for the headline claim.
