Pass 2 Part IV · AI-era moats · chapter 12 Evaluator Judgment Power

Evaluator Judgment Power

When generation is nearly free, the moat is the right to say which output is good enough.

1. What it is

Generation is collapsing toward free. The bottleneck moves to a different question: how do you know the output is good? Behind that sits a harder one: who has the right to say it is good enough? Evaluator Judgment Power is the moat held by the entity with credible standing to certify an AI-generated output against a regulatory or professional standard of care, and to be contractually accountable when that certification turns out to be wrong.

The moat reduces to a two-part test that any candidate must pass:

  1. Can the system reliably produce a result that meets the standard of care for that type of outcome in its regulatory or professional environment? — the capability question.
  2. Can it prove it has done so in a way that is marketable, or even guaranteed? — the certification question.

The first half is a software problem and trends toward commodity. The second half is the moat.

2. How it works

Four mechanics, in order. The first two are background conditions; the last two are the moat.

  1. The domain has a defined standard of care. Building code; engineering practice; standard accounting rules; medical standard of care; legal duty of competent representation; financial fiduciary duty. The standard is codified well enough that “meeting it” is testable in court or in regulator review. Without this, there is nothing to certify against.
  2. The AI system can reliably produce output that meets the standard. The capability question. In some domains it is solved; in many it is not. Where it is solved, the surface moves to certification.
  3. The right to certify against the standard is gatekept. A specific cohort of licensed individuals or accredited firms holds the legal or institutional authority to say “this meets the standard.” The cohort is bounded and slow to grow. Specialization narrows it further: the more specialized the work, the smaller the cohort that can credibly certify it.
  4. An accountable party carries the consequence. When the certification turns out to be wrong, someone is named in the lawsuit, the malpractice action, the regulator's order, the insurer's claim. That party either is the gatekeeper from step 3, or has employed the gatekeeper to ride alongside the product (the Intuit move), or has acquired vendor-side certification authority of its own (the cleared-medical-device move).

The moat is steps 3 and 4 acting together. A vendor who can produce a passing answer (steps 1–2) but cannot certify it (step 3) and will not be named in the suit (step 4) does not have an evaluator moat. They have a tool that someone else uses to produce their certified work product — valuable, sometimes very valuable, but commoditizable in a way the gatekeeper's standing is not.
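The four mechanics, together with the rule that the moat is steps 3 and 4 acting together, can be read as a small decision procedure. The Python sketch below is illustrative only: `CertRoute`, `Position`, `assess`, and the verdict strings are hypothetical names invented for this example, and each boolean stands in for a judgment that in practice needs legal and market analysis.

```python
from dataclasses import dataclass
from enum import Enum


class CertRoute(Enum):
    """How a party holds the right to certify (the three routes named in step 4)."""
    NONE = "cannot certify"
    IS_GATEKEEPER = "is the licensed or accredited gatekeeper"
    EMPLOYS_GATEKEEPER = "employs gatekeepers alongside the product (Intuit move)"
    VENDOR_CLEARANCE = "holds vendor-side authority (cleared-medical-device move)"


@dataclass(frozen=True)
class Position:
    """Hypothetical description of one party's position in one domain."""
    standard_of_care: bool  # step 1: a codified, testable standard exists
    meets_standard: bool    # step 2: output reliably meets it (capability question)
    cert_route: CertRoute   # step 3: how, if at all, this party can certify
    named_when_wrong: bool  # step 4: this party carries the consequence


def assess(p: Position) -> str:
    """Coarse verdict: the moat is steps 3 and 4 acting together."""
    if not p.standard_of_care:
        return "no moat: nothing to certify against"
    if not p.meets_standard:
        return "capability gap: the surface has not yet moved to certification"
    if p.cert_route is not CertRoute.NONE and p.named_when_wrong:
        return "evaluator moat"
    # Passing answers without certification and accountability: a tool
    # that someone else uses to produce certified work product.
    return "tool: valuable, but commoditizable"


# Illustrative positions (hypothetical inputs, not assessments of real firms).
tool_vendor = Position(True, True, CertRoute.NONE, False)
stamped_service = Position(True, True, CertRoute.EMPLOYS_GATEKEEPER, True)
print(assess(tool_vendor))      # tool: valuable, but commoditizable
print(assess(stamped_service))  # evaluator moat
```

The order of the checks mirrors section 2: the background conditions (steps 1 and 2) are tested first, and only then does the sketch ask whether the same party both holds a certification route and is named when the certification is wrong.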

3. Canonical examples

4. How it fails

5. Key insights — the AEC phase differential

The strongest pattern the framework predicts in AEC (architecture, engineering, and construction) is that this moat is phase-specific. Design and construction look superficially similar: both are project-based, both produce work product, both face liability for failure. But the right-to-certify infrastructure differs structurally between them.

Design phase — the moat is native

Design has the cleanest professional-stamp regime in AEC. A licensed architect or engineer personally seals the drawings issued for permit and construction. The license carries personal liability and can be revoked by a state licensing board. The standard of care is codified in the building code, in engineering-society practice, and in case law about design negligence. All four mechanics from section 2 are present and load-bearing.

An AI design tool entering this phase faces a structural choice. Either it composes with a licensed professional in the loop, in which case the professional's stamp is the certification surface and the tool rides alongside, or it commoditizes the unstamped layer (massing, options exploration, take-offs, code-checking) and concedes the certification layer to the human. The most credible long-run plays make licensed professionals faster and more accurate without trying to remove them. The Intuit move, assembling a network of staff architects and engineers who review and stamp AI-generated work product, is the other available archetype, and one of the most generative in the paper.

Construction phase — the moat does not form here

In construction, there is no equivalent stamp-bearing professional cohort whose seal certifies the in-place work against a code-bound standard of care the way an architect's stamp certifies the design. General contractors and subcontractors operate on a different accountability stack (contracts, surety bonds, lien rights, liquidated damages, defect warranties) that allocates risk and creates remedies but does not concentrate the right to certify in a credentialed individual. The mechanics from section 2 break at step 3.

The Intuit-style manufactured-gatekeeper move is also harder to translate. Intuit could assemble a network of credentialed tax professionals because the credential exists, is portable, and travels with the person. In construction there is no equivalent cohort to acquire: the closest analogues (third-party inspectors, special inspectors, plan reviewers, commissioning agents) certify narrow slices and are themselves licensed professionals on the design side of the gate, not the construction side. Calibrated refusal in construction software remains valuable as decision support and as a workflow accelerant, but it is not the same moat. Construction-side plays should be scored on Agentic Workflow Lock-in (ch. 11) and Data Flywheel (ch. 10) mechanics, which they often have, rather than credited with an evaluator moat that the phase's accountability stack does not support.
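The phase differential can be restated as a scoring rule: credit an evaluator moat only where a stamp-bearing cohort exists. In this Python sketch, `Phase`, `creditable_moats`, and the two phase profiles are hypothetical; the booleans and accountability stacks restate this section's text rather than any external dataset, and treating the other two moats as candidate lenses in every phase is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """Hypothetical summary of one AEC phase's accountability stack."""
    name: str
    stamp_bearing_cohort: bool        # does a credentialed individual's seal certify the work?
    portable_credential_cohort: bool  # is there a cohort an Intuit-style play could assemble?
    accountability_stack: tuple[str, ...]


DESIGN = Phase(
    "design", True, True,
    ("personal seal", "state licensing board", "design-negligence case law"),
)
CONSTRUCTION = Phase(
    "construction", False, False,
    ("contracts", "surety bonds", "lien rights", "liquidated damages", "defect warranties"),
)


def creditable_moats(phase: Phase) -> list[str]:
    """Moat lenses a play in this phase can be credited with.

    Assumption: workflow lock-in and data flywheel are candidate lenses in
    every phase; the evaluator moat is credited only where step 3 holds.
    """
    lenses = ["Agentic Workflow Lock-in", "Data Flywheel"]
    if phase.stamp_bearing_cohort:
        lenses.insert(0, "Evaluator Judgment Power")
    return lenses


for phase in (DESIGN, CONSTRUCTION):
    intuit_move = "available" if phase.portable_credential_cohort else "hard to translate"
    print(f"{phase.name}: {creditable_moats(phase)}; Intuit move {intuit_move}")
```

The construction profile still carries a real accountability stack; the sketch only encodes that the stack allocates risk contractually instead of concentrating the right to certify in a credentialed individual, which is why the evaluator lens drops out.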

Visual: generation cost vs. right-to-certify

Fig. 12.1: Generation cost vs. right-to-certify (x-axis: time / model capability; y-axis: cost / scarcity). As generation collapses to commodity, generation cost falls toward zero while the right to certify against a standard of care stays scarce. The wedge between the two curves, labeled capability × certification × accountability, is the moat.
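Read as a toy model, the figure's two curves and its multiplicative label can be sketched in a few lines of Python. Everything numeric here is an assumption for illustration: a geometric decline in generation cost, a constant scarcity level of 1.0 on the same axis, and factor values in [0, 1]. `generation_cost` and `wedge` are hypothetical helpers, not a model the paper specifies.

```python
def generation_cost(t: int, c0: float = 1.0, decay: float = 0.5) -> float:
    """Toy cost per output after t capability steps (assumed geometric decline)."""
    return c0 * decay ** t


def wedge(
    t: int,
    scarcity: float = 1.0,
    capability: float = 1.0,
    certification: float = 1.0,
    accountability: float = 1.0,
) -> float:
    """Toy moat size at time t.

    The right to certify stays at a constant `scarcity` level while generation
    cost falls. The gap is realized only to the extent all three factors are
    present: the product is zero if any one of them is zero.
    """
    gate = capability * certification * accountability
    return gate * (scarcity - generation_cost(t))


for t in range(4):
    # Full position widens: 0.0, 0.5, 0.75, 0.875; without certification it stays 0.0.
    print(t, wedge(t), wedge(t, certification=0.0))
```

The multiplicative gate is one reading of the figure's "×" label: a capable, accountable vendor with no right to certify gets no wedge at all, which matches the argument that the tool layer commoditizes.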

Cross-references

Composes with Data Flywheel (ch. 10), where specialized judgment written down as evals drives the flywheel faster and more reliably, and with Agentic Workflow Lock-in (ch. 11), where the agent's accumulated calibration is itself the codified judgment. The full synthesis, including the specialization spectrum, the maturation across the three dimensions, and how the three moats plus go-to-market execution drive the AI-native land-grab, lives in the consolidated paper's synthesis section.

Sources: codesign-thesis claim that generation is cheap and judgment is scarce · FDA medical-device clearance regime and case examples (Aidoc in radiology, Cleerly in cardiac imaging) · Intuit 10-K and product disclosures on TurboTax Live and its accuracy guarantee · architectural and engineering licensure documentation · Sacra company memos on Harvey.