1. What it is
Generation is collapsing toward free. The bottleneck moves to a different question: how do you know it's good? — and behind that, who has the right to say it is good enough? Evaluator Judgment Power is the moat held by the entity with credible standing to certify an AI-generated output against a regulatory or professional standard of care, and to be contractually accountable when that certification turns out to be wrong.
The moat reduces to a two-part test that any candidate must pass:
- Can the system reliably produce a result that meets the standard of care for that type of outcome in its regulatory or professional environment? — the capability question.
- Can it prove it has done so in a way that is marketable, or even guaranteed? — the certification question.
The first half is a software problem and trends toward commodity. The second half is the moat.
2. How it works
Four mechanics, in order, referred to below as steps 1 through 4. The first two are background conditions. The last two are the moat.
- The domain has a defined standard of care. Building code; engineering practice; standard accounting rules; medical standard of care; legal duty of competent representation; financial fiduciary duty. The standard is codified well enough that “meeting it” is testable in court or in regulator review. Without this, there is nothing to certify against.
- The AI system can reliably produce output that meets the standard. The capability question. In some domains it is solved; in many it is not. Where it is solved, the surface moves to certification.
- The right to certify against the standard is gatekept. A specific cohort of licensed individuals or accredited firms holds the legal or institutional authority to say “this meets the standard.” The cohort is bounded and slow to grow. Specialization narrows it further: the more specialized the work, the smaller the cohort that can credibly certify it.
- An accountable party carries the consequence. When the certification turns out to be wrong, someone is named in the lawsuit, the malpractice action, the regulator's order, the insurer's claim. That party either is the gatekeeper from step 3, or has employed the gatekeeper to ride alongside the product (the Intuit move), or has acquired vendor-side certification authority of its own (the cleared-medical-device move).
The moat is steps 3 and 4 acting together. A vendor who can produce a passing answer (steps 1–2) but cannot certify it (step 3) and will not be named in the suit (step 4) does not have an evaluator moat. They have a tool that someone else uses to produce their certified work product — valuable, sometimes very valuable, but commoditizable in a way the gatekeeper's standing is not.
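The four-step test can be written down as a small classifier. The sketch below is illustrative only: the `Candidate` record, its field names, the `classify` function, and the label strings are hypothetical and do not come from the paper. It assumes each mechanic can be reduced to a yes/no answer, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of the four mechanics for one vendor or professional."""
    name: str
    standard_of_care_defined: bool  # step 1: the domain has a testable standard
    output_meets_standard: bool     # step 2: the system reliably meets it
    holds_right_to_certify: bool    # step 3: gatekept authority to certify
    carries_liability: bool         # step 4: named when the certification is wrong


def classify(c: Candidate) -> str:
    """Return a coarse label based on which of the four mechanics hold."""
    if not c.standard_of_care_defined:
        return "nothing to certify against"
    if not c.output_meets_standard:
        return "capability not yet solved"
    # Steps 1-2 hold; the moat requires steps 3 and 4 acting together.
    if c.holds_right_to_certify and c.carries_liability:
        return "evaluator moat"
    return "tool for someone else's certified work"


# Illustrative inputs (assumed values, not data from the paper).
design_tool = Candidate("unstamped AI design tool", True, True, False, False)
sealing_engineer = Candidate("licensed engineer sealing drawings", True, True, True, True)

print(classify(design_tool))
print(classify(sealing_engineer))
```

Treating steps 3 and 4 as a joint condition mirrors the claim that a vendor passing only steps 1 and 2 has a valuable tool but not the moat.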
3. Canonical examples
- Licensed architects and engineers — the native individual-gatekeeper case. A licensed architect or engineer personally seals the drawings issued for permit and construction. The license is held by the individual, carries personal liability, and can be revoked. The standard of care is codified in the building code and in case law about design negligence. AI design tools cannot ship a stamped drawing without one of these professionals in the loop.
- Physicians and FDA-cleared SaMD vendors — the medical analog. The licensed physician carries the standard-of-care accountability at the point of care; FDA-cleared software-as-medical-device vendors (Aidoc in radiology, Cleerly in cardiac imaging) carry vendor-side certification authority for narrow indications. Two species of evaluator coexist in the same workflow because the right to certify has been split between operator-license and vendor-clearance regimes. Note what is not on this list: the EHR platforms themselves (Epic, Athenahealth, Cerner). Their moat is compliance-driven vendor lock-in — ONC certification, HIPAA boundaries, payer interfaces, implementation cost — which is a Switching Costs play (chapter 04) amplified by regulation, not an Evaluator Judgment play. The platform hosts the gatekeepers; it does not hold the right to certify.
- Intuit — the manufactured-gatekeeper case. The TurboTax software was fine; the calculation engine worked. What it lacked, against an audit-grade standard of care, was the institutional credibility to stand behind a return. Intuit solved this by buying it — building TurboTax Live, a network of credentialed tax professionals whose review and stamp now back the software output. The accuracy guarantee on the box is the marketable surface; the network of certifiers is the moat.
- Harvey (legal AI) — the specialized-AI case riding an existing gatekeeper. The bet is that the calibrated answer Harvey produces, paired with the lawyer's own license and review, becomes guarantee-able to the client in a way no general assistant can match. The licensed lawyer remains the gatekeeper. Harvey is engineered to make that gatekeeper's certification fast, cheap, and consistent — and to encode the judgment into evals that compound across the firm.
4. How it fails
- Liability does not transfer to the vendor. The customer keeps the downstream risk; the vendor cannot price share-of-savings or insurance-linked offerings. Pricing collapses to per-seat and the surface looks like Brand rather than Evaluator.
- Licensure regime broadens or commoditizes. Reform expands the cohort with the right to certify, or automated processes reduce certification to a benchmark any compliant vendor passes. The right ceases to be scarce.
- Foundation-layer absorption from below. A frontier model reaches a quality level where the customer is willing to accept the residual risk rather than pay for stamped certification. Pressure is strongest at the general end of the specialization spectrum and weakest at the specialized end.
- Brand-only collapse. The vendor stays trusted but loses calibration depth as the underlying model commoditizes; competitors with similar institutional standing and better calibration eat the segment from inside the same gatekeeping cohort.
- Cohort capture by a competitor. A rival firm hires or acquires a more credible network of gatekeepers, and the moat migrates with them. The TurboTax Live model is replicable in principle; its scale is the defense.
5. Key insights — the AEC phase differential
The strongest pattern the framework predicts in AEC is that this moat is phase-specific. Design and construction look superficially similar — both are project-based, both produce work product, both face liability for failure — but the right-to-certify infrastructure differs structurally between them.
Design phase — the moat is native
Design has the cleanest professional-stamp regime in AEC. A licensed architect or engineer personally seals the drawings issued for permit and construction. The license carries personal liability and can be revoked by a state licensing board. The standard of care is codified in the building code, in engineering-society practice, and in case law about design negligence. All four mechanics from section 2 are present and load-bearing.
An AI design tool entering this phase faces a structural choice. Either it composes with a licensed professional in the loop — in which case the professional's stamp is the certification surface and the tool rides alongside — or it commoditizes the unstamped layer (massing, options exploration, take-offs, code-checking) and concedes the certification layer to the human. The most credible long-run plays make the licensed professional faster and more accurate without trying to remove them. The Intuit move — assembling a network of staff architects and engineers who review and stamp AI-generated work product — is the other available archetype, and one of the most generative ones in the paper.
Construction phase — the moat does not form here
There is no equivalent stamp-bearing professional cohort whose seal certifies the in-place work to the same code-bound standard of care that an architect's stamp certifies the design. General contractors and subcontractors operate on a different accountability stack — contracts, surety bonds, lien rights, liquidated damages, defect warranties — that allocates risk and creates remedies but does not concentrate the right-to-certify in a credentialed individual. The mechanics from section 2 break at step 3.
The Intuit-style manufactured-gatekeeper move is also harder to translate. Intuit could buy a network of credentialed tax professionals because the credential exists, is portable, and travels with the person. In construction there is no equivalent cohort to acquire; the closest analogues (third-party inspectors, special inspectors, plan reviewers, commissioning agents) certify narrow slices and are themselves licensed professionals on the design side of the gate, not on the construction side. Calibrated refusal in construction software remains valuable as decision support and as a workflow accelerant, but it is not the same moat. Construction-side plays should be scored on Agentic Workflow Lock-in and Data Flywheel mechanics — which they often have — rather than credited with an evaluator moat that the phase's accountability stack does not actually support.
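One way to make the phase differential concrete is to walk the four mechanics from section 2 for each phase and report the first one that breaks. This is a minimal sketch under stated assumptions: the `PHASES` table and the `first_broken_step` helper are hypothetical, and only two entries come from the text (all four hold in design, and construction breaks at step 3). The other construction values are assumed for illustration.

```python
from typing import Optional

# The four mechanics from section 2, in order (steps 1 through 4).
STEPS = (
    "defined standard of care",
    "output reliably meets the standard",
    "right to certify is gatekept",
    "accountable party carries the consequence",
)

# Hypothetical per-phase profile. Design: all four hold, per the text.
# Construction: step 3 is False per the text; steps 1, 2 and 4 are assumed True.
PHASES = {
    "design": (True, True, True, True),
    "construction": (True, True, False, True),
}


def first_broken_step(phase: str) -> Optional[int]:
    """Return the 1-based number of the first mechanic that fails, or None."""
    for number, holds in enumerate(PHASES[phase], start=1):
        if not holds:
            return number
    return None


for phase in PHASES:
    step = first_broken_step(phase)
    if step is None:
        print(f"{phase}: all four mechanics hold")
    else:
        print(f"{phase}: breaks at step {step} ({STEPS[step - 1]})")
```

A phase that breaks at step 3 would then be scored on other moats, such as Agentic Workflow Lock-in and Data Flywheel, rather than on this one.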
Visual: generation cost vs. right-to-certify
Cross-references
Composes with Data Flywheel (ch. 10), where specialized judgment, written down as evals, drives the flywheel faster and more reliably, and with Agentic Workflow Lock-in (ch. 11), where the agent's accumulated calibration is itself the codified judgment. The full synthesis, including the specialization spectrum, the maturation across the three dimensions, and how those three combine with go-to-market execution to drive the AI-native land-grab, lives in the consolidated paper's synthesis section.
Sources: codesign-thesis claim on generation cheap, judgment scarce · FDA medical-device clearance regime and case examples (Aidoc in radiology, Cleerly in cardiac imaging) · Intuit 10-K and product disclosures on TurboTax Live and the accuracy guarantee · architectural and engineering licensure documentation · Sacra company memos on Harvey.