2026-06-17 · 10 min read

Article 14 human-oversight evidence : the gap most AI Act programs leave open

Article 14 of the EU AI Act requires that high-risk AI systems "can be effectively overseen by natural persons" while they are in use. Most organizations meeting the August 2026 deadline have answered this with workflow choreography : humans review AI outputs before they take effect. That is necessary but not sufficient for the evidentiary record. The gap is in the binding between the AI decision and the human intervention.

What Article 14 actually asks

Article 14 imposes obligations on providers (the system must be designed to enable oversight) and on deployers (oversight must be effective in practice). The five oversight measures listed in Art. 14(4) are : the ability to properly understand the system's capacities and limitations, awareness of automation bias, the ability to correctly interpret outputs, the ability to decide not to use the output or to override it, and the ability to intervene in or interrupt the system.

None of these obligations explicitly says "you must produce a cryptographic record". But all of them produce events : a reviewer reading an output, overriding it, escalating it, or interrupting the system. The question that Article 12 (record-keeping) and Article 14 jointly raise is : where are those events recorded, and what makes them independently verifiable ?

The four-fold gap most programs leave

In the AI Act conversations we've had since the regulation entered into force, four gaps come up almost every time :

Gap 1 — the oversight event is not on the same system as the inference

A typical setup : the AI inference is logged in Datadog / Splunk / ELK. The human reviewer's decision is captured in a workflow tool (Jira, Pega, ServiceNow, an internal case-management system). The two systems are connected by a case ID or transaction ID in a relational database somewhere.

An auditor pulling the Article 12 + 14 evidence has to take it on faith that the case-ID join is correct. Nothing cryptographic ties the inference record to the oversight record. If the case-ID column is overwritten, accidentally or maliciously, the link is lost or silently redirected, and an independent third party has no way to detect it.

Gap 2 — the reviewer's identity is a database row, not a signed event

"Reviewed by alice.recruiter@bigco.com" sitting in a workflow tool database is not the same as "reviewed by a signing key that cryptographically binds alice.recruiter to this specific oversight action". Anyone with database access can change the reviewer field. An auditor asking "how do you know alice actually reviewed this?" gets the answer "we trust our IAM" — which the auditor has no way to verify independently.

Gap 3 — the rationale is free text in a notes column

The human reviewer's rationale, in most systems, lives in a free-text comment field. There's no integrity protection. A reviewer's "I overrode because the AI was wrong about the candidate's seniority" can be rewritten months later to "I overrode under duress" with no trace, if a deployer's database is breached or improperly accessed.

Gap 4 — the temporal order is asserted, not proven

The defense to "you reconstructed this oversight rationale after the fact when you saw the complaint coming" is timestamps. But database timestamps are forgeable. The auditor can ask : "show me a timestamping authority that confirms this oversight event existed by this date." Most systems can't.

What a defensible Article 14 record looks like

The defensible version of each gap, in evidence terms :

Closed Gap 1. Inference event and oversight event are on the same cryptographic chain. The oversight event's payload contains the inference event's hash as a typed field. The chain walker can verify the link by hash equality.
Closed Gap 2. The oversight event is signed by the reviewer's key (or by the system on the reviewer's behalf, with a session binding to the reviewer's federated identity). Repudiation requires breaking the signature scheme, not editing a database row.
Closed Gap 3. The rationale is part of the signed payload. Modifying the rationale invalidates the signature. Any third party with the public key can detect tampering.
Closed Gap 4. The chain's anchoring authority — RFC 3161 TSA for the standard story, OpenTimestamps / Bitcoin for belt-and-suspenders — provides an independently-verifiable timestamp that proves the event existed before the anchor.
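
As a rough illustration of closed Gaps 1 to 3, here is a minimal sketch using only the Python standard library. It assumes SHA-256 over canonical JSON as the event hash. HMAC-SHA256 stands in for a real signature scheme : a production system would use an asymmetric signature so that a third party can verify with the public key alone. The helper names, the event fields other than inference_event_hash, and the demo key are hypothetical. Gap 4 is not shown because it needs an external timestamping authority.

```python
import hashlib
import hmac
import json


def canonical_bytes(event: dict) -> bytes:
    # Deterministic serialization: the same event always yields the same bytes.
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def event_hash(event: dict) -> str:
    return hashlib.sha256(canonical_bytes(event)).hexdigest()


def sign(payload: dict, key: bytes) -> str:
    # ASSUMPTION: HMAC is only a stand-in for an asymmetric signature.
    return hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()


def verify(payload: dict, signature: str, key: bytes) -> bool:
    return hmac.compare_digest(sign(payload, key), signature)


REVIEWER_KEY = b"demo-only-key"  # hypothetical; never hard-code real keys

inference = {"type": "inference", "case_id": "case-001", "verdict": "poor fit"}

oversight = {
    "type": "human_oversight",
    "reviewer": "alice.recruiter@bigco.com",
    "action": "override",
    "rationale": "the AI was wrong about the candidate's seniority",
    # Closed Gap 1: the link is the inference event's hash, not a case-ID join.
    "inference_event_hash": event_hash(inference),
}

# Closed Gaps 2 and 3: reviewer identity and rationale sit inside the signed payload.
signature = sign(oversight, REVIEWER_KEY)

tampered = dict(oversight, rationale="I overrode under duress")

print(verify(oversight, signature, REVIEWER_KEY))  # True
print(verify(tampered, signature, REVIEWER_KEY))   # False
```

Because the rationale and the inference hash are covered by the same signature, rewriting either one after the fact makes verification fail.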

The single-signed-atom property

In CarveTrace this binding is what we call the single-signed-atom property : every human-oversight event the SDK writes is bound, at write time and by hash, to the AI inference event it reviewed. The binding is not asserted in metadata or a foreign-key column ; it is the equality of the inference event's cryptographic hash and the HumanOversightEvent.inference_event_hash field. The verifier checks this equality by literal byte comparison.
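
The equality check described above can be sketched as follows. This is not CarveTrace's implementation : the hash function (SHA-256), the field types, and the binding_holds helper are assumptions for illustration. Only the HumanOversightEvent.inference_event_hash name comes from the text.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanOversightEvent:
    inference_event_hash: bytes  # field name from the text; bytes type is assumed
    reviewer: str
    rationale: str


def binding_holds(inference_event_bytes: bytes, oversight: HumanOversightEvent) -> bool:
    # Recompute the hash from the stored inference event and compare bytes.
    recomputed = hashlib.sha256(inference_event_bytes).digest()
    return hmac.compare_digest(recomputed, oversight.inference_event_hash)


reviewed = b'{"case_id":"case-001","verdict":"poor fit"}'
other = b'{"case_id":"case-002","verdict":"poor fit"}'

event = HumanOversightEvent(
    inference_event_hash=hashlib.sha256(reviewed).digest(),
    reviewer="alice.recruiter@bigco.com",
    rationale="the AI was wrong about the candidate's seniority",
)

print(binding_holds(reviewed, event))  # True
print(binding_holds(other, event))     # False
```

The point of the design is that the verifier needs no database access : given the two events, anyone can recompute the hash and compare.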

The practical consequence : when a decision is challenged six months later, you can prove which human reviewed which AI verdict, at which timestamp, with which rationale — and prove they did so at the time the decision happened, not reconstructed afterwards from a workflow database. This is the property that most current AI governance platforms (CredoAI, Holistic AI, ModelOp, Saidot, OneTrust, IBM watsonx.governance) do not ship. They log decisions ; they log oversight ; the binding between the two is a database join in their backend.

What auditors will likely ask for

Based on national-authority guidance and AI Office preparatory materials :

A sample of oversight events matching specific inferences they identify (the auditor picks, you don't).
Independent verification of the bound pair — they will not take "VERIFIED on our dashboard" as evidence. Expect them to run their own verifier, or have a contracted verifier do it.
The rationale text with cryptographic integrity — they will check that the displayed rationale matches the signed payload's rationale.
The reviewer's identity proof — they will check that the signing key corresponds to a named natural person via your IAM or HR records.
The temporal anchor — they will check that the chain anchor pre-dates the moment they are auditing.
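
A hedged sketch of what an auditor-side sample check might look like for three of the items above (bound pair, rationale integrity, temporal anchor). The record layout, the check_record and check_sample helpers, and the idea that anchored_at has already been extracted from a verified timestamp token are all assumptions. Signature verification and the mapping from signing key to a named person are deliberately left out.

```python
import hashlib
from datetime import datetime, timezone


def check_record(record: dict, audit_cutoff: datetime) -> list[str]:
    """Return the names of failed checks for one bound pair (empty = all passed)."""
    failures = []
    payload = record["signed_oversight_payload"]
    recomputed = hashlib.sha256(record["inference_bytes"]).hexdigest()
    if recomputed != payload["inference_event_hash"]:
        failures.append("binding")
    if record["displayed_rationale"] != payload["rationale"]:
        failures.append("rationale")
    if record["anchored_at"] > audit_cutoff:
        failures.append("anchor")
    return failures


def check_sample(store: dict, picked_ids: list[str], audit_cutoff: datetime) -> dict:
    # The auditor supplies picked_ids; an ID missing from the store is itself a finding.
    return {
        i: check_record(store[i], audit_cutoff) if i in store else ["missing"]
        for i in picked_ids
    }


def make_record(inference_bytes: bytes, rationale: str, displayed: str) -> dict:
    # Hypothetical record layout, for illustration only.
    return {
        "inference_bytes": inference_bytes,
        "signed_oversight_payload": {
            "inference_event_hash": hashlib.sha256(inference_bytes).hexdigest(),
            "rationale": rationale,
        },
        "displayed_rationale": displayed,
        "anchored_at": datetime(2026, 5, 1, tzinfo=timezone.utc),
    }


store = {
    "inf-1": make_record(b"inference one", "seniority misread", "seniority misread"),
    "inf-2": make_record(b"inference two", "seniority misread", "overrode under duress"),
}
cutoff = datetime(2026, 6, 17, tzinfo=timezone.utc)
report = check_sample(store, ["inf-1", "inf-2", "inf-9"], cutoff)
print(report)
```

Here inf-1 passes every check, inf-2 fails on the rationale because the displayed text differs from the signed payload, and inf-9 is reported as missing.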

What this means for your AI Act program

Many programs are over-investing in policy artifacts (control libraries, policy packs, RFP response kits) and under-investing in the cryptographic plumbing that turns those policies into evidence. The auditor will not be impressed by a 60-page policy pack if the Article 14 oversight events sit in a database column with no integrity protection.

A reasonable test : look at one production AI workflow you intend to evidence by August 2026, pick a real oversight decision from the last 30 days, and ask your engineering team three questions :

Can you prove to a third party — without trusting your own database — that the reviewer named in the record actually recorded this oversight ?
Can you prove that the rationale text shown in the audit log is the rationale text that was recorded at the time, not rewritten later ?
Can you prove that the oversight intervention is linked to this specific AI inference, and not to a different one ?

If the answers are anything other than "yes, here's the cryptographic record", there's a gap to close before August.

The sample bundle at verify.carvetrace.com includes a real Article 14 override — the AI said "poor fit" on a CV ; a human reviewer overrode the decision with the rationale "the AI scored 'poor fit' off the 'Analyst II' job title alone." Drop the bundle, open the dossier, see the oversight event bound by hash to the inference it reviewed. That is what closed-Gap-1 through closed-Gap-4 looks like.