# Human-in-the-Loop Requirements

A framework for assessing whether human-in-the-loop (HITL) controls provide genuine oversight or are governance theatre.

---

## The HITL Spectrum

| Level | Description | Genuine Oversight? |
|:------|:------------|:-------------------|
| **Rubber stamp** | Human clicks "approve" on every AI output | No - performative only |
| **Spot check** | Random sample reviewed periodically | Partial - depends on sample rate and quality |
| **Exception review** | Human reviews AI-flagged exceptions | Partial - depends on flag quality |
| **Active review** | Human assesses every AI output before action | Yes - if time and training are adequate |
| **Collaborative** | Human and AI work together, each contributing complementary strengths | Yes - if the human can genuinely modify the output |

---

## Five Tests for Genuine HITL

### 1. Information Test

Does the reviewer have enough information to make an independent assessment?

| Indicator | Genuine | Theatre |
|:----------|:--------|:--------|
| Full context provided | Reviewer sees all relevant data | Reviewer sees only AI recommendation |
| Original data accessible | Can examine source material | Only sees AI summary |
| Explanation provided | AI reasoning is transparent | Black box output |

### 2. Time Test

Does the reviewer have adequate time?

| Indicator | Genuine | Theatre |
|:----------|:--------|:--------|
| Review time per case | Sufficient for proper assessment | Seconds per item |
| Workload | Manageable volume | Hundreds per day |
| Incentives | Rewarded for quality | Rewarded for throughput |

### 3. Competence Test

Is the reviewer qualified to assess?

| Indicator | Genuine | Theatre |
|:----------|:--------|:--------|
| Training | Specific to AI review role | General or none |
| Domain expertise | Understands the subject matter | Administrative role only |
| AI literacy | Understands AI limitations | Treats AI as infallible |

### 4. Authority Test

Can the reviewer actually change the outcome?

| Indicator | Genuine | Theatre |
|:----------|:--------|:--------|
| Override rate | 2-15% (reasonable disagreement) | <0.5% (rubber stamping) or >30% (AI too poor) |
| Override friction | Easy to override | Complex process, manager approval needed |
| Override consequences | None for reasonable disagreement | Pressure to align with AI |
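The override-rate bands in the table above can be checked mechanically against review logs. The following is a minimal sketch, not a definitive implementation: the function name `classify_override_rate` and the label strings are hypothetical. The table leaves rates between 0.5% and 2%, and between 15% and 30%, unclassified, so the sketch reports those as `"indeterminate"` rather than guessing.

```python
def classify_override_rate(overrides: int, total_reviews: int) -> str:
    """Map an override rate onto the bands from the Authority Test table.

    Assumption: the function name and label strings are illustrative only.
    The thresholds (0.5%, 2%, 15%, 30%) are taken from the table.
    """
    if total_reviews <= 0:
        raise ValueError("total_reviews must be positive")
    if overrides < 0 or overrides > total_reviews:
        raise ValueError("overrides must be between 0 and total_reviews")

    rate_pct = overrides / total_reviews * 100

    if rate_pct < 0.5:
        return "rubber-stamping"          # Theatre: reviewers almost never disagree
    if rate_pct > 30:
        return "ai-too-poor"              # Theatre column: AI output is unreliable
    if 2 <= rate_pct <= 15:
        return "reasonable-disagreement"  # Genuine band
    # The table does not classify 0.5-2% or 15-30%; flag for human judgement.
    return "indeterminate"


if __name__ == "__main__":
    for overrides in (3, 80, 200, 350):
        print(overrides, classify_override_rate(overrides, 1000))
```

A rate in the "indeterminate" range is a prompt for further questions rather than a pass or a fail; the other indicators in this test (override friction and consequences) still need to be assessed qualitatively.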

### 5. Feedback Test

Does human input improve the system?

| Indicator | Genuine | Theatre |
|:----------|:--------|:--------|
| Feedback captured | Overrides and reasons recorded | Not tracked |
| Model updates | Human feedback improves AI | Feedback goes nowhere |
| Loop closure | Reviewer sees impact of feedback | No communication back |

---

## Red Flags for Boards

- Approval rate exceeds 99%: almost certainly rubber-stamping
- No disagreements recorded: suggests reviews are not genuine
- Review time under 30 seconds per case: insufficient for meaningful assessment
- No training for reviewers: competence cannot be assumed
- Override requires manager approval: creates pressure to conform
- No feedback loop: human input is wasted
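The red flags above could be turned into a simple screening check over summary statistics. This is a rough sketch under stated assumptions: the `ReviewStats` fields and the `red_flags` function are hypothetical names, and the 30-second threshold is applied to the mean review time, which is one possible reading of "per case". The thresholds themselves (99% approval, 30 seconds) come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ReviewStats:
    """Hypothetical summary of a review process; field names are assumptions."""
    total_reviews: int
    approvals: int
    disagreements_recorded: int
    mean_review_seconds: float
    reviewers_trained: bool
    override_needs_manager_approval: bool
    feedback_loop_exists: bool


def red_flags(stats: ReviewStats) -> list[str]:
    """Return the red flags from the list above that the statistics trigger."""
    flags = []
    if stats.total_reviews > 0 and stats.approvals / stats.total_reviews > 0.99:
        flags.append("Approval rate exceeds 99%")
    if stats.disagreements_recorded == 0:
        flags.append("No disagreements recorded")
    # Assumption: the mean is used as a proxy for per-case review time.
    if stats.mean_review_seconds < 30:
        flags.append("Review time under 30 seconds per case")
    if not stats.reviewers_trained:
        flags.append("No training for reviewers")
    if stats.override_needs_manager_approval:
        flags.append("Override requires manager approval")
    if not stats.feedback_loop_exists:
        flags.append("No feedback loop")
    return flags


if __name__ == "__main__":
    example = ReviewStats(
        total_reviews=1000,
        approvals=998,
        disagreements_recorded=0,
        mean_review_seconds=12.0,
        reviewers_trained=False,
        override_needs_manager_approval=True,
        feedback_loop_exists=False,
    )
    for flag in red_flags(example):
        print("-", flag)
```

A mean can hide a long tail of near-instant approvals, so a fuller check would also look at the distribution of review times. An empty result does not prove oversight is genuine; it only means none of these screening flags fired.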

---

## Board Challenge Questions

1. "What is the approval/override rate for human reviewers?"
2. "How long does each review take on average?"
3. "What training do reviewers receive?"
4. "Can reviewers easily override the AI, and what happens when they do?"
5. "Show me a case where a human reviewer changed an AI decision. What happened next?"

---

*HITL Requirements | NED AI Helper | Prosper AI Consulting, UK*
