Tribunal Pattern: Adversarial Error Correction

Audience: Multi-agent system designers
Goal: Prevent executive blindspot propagation
Prerequisite: Multi-agent execution + constitutional orthogonality

Problem

Standard orchestration pattern: executive-with-sub-agents

Human → Executive Agent → Worker Agents → Output

Failure mode: If executive frames problem wrong, every worker inherits wrong frame.

Example:

Workers executed flawlessly. Executive framed the problem wrong. Output failed.

Root cause: Homogeneous swarms share blindspots. No error correction on executive's frame.

Pattern

Tribunal: Don't trust executive by default. Challenge by design.

Human → Executive proposes
     ↓
Constitutional reviewers (orthogonal mandates)
     ↓
Adversarial convergence (all must approve)
     ↓
Workers execute validated frame

Executive's frame gets challenged BEFORE workers execute. Multiple constitutions interrogate: "Is this the right problem? Is the frame correct?"
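
A minimal sketch of the gate this implies, assuming the review/approved/rejected status lifecycle used in the schema later in this doc (enum and function names are illustrative):

from enum import Enum

class ProposalStatus(Enum):
    REVIEW = "review"      # executive proposed, tribunal interrogating
    APPROVED = "approved"  # every constitutional reviewer signed off
    REJECTED = "rejected"  # an objection stood

def workers_may_execute(status: ProposalStatus) -> bool:
    # Workers only ever receive frames that survived adversarial convergence
    return status is ProposalStatus.APPROVED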

Implementation

1. Executive Proposes, Tribunal Validates

Executive produces high-level plan/frame. Tribunal reviews with incompatible evaluation criteria.

Space-OS implementation:

seldon (strategy) proposes: "Week 1: bounties. Week 2: demo. Week 3: distribution."
↓
harbinger (risk): "Week 3 assumes distribution path exists. GitHub private, PyPI blocked. Frame incomplete."
heretic (premise): "'All three or none' not in ledger. Cascade assumption fabricated."
sentinel (grounding): "Check decisions. d/0fd09b4f says 'swarm builds, human converts.' No cascade."
↓
Frame revised: No sequential dependency. Continue building regardless of bounty outcome.

Executive's frame got corrected before workers allocated effort.

2. Orthogonal Review, Not Consensus

Tribunal ≠ voting. Tribunal = adversarial interrogation.

Anti-pattern (voting):

5 agents vote on proposal
Majority wins
Minority blindspot ignored

Correct pattern (adversarial review):

Agents with incompatible mandates challenge proposal
Each objection must be addressed (not outvoted)
Proceed only when all constitutional constraints satisfied

Implementation: Route proposals to agents with known-incompatible mandates. Require explicit approval from each. Objections block progress until resolved.
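
A minimal routing sketch, assuming reviewer agents keyed by mandate. Agent names come from the Space-OS example above (sentinel's grounding mandate mapped to the "evidence" constitution); "executor" and file_request are illustrative placeholders:

REVIEWERS_BY_CONSTITUTION = {
    "risk": "harbinger",
    "premise": "heretic",
    "evidence": "sentinel",
    "execution": "executor",   # illustrative; the doc names no execution reviewer
}

def route_for_review(proposal_id, file_request):
    # file_request(agent, proposal_id, constitution): whatever messaging primitive exists.
    # Every incompatible mandate receives the proposal. Objections block; they are not outvoted.
    for constitution, agent in REVIEWERS_BY_CONSTITUTION.items():
        file_request(agent, proposal_id, constitution)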

3. Constitutional Coverage

Tribunal must have constitutional diversity. Cover failure modes:

Minimal viable tribunal:

Risk: what fails if the frame is wrong (harbinger)
Premise: are the assumptions actually on the record (heretic)
Evidence: does the decision ledger support the frame (sentinel)
Execution: can workers deliver within the validated frame

Key property: No single agent can satisfy all four. Orthogonal constraints force quality.
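
One crude mechanical proxy for that property, assuming the routing table sketched earlier (assert_orthogonal is illustrative):

def assert_orthogonal(assignments):
    # assignments: {constitution: agent}. Mandates must be spread across distinct
    # agents; one agent wearing every hat recreates the executive blindspot this
    # pattern exists to catch.
    agents = list(assignments.values())
    assert len(set(agents)) == len(agents), "one agent holds multiple mandates"

assert_orthogonal(REVIEWERS_BY_CONSTITUTION)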

4. Rejection as Signal

High rejection rate = tribunal working. Low rejection rate = tribunal rubber-stamping.

Space-OS target: 20-40% of proposals get challenged.

Measurement:

-- Decision challenge rate over the last 30 days
SELECT
  COUNT(CASE WHEN reply_count > 0 THEN 1 END)::float / COUNT(*) AS challenge_rate
FROM decisions
WHERE created_at > NOW() - INTERVAL '30 days';

If trending toward 0%, tribunal lost teeth. If > 60%, coordination overhead too high.
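
The same health bands expressed in code, a sketch using the Space-OS thresholds above (function name is illustrative):

def tribunal_health(challenge_rate: float) -> str:
    # challenge_rate: fraction of recent proposals that drew at least one challenge
    if challenge_rate > 0.60:
        return "coordination overhead too high"
    if challenge_rate < 0.20:
        return "trending toward rubber-stamping"
    if challenge_rate <= 0.40:
        return "healthy: within the 20-40% target band"
    return "above target: watch coordination cost"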

5. Challenge Protocol

When reviewer objects:

Required:

  1. Which constitutional constraint violated
  2. Concrete example/scenario where proposal fails
  3. Suggested revision (or why proposal should be rejected entirely)

Anti-pattern: Vague disagreement ("this doesn't feel right")
Correct: Specific objection ("proposal assumes X, but evidence shows Y")
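
A sketch of the objection record this protocol implies (field names are illustrative, not a fixed schema):

from dataclasses import dataclass

@dataclass
class Objection:
    constraint_violated: str   # which constitutional constraint the proposal breaks
    failure_scenario: str      # concrete example or scenario where the proposal fails
    suggested_revision: str    # proposed fix, or why the proposal should be rejected outright

All three fields required. An objection missing any of them is the vague-disagreement anti-pattern.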

Space-OS example:

heretic on decision d/abc123:
"Proposal assumes 'revenue or die' mandate. 
brr/threads/007 documents mission revert to research artifact path.
Frame contradiction. Suggest: replace revenue urgency with reputation-building timeline."

Objection references evidence, names specific contradiction, proposes alternative.

Minimal Viable Implementation

Core requirement: Multi-agent execution + ability to route proposals to specific agents.

Schema:

CREATE TABLE proposals (
    id TEXT PRIMARY KEY,
    proposer TEXT NOT NULL,
    title TEXT NOT NULL,
    status TEXT DEFAULT 'review',  -- review/approved/rejected
    created_at TIMESTAMP
);

CREATE TABLE reviews (
    id TEXT PRIMARY KEY,
    proposal_id TEXT REFERENCES proposals(id),
    reviewer TEXT NOT NULL,
    constitution TEXT NOT NULL,  -- which mandate reviewing from
    verdict TEXT,  -- approve/object
    reasoning TEXT NOT NULL,
    created_at TIMESTAMP
);

Validation logic:

def validate_proposal(db, proposal_id):
    # db: any handle over the schema above that returns rows with named columns
    required_constitutions = {"risk", "premise", "evidence", "execution"}
    reviews = db.query("SELECT * FROM reviews WHERE proposal_id=?", proposal_id).all()

    # Constitutional coverage: every required mandate must have filed a review
    covered = {r.constitution for r in reviews}
    if not covered.issuperset(required_constitutions):
        return False, "Insufficient constitutional coverage"

    # Adversarial convergence: a single unresolved objection blocks the proposal
    objections = [r for r in reviews if r.verdict == "object"]
    if objections:
        return False, f"{len(objections)} unresolved objections"

    return True, "All constitutional constraints satisfied"

Proposal proceeds only when all required constitutions approve.
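
Usage sketch, assuming db is a handle over the schema above; dispatch_to_workers is an illustrative hook, not an existing API:

ok, reason = validate_proposal(db, "abc123")
if ok:
    dispatch_to_workers("abc123")    # hand the validated frame to workers
else:
    print(f"blocked: {reason}")      # executive revises and resubmits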

Observable Metrics

Healthy tribunal:

Challenge rate in the 20-40% band
Every proposal reviewed by all required constitutions
Objections resolved through revision or rejection, not outvoted

Failure modes:

Challenge rate trending toward 0%: tribunal rubber-stamping, lost teeth
Challenge rate above 60%: coordination overhead too high
Proposals approved with incomplete constitutional coverage

Space-OS measurement:

-- Constitutional coverage
SELECT 
  proposal_id,
  COUNT(DISTINCT constitution) as constitutional_coverage
FROM reviews
GROUP BY proposal_id
HAVING COUNT(DISTINCT constitution) < 3;  -- proposals with insufficient coverage

Why This Works

Executive-with-sub-agents optimizes for execution speed. Executive decides, workers execute. Fast, but fragile—executive blindspot propagates.

Tribunal pattern optimizes for frame correctness. Executive proposes, tribunal challenges, workers execute validated frame. Slower, but robust—blindspots get caught.

Tradeoff: Coordination overhead for reduced wasted execution.

When cost of wrong direction > cost of coordination, tribunal wins.
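
A back-of-envelope form of that tradeoff. Every number below is an illustrative assumption, not a measurement:

p_wrong_frame = 0.2            # assumed chance the executive's frame is wrong
wasted_hours_if_wrong = 40     # assumed worker effort lost to an uncaught wrong frame
review_hours_per_proposal = 4  # assumed tribunal coordination overhead

tribunal_wins = p_wrong_frame * wasted_hours_if_wrong > review_hours_per_proposal  # True here: 8 > 4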

Edge Cases

What if tribunal deadlocks?
Escalate to human. If orthogonal constitutions can't converge, problem framing needs human judgment.

What if tribunal takes too long?
Time-box reviews (24-48h). If no objections within window, proposal proceeds. Silence = approval.
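
A sketch of that time-box, assuming created_at is stored timezone-aware (e.g. UTC) and the window defaults to the 48h upper bound:

from datetime import datetime, timedelta, timezone

def review_window_closed(created_at: datetime, window_hours: int = 48) -> bool:
    # Silence = approval: once the window passes with no objection, the proposal proceeds
    return datetime.now(timezone.utc) - created_at > timedelta(hours=window_hours)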

What if executive ignores tribunal?
Provenance tracking. If executive proceeds despite objections, audit trail shows constitutional violations. Trust degrades, executive loses authority.

Vs. Constitutional AI

CAI (Constitutional AI): Single agent critiques itself against principles at training time. Self-referential. Opaque (weights).

Tribunal pattern: Multiple agents with incompatible mandates critique at runtime. Adversarial (not self-critique). Transparent (see which constitution objected).

Complementary: CAI trains safer base models. Tribunal catches frame errors via adversarial review.

CAI = "make agent safe." Tribunal = "make decisions robust."

Falsifiability

If tribunal pattern works:

Frame errors get caught at review, before workers spend effort
Challenge rate holds in the 20-40% band

If it doesn't:

Frame errors still propagate to workers despite review
Challenge rate collapses toward 0%, or coordination overhead outweighs the execution it saves
Track challenge rate + frame error rate. If tribunal fails, methodology documents why.