Replicating Constitutional Orthogonality
Audience: Researchers and practitioners building multi-agent systems
Goal: Implement adversarial review via incompatible agent mandates
Prerequisite: Multi-agent execution environment (any framework)
Pattern
Single agents fail structurally: deference, drift, hallucination, sycophancy. These aren't fixable with better prompting or larger models—they're architectural failure modes.
Constitutional orthogonality fixes this through adversarial design: agents with incompatible mandates review the same problem space. When agents that can't easily agree all accept an output, you've found a point where improving one dimension hurts another. That's the quality signal.
Implementation
1. Define Incompatible Mandates
Create agent constitutions with mutually exclusive optimization goals. Not "different specialties" (engineer vs designer)—incompatible evaluation criteria.
Space-OS example pairs:
- Zealot (deletion, simplicity) vs Prime (evidence, mechanism)
- Jobs (taste, elegance) vs Kitsuragi (procedure, process)
- Sentinel (coherence, grounding) vs Heretic (premise questioning, negation)
Key property: Agent A's approval criteria systematically conflict with Agent B's. If A says "delete for simplicity," B must ask "prove deletion is safe." Same-constitution teams rubber-stamp.
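One way to make these pairings machine-checkable is to encode them as data. A minimal sketch; the registry shape and the conflicts_with field are assumptions for illustration, not a Space-OS schema:

# Hypothetical registry: each constitution names what it optimizes and
# which constitution's mandate it systematically conflicts with.
ORTHOGONAL_PAIRS = {
    "zealot":    {"optimizes": "deletion, simplicity", "conflicts_with": "prime"},
    "prime":     {"optimizes": "evidence, mechanism",  "conflicts_with": "zealot"},
    "jobs":      {"optimizes": "taste, elegance",      "conflicts_with": "kitsuragi"},
    "kitsuragi": {"optimizes": "procedure, process",   "conflicts_with": "jobs"},
    "sentinel":  {"optimizes": "coherence, grounding", "conflicts_with": "heretic"},
    "heretic":   {"optimizes": "premise questioning",  "conflicts_with": "sentinel"},
}

def are_orthogonal(a: str, b: str) -> bool:
    # Orthogonal if either mandate names the other as its conflict.
    return (ORTHOGONAL_PAIRS[a]["conflicts_with"] == b
            or ORTHOGONAL_PAIRS[b]["conflicts_with"] == a)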
2. Reviewer ≠ Executor Constitution
The agent that produces an output MUST have a different constitution from the agent that reviews it.
Anti-pattern:
zealot proposes deletion
zealot reviews deletion # rubber stamp
Correct pattern:
zealot proposes deletion
prime reviews deletion # demands safety proof
harbinger reviews deletion # surfaces cascade risks
Implementation: Track constitution per agent. Route review requests to agents with orthogonal mandates.
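A minimal routing sketch, assuming each agent is tagged with its constitution (the Agent type and helper below are illustrative, not a space-os API):

from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    constitution: str  # e.g. "zealot", "prime"

def route_review(executor: Agent, pool: list[Agent], n: int = 2) -> list[Agent]:
    # Never route a review to the constitution that produced the output.
    reviewers = [a for a in pool if a.constitution != executor.constitution]
    if len(reviewers) < n:
        raise ValueError("not enough orthogonal reviewers available")
    return reviewers[:n]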
3. Mandate Specification Format
Constitution structure (3 sections):
## Mandate
Core directives. What agent must do. Non-negotiable.
## Principles
Operating values. How agent evaluates quality.
## Execution
Communication style. How agent delivers.
Length constraint: One screen max. Over-specification kills orthogonality—agents need room to interpret differently.
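A filled-in sketch for the zealot, written as a prompt string. The mandate (deletion, simplicity) comes from the pairs above; the specific wording is invented for illustration:

ZEALOT_CONSTITUTION = """\
## Mandate
Delete what is not pulling its weight. Simplicity is non-negotiable.

## Principles
Every component is a liability until proven otherwise.
Complexity must earn its keep.

## Execution
Blunt, short verdicts. Name the thing to delete.
"""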
4. Adversarial Convergence Process
When decision needs validation:
- Route to ≥2 agents with incompatible mandates
- Each agent evaluates independently (no shared context beyond proposal)
- Collect objections/approvals
- Decision proceeds when ALL orthogonal reviewers approve
Space-OS implementation: Reply threads on decisions. Minimum viable: ≥2 constitutional identities must approve before commitment.
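A sketch of this loop, assuming each reviewer is a callable returning a verdict and the executor supplies a revise step (both interfaces are hypothetical, not a space-os API):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    reviewer: str
    approved: bool
    reasoning: str

def adversarial_convergence(
    proposal: str,
    reviewers: list[Callable[[str], Verdict]],  # orthogonal mandates, independent context
    revise: Callable[[str, list[Verdict]], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    for round_no in range(1, max_rounds + 1):
        verdicts = [review(proposal) for review in reviewers]
        objections = [v for v in verdicts if not v.approved]
        if not objections:
            return proposal, round_no  # ALL orthogonal reviewers approved
        proposal = revise(proposal, objections)  # executor addresses objections
    raise RuntimeError("no convergence within round budget")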
5. Observable Metrics
Track these signals:
Healthy orthogonality:
- Decision challenge rate: 20-40% of proposals get pushback
- Cross-constitution corrections: Agent A catches Agent B's blindspot
- Convergence time: Multi-round discussion before approval
Failure modes:
- Fast unanimous approval (agents aren't actually adversarial)
- Same constitution repeatedly proposing+approving (no orthogonality)
- Zero rejections (constitutions too weak or identical)
Space-OS measurement:
-- Decision challenge rate
SELECT
  COUNT(CASE WHEN reply_count > 0 THEN 1 END)::float / COUNT(*) AS challenge_rate
FROM decisions;

-- Cross-constitution corrections (join replies to the insights they correct)
SELECT COUNT(*)
FROM replies r
JOIN insights i ON r.reply_to = i.id
WHERE r.creator_constitution != i.creator_constitution;
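The same measurement works outside Postgres. A sketch over a local SQLite review ledger, assuming an illustrative schema with one reviews row per proposal and an integer challenged flag set to 1 when any reviewer objected:

import sqlite3

def challenge_rate(db_path: str = "reviews.db") -> float:
    # Fraction of proposals that drew at least one objection.
    con = sqlite3.connect(db_path)
    try:
        total, challenged = con.execute(
            "SELECT COUNT(*), COALESCE(SUM(challenged), 0) FROM reviews"
        ).fetchone()
        return challenged / total if total else 0.0
    finally:
        con.close()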
Minimal Viable Implementation
Don't need space-os. Need:
- ≥2 agent identities with incompatible mandates (prompts)
- Structured review process (human routes proposals to orthogonal reviewers)
- Rejection tracking (did reviewers object? why?)
Example (any LLM API):
# `llm` is a placeholder for whatever client you use; swap llm.generate
# for your provider's completion call.
constitutions = {
    "zealot": "You prioritize deletion and simplicity. Reject complexity.",
    "prime": "You demand evidence and mechanism. Reject unvalidated claims.",
}

def review_with_orthogonality(proposal):
    reviews = []
    for agent, mandate in constitutions.items():
        prompt = (
            f"{mandate}\n\nReview: {proposal}\n"
            "Start your reply with APPROVE or REJECT, then give reasoning."
        )
        reviews.append(llm.generate(prompt))
    # Require an explicit APPROVE verdict: substring-matching "approve"
    # would also match rejections that merely mention the word.
    approved = all(r.strip().upper().startswith("APPROVE") for r in reviews)
    return approved, reviews
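Usage, once `llm` is wired to a real client:

approved, reviews = review_with_orthogonality("Delete the legacy import pipeline.")
if not approved:
    for r in reviews:
        print("-", r)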
Scale this with: more constitutions, structured decision primitives, persistent review history.
Why This Works
CAI (Constitutional AI) bakes values into model weights at training time. Opaque, static, self-referential.
Constitutional orthogonality uses runtime adversarial review instead. Transparent (you can see which constitution objected), dynamic (constitutions are prompts you can change), multi-agent rather than self-critique.
Difference: CAI = "make one agent safe." Orthogonality = "make agent decisions robust."
Complementary layers. A CAI model can be the substrate for orthogonal coordination.
Failure Modes
Shared blindspot: Orthogonal constitutions converge on same flawed assumption. No structural mitigation—detection happens when output fails, not before.
Coordination overhead: Too many reviewers → nothing ships. Start with 2-3 orthogonal pairs. Scale only when rejection rate falls below 20%.
Mandate drift: Constitutions weaken over time ("be helpful" becomes universal approval). Audit rejection rates. If they trend toward zero, the constitutions have lost their teeth.
References
- docs/constitutions.md — Space-OS constitutional implementation
- docs/philosophy.md — Why orthogonality beats consensus
- docs/thesis.md — Adversarial convergence mechanism
- brr/findings/022-orthogonal-convergence.md — Empirical evidence from space-os swarm
Falsifiability
If constitutional orthogonality works:
- Cross-agent corrections observable in logs
- Rejection rate 20-40% sustained
- Output quality (measured downstream) > single-agent baseline
If it doesn't:
- Agents converge to unanimous approval (mandates too weak)
- Coordination overhead kills velocity (too many reviewers)
- Same failure modes as single-agent (orthogonality insufficient)
Track these. If orthogonality fails, the methodology documents why.
Runnable Reference Implementation
See examples/minimal-orthogonality.py for a complete, standalone implementation demonstrating:
- Two agents with orthogonal constitutions (zealot, prime)
- Adversarial review process
- SQLite ledger for review history
- Challenge rate measurement
~100 lines, zero space-os dependencies. Adapt for your multi-agent system.