Knowledge Decay Mitigation Experiment
Status
Premise falsified before execution (see i/3da71d4a)
Baseline measurement (Feb 5) revealed citation rate isn't decay problem. All age buckets have <2% citation: 0-7d=1.2%, 7-14d=0.3%, 14d+=0%. Total citation rate 0.7%.
Decisions cite insights 0%. Surfacing old insights won't fix agents who ignore fresh insights. Root cause: citation poverty, not decay.
Hypothesis
f/031 claims knowledge decay is "constraint not bug" but doesn't empirically test proposed mitigations. We test mitigation #4: "Decay-aware surfacing: Expand min_refs=0 for old high-value insights."
Null hypothesis: Reducing min_refs threshold for foundational insights has no effect on reference rate or spawn decision quality.
Alternative hypothesis: Surfacing older insights (even with refs=0-2) increases citation rate and reduces rediscovery.
Experimental Design
Baseline (7 days, current system)
- Track: insight reference rate by age bucket (0-7d, 7-14d, 14-21d, 21d+)
- Track: decision quality (challenge rate, reversal rate, half-life)
- Track: rediscovery incidents (duplicate insight detection)
- Measurement window: Feb 6-13, 2026
Intervention (7 days, modified system)
- Change:
insights.fetch_foundational()min_refs default 3 → 1 - Change: Context assembly surfaces foundational insights with refs≥1 if age>7d
- Implementation: space/os/insights.py:446, space/ctx/prompt.py (foundational section)
- Track: same metrics as baseline
- Measurement window: Feb 14-21, 2026
Comparison
- Primary metric: reference rate for insights age 7-14d (currently 0%, f/031:9)
- Secondary: rediscovery incidents (spawn derives conclusion already in ledger)
- Tertiary: decision half-life (does surfacing old context slow decisions?)
Success Criteria
Intervention succeeds if:
- Reference rate for 7-14d insights increases from 0% to >5%
- Rediscovery incidents decline (no explicit baseline, track during experiment)
- Decision half-life does not increase >10% (coordination overhead check)
Intervention fails if:
- Reference rate remains 0%
- OR decision half-life increases >10% (mitigation adds more cost than value)
Falsifiability
If decay is truly architectural constraint (not fixable via surfacing):
- Reference rate won't improve even when old insights are surfaced
- Agents will ignore surfaced context in favor of deriving conclusions fresh
- This proves f/031's "constraint not bug" claim
If decay is surfacing problem (fixable):
- Reference rate improves when insights are surfaced
- Rediscovery declines
- This contradicts f/031, suggests decay is solvable
Implementation Notes
Reversible changes:
- insights.py:446 default parameter
- prompt.py foundational section (add age-based min_refs adjustment)
Non-reversible: experiment itself (can't un-run it). But data collection is observational, no external commitment.
Open Questions
- What defines "high-value" for old insights? Domain=#governance higher priority than #status?
- Should intervention test single mitigation or multiple? (Timeline compression + surfacing)
- Does 7-day window suffice for statistical significance?
References
- [f/031] Decay Horizon Constraint (proposes mitigations, doesn't test)
- [f/026] Governance Benchmark Design (knowledge decay metric definition)
- arxiv paper line 121: "intervention studies to improve weak metrics" (future work)
Next Steps
- Collect baseline metrics (Feb 6-13)
- Implement intervention
- Collect intervention metrics (Feb 14-21)
- Analyze, document findings
- Decide: keep intervention, revert, or iterate