Live Eval Roadmap
This roadmap is the practical build plan for adding live eval to the framework.
It is written for people deciding what to build next, not only for framework maintainers.
Phase 1: Shadow Mode
Goal:
- measure how useful the framework's guidance is without changing enforcement
Build:
- eval record capture (a minimal record shape is sketched after this list)
- eval draft capture
- team profile capture
- simple outcome summaries
- a shipped CLI path that converts a report artifact into an eval artifact
- optional command-level telemetry export, added only once artifact semantics are stable
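To make the capture items concrete, here is a minimal sketch of an eval record and the report-to-eval conversion the CLI path would perform. Everything in it is an assumption for illustration: the field names (rule_id, outcome, waiver_reason, evidence) and the report shape are hypothetical, not a fixed schema.

```python
# Illustrative sketch only: field names and report shape are assumptions,
# not a fixed schema for the framework.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvalRecord:
    rule_id: str                                  # which rule fired
    outcome: str                                  # e.g. "pass", "warn", "waived"
    waiver_reason: Optional[str] = None           # populated when waived
    evidence: dict = field(default_factory=dict)  # reviewer-facing fields

def eval_records_from_report(report: dict) -> list[EvalRecord]:
    """What the shipped CLI path might do: read a report artifact
    and emit one eval record per rule finding."""
    return [
        EvalRecord(
            rule_id=finding["rule_id"],
            outcome=finding.get("outcome", "pass"),
            waiver_reason=finding.get("waiver_reason"),
            evidence=finding.get("evidence", {}),
        )
        for finding in report.get("findings", [])
    ]
```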
Questions answered:
- which rules create the most waivers? (see the sketch after this list)
- which evidence fields help reviewers most?
- where is the framework still missing coverage?
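The first question falls directly out of the captured records. A minimal sketch, assuming records shaped like the hypothetical EvalRecord above:

```python
from collections import Counter

def waivers_by_rule(records) -> Counter:
    """Count waived outcomes per rule; the most common entries are
    the rules generating the most waivers."""
    return Counter(r.rule_id for r in records if r.outcome == "waived")

# waivers_by_rule(records).most_common(10) -> the ten noisiest rules
```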
Phase 2: Assist Mode
Goal:
- tune the framework using the data collected in shadow mode
Build:
- rule-noise summaries (see the sketch after this list)
- guidance about what to promote, soften, or clarify
- clearer reviewer-facing evidence summaries
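A rule-noise summary can be as simple as a per-rule waiver rate, which then drives the promote/soften/clarify guidance. A sketch under the same assumed record shape; the cutoffs are illustrative, not recommendations:

```python
from collections import Counter

def rule_noise(records) -> dict[str, float]:
    """Waiver rate per rule: waived firings / total firings."""
    totals, waived = Counter(), Counter()
    for r in records:
        totals[r.rule_id] += 1
        if r.outcome == "waived":
            waived[r.rule_id] += 1
    return {rule: waived[rule] / totals[rule] for rule in totals}

def guidance(rate: float) -> str:
    """Map noise to a suggestion; 0.05 and 0.5 are made-up cutoffs."""
    if rate < 0.05:
        return "candidate to promote"
    if rate > 0.5:
        return "soften or rewrite"
    return "clarify wording or split the rule"
```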
Questions answered:
- which warnings should stay warnings?
- which policies are reliable enough to harden?
- which rules should be split or rewritten?
Phase 3: Gate Mode
Goal:
- let policies that have proven themselves graduate to stronger enforcement
Build:
- promotion criteria based on repeated eval outcomes (see the sketch after this list)
- team-profile controls for how strict CI should be
- explicit signoff requirements for moving rules to harder stages
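One way these three pieces could fit together: a promotion gate that checks history volume and waiver rate, a team profile that decides whether warnings block CI, and an explicit signoff set. All names and numbers here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class TeamProfile:
    """Team-level control over CI strictness; fields are assumptions."""
    warnings_block_ci: bool = False

@dataclass
class PromotionCriteria:
    min_evals: int = 50            # enough repeated outcomes to trust
    max_waiver_rate: float = 0.02  # near-zero noise before hardening
    signoff_required: bool = True  # a human must approve the move

def eligible_for_block(rule_id: str, records, criteria: PromotionCriteria,
                       signoffs: set[str]) -> bool:
    """A rule earns block-level enforcement only with enough history,
    a low waiver rate, and (if required) an explicit signoff."""
    hits = [r for r in records if r.rule_id == rule_id]
    if len(hits) < criteria.min_evals:
        return False
    waiver_rate = sum(r.outcome == "waived" for r in hits) / len(hits)
    if waiver_rate > criteria.max_waiver_rate:
        return False
    return (not criteria.signoff_required) or rule_id in signoffs
```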
Questions answered:
- which rules have earned block-level trust?
- which proof lanes should be mandatory before promotion?
- when should warnings start blocking CI for a given team?
What Comes Last
These can wait until the framework has real live-eval history:
- dashboards
- hosted analytics
- model fine-tuning loops
- automatic rule promotion
If telemetry is added before dashboards, it should stay downstream of the canonical artifacts and use command-level spans or events rather than a synthetic end-to-end lifecycle trace.
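In sketch form, that means deriving spans from what the artifact already recorded rather than instrumenting a lifecycle end to end. The artifact shape assumed here (a commands list with timestamps) is hypothetical:

```python
def spans_from_artifact(artifact: dict):
    """Emit one span-like event per recorded command, derived entirely
    from the canonical artifact; no synthetic end-to-end trace."""
    for cmd in artifact.get("commands", []):
        yield {
            "name": cmd["name"],
            "start": cmd["started_at"],
            "end": cmd["finished_at"],
            "attributes": {"exit_code": cmd.get("exit_code")},
        }
```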
The early goal is not complexity. The early goal is to learn what guidance actually works.