feat(brain-retro): extend mandatory digital analysis 7 → 10 cuts

SKILL.md MANDATORY DIGITAL ANALYSIS block grows by three cuts:
  8. Class × canon coverage  (analyzer: buildClassCanonCoverage)
  9. Router vs Opus          (analyzer: buildRouterVsOpus,
                              sections A / B / C — A and C are
                              mutually exclusive by construction)
 10. Chain-ignore breakdown  (analyzer: buildChainIgnoreBreakdown,
                              bucketed by chain length 1 / 2 / 3+)

All three are wired into analyzer analyze() output as
result.classCanonCoverage / result.routerVsOpus /
result.chainIgnoreBreakdown and produced automatically on every
retro run (no manual step). +216 lines analyzer / +288 lines tests
covering the three functions in isolation and via analyze().

Driven by retro #8 manual analysis: the three cuts surface signal
the existing 7 cuts missed — router-vs-Opus disagreement, canon
coverage by classification, chain-vs-singleton ignore rate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Дмитрий
2026-05-27 18:08:53 +03:00
parent e184ffe212
commit b139888376
3 changed files with 516 additions and 4 deletions
+12 -4
@@ -22,7 +22,8 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
## Procedure
> **MANDATORY DIGITAL ANALYSIS (added 2026-05-26 after retro #6 feedback).**
> Every /brain-retro run MUST include **quantitative cuts**, not only a causal narrative. At least 7 numeric tables:
> Every /brain-retro run MUST include **quantitative cuts**, not only a causal narrative. At least 10 numeric tables:
>
> 1. **Path-type breakdown** (regulated vs improvised, with counts and %).
> 2. **node_chosen distribution** (top 15 nodes with count + %).
> 3. **recommended_node distribution** (what the classifier suggested, count + %).
@@ -30,11 +31,16 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
> 5. **outcome × node_chosen group**: 3 groups (skill_used / direct_no_rec / direct_ignored_rec) with counts + rework rate per group.
> 6. **classifier_output presence by source** (prefilter / llm / regex / cache / NULL): a health diagnostic for the classifier itself.
> 7. **Per-classification trigger-match + via-skill** (analysis / planning / bugfix / feature / refactor / security).
> 8. **Class × canon coverage**: a table of task class × canonical nodes from the brain (`observer-classification-map.json`) × router recommended × what I actually took × whether the recommendation fell within the canon. Source: `result.classCanonCoverage` from the analyzer.
> 9. **Router vs Opus**, in three sections: A (router recommended → Opus evaluated, disagreement visible at a glance), B (router was silent → Opus said a skill should have been used), C (router recommended → Opus agreed the skill was unnecessary). Source: `result.routerVsOpus`.
> 10. **Chain-ignore breakdown**: a separate cut showing how often the router recommended a chain vs a single node, what % I ignored, and the rework rate of each; bucketed by chain length (1/2/3+). Source: `result.chainIgnoreBreakdown`.
>
> Without these 7 tables the retro is considered unfinished. Narrative conclusions must rest on the numbers in them, not on "general impressions". **If classifier_output=NULL in > 30% of episodes**, that signals a broken classifier; report the classifier's state in a separate retro block (timeouts/errors/source distribution).
> Without these 10 tables the retro is considered unfinished. Narrative conclusions must rest on the numbers in them, not on "general impressions". **If classifier_output=NULL in > 30% of episodes**, that signals a broken classifier; report the classifier's state in a separate retro block (timeouts/errors/source distribution).
>
> No jargon in the "Report to user" block: the numbers stay technical, but verbal conclusions for the user are written in plain language (see memory `feedback_plain_language.md`).
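
Cut 10 arrives from the analyzer as raw counters, while the retro table needs percentages. A minimal sketch of that conversion, assuming the `result.chainIgnoreBreakdown` shape returned by `buildChainIgnoreBreakdown`. The helper names `pct` and `chainIgnoreRows` and the sample numbers are illustrative assumptions, not part of the analyzer:

```javascript
// Hypothetical helper: percentage string, 'n/a' when the denominator is zero.
function pct(part, whole) {
  return whole === 0 ? 'n/a' : `${((part / whole) * 100).toFixed(1)}%`;
}

// Hypothetical helper: flattens chainIgnoreBreakdown counters into table rows.
// The analyzer counts rework only among ignored recommendations, so the rate
// computed here is "rework among ignored", not "rework among all".
function chainIgnoreRows(b) {
  const rows = [
    { kind: 'chain (any length)', recommended: b.totalChainRecommendations, ignored: b.ignoredChainCount, rework: b.ignoredChainRework },
    { kind: 'single node', recommended: b.totalNodeOnlyRecommendations, ignored: b.ignoredNodeOnlyCount, rework: b.ignoredNodeOnlyRework },
  ];
  for (const [len, v] of Object.entries(b.breakdownByChainLength)) {
    rows.push({ kind: `chain length ${len}`, recommended: v.count, ignored: v.ignored, rework: v.rework });
  }
  return rows.map((r) => ({
    ...r,
    ignoredPct: pct(r.ignored, r.recommended),
    reworkAmongIgnoredPct: pct(r.rework, r.ignored),
  }));
}

// Made-up numbers, only to show the shape.
const sampleBreakdown = {
  totalChainRecommendations: 4, ignoredChainCount: 3, ignoredChainRework: 1,
  totalNodeOnlyRecommendations: 10, ignoredNodeOnlyCount: 5, ignoredNodeOnlyRework: 0,
  breakdownByChainLength: {
    '1': { count: 1, ignored: 1, rework: 0 },
    '2': { count: 1, ignored: 0, rework: 0 },
    '3+': { count: 2, ignored: 2, rework: 1 },
  },
};

console.table(chainIgnoreRows(sampleBreakdown));
```

A zero denominator is reported as `n/a` rather than `0%`, so an empty bucket is not mistaken for a bucket where nothing was ignored.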
<!-- markdownlint-disable MD029 MD032 -->
1. **Determine period**: ask the user "for which period" or default to "since last brain-retro" (find latest `docs/observer/notes/YYYY-MM-DD-brain-retro-*.md`).
2. **Read evidence**: glob `docs/observer/episodes-YYYY-MM.jsonl` for the period; read all lines as JSON.
3. **Read optional notes**: glob `docs/observer/notes/*.md` filtered by date.
@@ -43,8 +49,8 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
5a. **[Phase 3] Sanity questions (spec §4.7)** — `node tools/brain-retro-sanity-generator.mjs` (called as a module from analyzer-driven flow, OR direct via `import { generateCandidateQuestions } from '../../../tools/brain-retro-sanity-generator.mjs'`) returns up to 5 candidate questions. Pick 3-4, ask via AskUserQuestion (multiple-choice + free comment). **Questions to the customer must be in plain language**: not "rework / wrong_skill / TDD pattern / self_assessment" but "redo work / picking the wrong tool / self-check" (memory `feedback_plain_language.md`). If the first round contains jargon, rephrase and ask again. **Before persist:** sanitize free comments with `tools/observer-pii-filter.mjs` (`sanitize` export, RU_PHONE / EMAIL / TOKEN strip). Write answers to `docs/observer/sanity-checks/YYYY-MM-DD.json` `{schema_version: 1, questions: [...]}`.
5b. **Reviewer pass** — pragmatic two-mode policy (added 2026-05-26 after brain-retro #6, replacing original spec §4.6 «subagent only» which was unrealistic at retro scale):
- **Batch mode (default, fast)** — `node tools/brain-retro-batch-reviewer.mjs docs/observer/episodes-YYYY-MM.jsonl <cutoff-iso> [limit=30] [conc=5]`. Direct Opus API via `reviewViaDirectApi` from `tools/brain-retro-opus-reviewer.mjs` with concurrency 5. Use for **N ≥ 20 unreviewed episodes** — typical retro workload (retro #6 processed 132 episodes in 293s = ~2.2s/episode, well under per-subagent overhead).
- **Subagent mode (per spec §4.6, deeper context)** — `Task(subagent_type='reviewer-agent', prompt=<episode JSON + sanity-answers context>)`. Use for **N < 20 episodes** OR when the reviewer needs access to other tools (read related files, grep history). Per-episode try/catch — on subagent crash/timeout, fall back to `reviewViaDirectApi`.
- **Batch mode (default, fast)** — `node tools/brain-retro-batch-reviewer.mjs docs/observer/episodes-YYYY-MM.jsonl <cutoff-iso> [limit=30] [conc=5]`. Direct Opus API via `reviewViaDirectApi` from `tools/brain-retro-opus-reviewer.mjs` with concurrency 5. Use for **N ≥ 20 unreviewed episodes** — typical retro workload (retro #6 processed 132 episodes in 293s = ~2.2s/episode, well under per-subagent overhead).
- **Subagent mode (per spec §4.6, deeper context)** — `Task(subagent_type='reviewer-agent', prompt=<episode JSON + sanity-answers context>)`. Use for **N < 20 episodes** OR when the reviewer needs access to other tools (read related files, grep history). Per-episode try/catch — on subagent crash/timeout, fall back to `reviewViaDirectApi`.
Both modes write the same payload back: `review.*` + `outcome_reviewed` + `outcome_reviewed_source` (`direct_api_batch` for batch, `subagent` for Task(), `direct_api_fallback` when subagent fails). If both fail, leave `review.reviewer_error: <msg>` for the next retro.
6. **Aggregate** per `references/aggregation-template.md` — fill the Factor analysis matrix from the analyzer's `factorMatrix`, the task groups from `tasks`, the causal-chain candidates from `causalChains`, plus the new sections: sanity-check results, reviewer-agent outcomes distribution, self-retrospect trigger status.
@@ -55,6 +61,8 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
10. **Cost report** — read `~/.claude/runtime/cost-daily.json`; include classifier + self_assessment + reviewer cost totals for the period in the retro note.
11. **Report to user**: high-signal summary including sanity highlights, reviewer outcome distribution, and any escalations.
<!-- markdownlint-enable MD029 MD032 -->
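
The two-mode reviewer policy in step 5b can be sketched as a mode selector plus a per-episode fallback. This is only an illustration under stated assumptions: `chooseReviewerMode`, `reviewOneWithFallback`, and the injected `viaSubagent` / `viaDirectApi` functions are hypothetical names, and the real signatures of `reviewViaDirectApi` and `Task()` are not shown in this diff. Filling `outcome_reviewed` from the review is left out.

```javascript
// Threshold taken from the policy text: batch for N >= 20, subagent for N < 20.
const BATCH_THRESHOLD = 20;

// Hypothetical selector: subagent mode for small runs or when the reviewer
// needs other tools (reading related files, grepping history).
function chooseReviewerMode(unreviewedCount, needsTools = false) {
  if (needsTools || unreviewedCount < BATCH_THRESHOLD) return 'subagent';
  return 'batch';
}

// Hypothetical per-episode wrapper for subagent mode. viaSubagent and
// viaDirectApi are injected async functions (assumptions, not the real API).
async function reviewOneWithFallback(episode, { viaSubagent, viaDirectApi }) {
  try {
    const review = await viaSubagent(episode);
    return { ...episode, review, outcome_reviewed_source: 'subagent' };
  } catch (subagentErr) {
    try {
      // Subagent crashed or timed out: fall back to the direct API.
      const review = await viaDirectApi(episode);
      return { ...episode, review, outcome_reviewed_source: 'direct_api_fallback' };
    } catch (apiErr) {
      // Both failed: leave a marker for the next retro.
      const msg = String((apiErr && apiErr.message) || apiErr);
      return { ...episode, review: { reviewer_error: msg } };
    }
  }
}
```

Injecting the two reviewers keeps the fallback order testable without network access or a running subagent.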
## Output anatomy
See `references/aggregation-template.md`.
+216
@@ -7,6 +7,7 @@
* Security Guidance #40: pure parsing — no exec/execSync.
*/
import { Buffer } from 'buffer';
import { resolve as pathResolve } from 'path';
import { readFileSync, existsSync } from 'fs';
import { detectMissedActivations } from './missed-activations.mjs';
import {
@@ -356,6 +357,204 @@ export function buildFactorMatrix(episodesWithOutcome) {
return matrix;
}
// ────────────────────────────────────────────────────────────────
// New cut helpers: normalize recommended id to '#N' form for canon
// comparison regardless of whether the source stored 19 or '#19'.
// ────────────────────────────────────────────────────────────────
function normalizeNodeId(id) {
  if (id == null) return null;
  const s = String(id).trim();
  return s.startsWith('#') ? s : `#${s}`;
}

function hasRecommendation(ep) {
  const pr = ep.primary_rationale || {};
  const co = ep.classifier_output || {};
  const recNode = pr.recommended_node || co.recommended_node;
  const recChain = pr.recommended_chain || co.recommended_chain;
  return !!(recNode || (Array.isArray(recChain) && recChain.length > 0));
}

function getRecommendedNode(ep) {
  const pr = ep.primary_rationale || {};
  const co = ep.classifier_output || {};
  return pr.recommended_node || co.recommended_node || null;
}

function getRecommendedChain(ep) {
  const pr = ep.primary_rationale || {};
  const co = ep.classifier_output || {};
  const chain = pr.recommended_chain || co.recommended_chain;
  return Array.isArray(chain) ? chain : [];
}

/**
 * Cut 8: Class × canon coverage.
 * Returns one row per task_classification appearing in the episodes, sorted by count desc.
 * classificationMap shape: { [classification]: string[] }, an array of canonical node IDs (e.g. '#34').
 */
export function buildClassCanonCoverage(episodes, classificationMap) {
  const map = classificationMap || {};
  const byClass = new Map();
  for (const ep of episodes) {
    const classification = (ep.primary_rationale || {}).task_classification || 'other';
    if (!byClass.has(classification)) {
      byClass.set(classification, {
        classification,
        count: 0,
        canonicalNodes: map[classification] ? [...map[classification]] : [],
        routerRecommended: 0,
        claudeTook: 0,
        recWithinCanon: 0,
        rework: 0,
      });
    }
    const row = byClass.get(classification);
    row.count += 1;
    const recNode = getRecommendedNode(ep);
    const recChain = getRecommendedChain(ep);
    const hasRec = !!(recNode || recChain.length > 0);
    if (hasRec) {
      row.routerRecommended += 1;
      // Check if any recommended id falls within canonical set
      const canonSet = new Set(row.canonicalNodes.map(normalizeNodeId));
      const allRecIds = [];
      if (recNode) allRecIds.push(normalizeNodeId(recNode));
      for (const id of recChain) allRecIds.push(normalizeNodeId(id));
      if (allRecIds.some((id) => id && canonSet.has(id))) {
        row.recWithinCanon += 1;
      }
    }
    const nodeChosen = (ep.primary_rationale || {}).node_chosen;
    if (nodeChosen && nodeChosen !== 'direct') {
      row.claudeTook += 1;
    }
    if (ep.outcome_reviewed === 'rework') {
      row.rework += 1;
    }
  }
  return [...byClass.values()].sort((a, b) => b.count - a.count);
}

/**
 * Cut 9: Router vs Opus three-section breakdown.
 * Returns { sectionA, sectionB, sectionC }, each an array of structured items.
 * Episodes without a usable `review` (missing, not an object, or carrying
 * `reviewer_error`) are excluded from all sections. Reviewed episodes where the
 * router was silent and Opus named no alternative also land in no section.
 */
export function buildRouterVsOpus(episodes) {
  const sectionA = [];
  const sectionB = [];
  const sectionC = [];
  for (const ep of episodes) {
    const rev = ep.review;
    if (!rev || typeof rev !== 'object' || rev.reviewer_error) continue;
    const pr = ep.primary_rationale || {};
    const hasRec = hasRecommendation(ep);
    const recNode = getRecommendedNode(ep);
    const recChain = getRecommendedChain(ep);
    const routerRecommendation = recChain.length > 0 ? recChain : recNode;
    const time = (ep.timestamps || {}).started_at || null;
    const taskId = String(ep.task_id || '').slice(0, 8);
    const classification = pr.task_classification || 'other';
    const nodeChosen = pr.node_chosen || 'direct';
    const outcomeReviewed = ep.outcome_reviewed || 'unknown';
    if (hasRec) {
      const isCorrectNoAlt = rev.node_quality === 'correct' && !rev.alternative_better;
      if (isCorrectNoAlt) {
        // Section C: router gave + Opus agreed it was fine (correct, no better alternative)
        sectionC.push({ time, taskId, classification, routerRecommendation, outcomeReviewed });
      } else {
        // Section A: router gave + some disagreement or uncertainty (wrong_node / disputable / has alternative)
        sectionA.push({
          time,
          taskId,
          classification,
          routerRecommendation,
          claudeChose: nodeChosen,
          opusNodeQuality: rev.node_quality || 'n/a',
          opusChainQuality: rev.chain_quality || 'n/a',
          outcomeReviewed,
          opusAlternative: rev.alternative_better || null,
          opusRootCause: rev.error_root_cause || 'n/a',
        });
      }
    } else if (!hasRec && rev.alternative_better) {
      // Section B: router silent, Opus identified a better node
      sectionB.push({
        time,
        taskId,
        classification,
        opusSuggests: rev.alternative_better,
        outcomeReviewed,
        opusReasoning: String(rev.reasoning || '').slice(0, 200),
      });
    }
  }
  return { sectionA, sectionB, sectionC };
}

/**
 * Cut 10: Chain-ignore breakdown.
 * Distinguishes chain recommendations from node-only recommendations and reports
 * raw counts (not rates) of recommended / ignored / rework, bucketed by chain length.
 * "Ignored" means node_chosen is 'direct' (or absent); rework is counted only
 * among ignored recommendations.
 */
export function buildChainIgnoreBreakdown(episodes) {
  const result = {
    totalChainRecommendations: 0,
    ignoredChainCount: 0,
    ignoredChainRework: 0,
    totalNodeOnlyRecommendations: 0,
    ignoredNodeOnlyCount: 0,
    ignoredNodeOnlyRework: 0,
    breakdownByChainLength: {
      '1': { count: 0, ignored: 0, rework: 0 },
      '2': { count: 0, ignored: 0, rework: 0 },
      '3+': { count: 0, ignored: 0, rework: 0 },
    },
  };
  for (const ep of episodes) {
    const pr = ep.primary_rationale || {};
    const recNode = getRecommendedNode(ep);
    const recChain = getRecommendedChain(ep);
    const hasChain = recChain.length > 0;
    const hasNodeOnly = !hasChain && !!recNode;
    const nodeChosen = pr.node_chosen || 'direct';
    const isIgnored = nodeChosen === 'direct';
    const isRework = ep.outcome_reviewed === 'rework';
    if (hasChain) {
      result.totalChainRecommendations += 1;
      const lenBucket = recChain.length === 1 ? '1' : recChain.length === 2 ? '2' : '3+';
      result.breakdownByChainLength[lenBucket].count += 1;
      if (isIgnored) {
        result.ignoredChainCount += 1;
        result.breakdownByChainLength[lenBucket].ignored += 1;
        if (isRework) {
          result.ignoredChainRework += 1;
          result.breakdownByChainLength[lenBucket].rework += 1;
        }
      }
    } else if (hasNodeOnly) {
      result.totalNodeOnlyRecommendations += 1;
      if (isIgnored) {
        result.ignoredNodeOnlyCount += 1;
        if (isRework) result.ignoredNodeOnlyRework += 1;
      }
    }
  }
  return result;
}

/** Full deterministic aggregation: dedup → infer outcomes → group → chains → matrix → missed activations. */
export function analyze(episodes, options = {}) {
const deduped = dedupeEpisodes(episodes);
@@ -441,6 +640,20 @@ export function analyze(episodes, options = {}) {
}
}
  // Cuts 8/9/10: read classificationMap from the archived file when not
  // passed via options (CLI invocation). Silent fallback to {} on missing/broken file.
  let canonMapForCuts = classificationMap;
  if (!Object.keys(canonMapForCuts).length) {
    try {
      const mapPath = pathResolve('docs/archive/llm-bootstrap-2026-05/routing-docs/observer-classification-map.json');
      const raw = readFileSync(mapPath, 'utf-8');
      const parsed = JSON.parse(raw);
      canonMapForCuts = parsed.map || {};
    } catch {
      canonMapForCuts = {};
    }
  }
return {
episodeCount: normal.length,
v1SkippedCount,
@@ -457,6 +670,9 @@ export function analyze(episodes, options = {}) {
reviewerCoverage,
degradedCount,
costTotals,
classCanonCoverage: buildClassCanonCoverage(normal, canonMapForCuts),
routerVsOpus: buildRouterVsOpus(normal),
chainIgnoreBreakdown: buildChainIgnoreBreakdown(normal),
};
}
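
The `recWithinCanon` counter in cut 8 depends on id normalization, because episodes may store a node as `19` while the classification map stores `'#19'`. A small worked example, assuming the same normalization as the `normalizeNodeId` helper in this diff. The standalone `recWithinCanon` function is a hypothetical extraction for illustration; in the analyzer the check is inline:

```javascript
// Same logic as the normalizeNodeId helper in the diff.
function normalizeNodeId(id) {
  if (id == null) return null;
  const s = String(id).trim();
  return s.startsWith('#') ? s : `#${s}`;
}

// Hypothetical standalone version of the inline canon check:
// true when any recommended id (single node or chain member) is canonical.
function recWithinCanon(recNode, recChain, canonicalNodes) {
  const canonSet = new Set(canonicalNodes.map(normalizeNodeId));
  const allRecIds = [];
  if (recNode) allRecIds.push(normalizeNodeId(recNode));
  for (const id of recChain) allRecIds.push(normalizeNodeId(id));
  return allRecIds.some((id) => id && canonSet.has(id));
}

// Numeric chain ids match a '#'-prefixed canon after normalization.
console.log(recWithinCanon(null, [19, 34], ['#34', '#35']));
// A recommendation outside the canon does not count.
console.log(recWithinCanon('#99', [], ['#37']));
```

One canonical member is enough for a chain to count, so a long chain that touches the canon once is scored the same as an exact single-node match.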
+288
@@ -6,6 +6,9 @@ import {
findCausalChains,
buildFactorMatrix,
analyze,
buildClassCanonCoverage,
buildRouterVsOpus,
buildChainIgnoreBreakdown,
} from './brain-retro-analyzer.mjs';
// Minimal v2 episode for tests.
@@ -717,3 +720,288 @@ describe('analyze — Pass 4 similar_past_outcome_majority axis (project-brain-f
expect(result.factorMatrix.similar_past_outcome_majority.no_neighbors).toBeDefined();
});
});
// ────────────────────────────────────────────────────────────────
// NEW CUTS: buildClassCanonCoverage, buildRouterVsOpus, buildChainIgnoreBreakdown
// ────────────────────────────────────────────────────────────────
// Shared classMap fixture (embedded — no external file dependency)
const testClassMap = {
monitoring: ['#34', '#35'],
bugfix: ['#18', '#34'],
feature: ['#19'],
release: ['#37'],
planning: ['#19', '#41', '#42'],
other: [],
};
// Helper: episode for the new cuts (minimal — no embeddings needed)
const epC = (overrides = {}) => ({
schema_version: 2,
task_id: 's1',
timestamps: { started_at: '2026-05-19T10:00:00Z', ended_at: '2026-05-19T10:05:00Z' },
primary_rationale: {
node_chosen: 'direct',
task_classification: 'other',
recommended_node: null,
recommended_chain: null,
},
outcome_reviewed: 'unknown',
...overrides,
});
describe('buildClassCanonCoverage', () => {
it('returns [] for empty input', () => {
expect(buildClassCanonCoverage([], testClassMap)).toEqual([]);
});
it('single monitoring episode with recommended_node=#34, node_chosen=direct, rework', () => {
const eps = [epC({
primary_rationale: { node_chosen: 'direct', task_classification: 'monitoring', recommended_node: '#34', recommended_chain: null },
outcome_reviewed: 'rework',
})];
const rows = buildClassCanonCoverage(eps, testClassMap);
expect(rows).toHaveLength(1);
const row = rows[0];
expect(row.classification).toBe('monitoring');
expect(row.count).toBe(1);
expect(row.canonicalNodes).toEqual(['#34', '#35']);
expect(row.routerRecommended).toBe(1); // has recommended_node
expect(row.claudeTook).toBe(0); // node_chosen === 'direct'
expect(row.recWithinCanon).toBe(1); // '#34' is in canonical
expect(row.rework).toBe(1);
});
it('classification not in map gets canonicalNodes=[]', () => {
const eps = [epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: null }, outcome_reviewed: 'success' })];
const rows = buildClassCanonCoverage(eps, {});
expect(rows[0].canonicalNodes).toEqual([]);
});
it('recommended_chain with numeric ids normalized to #N for canon check', () => {
const eps = [epC({
primary_rationale: { node_chosen: 'direct', task_classification: 'monitoring', recommended_node: null, recommended_chain: [19, 34] },
outcome_reviewed: 'success',
})];
const rows = buildClassCanonCoverage(eps, testClassMap);
// chain [19,34] → normalized ['#19','#34']. '#34' is in monitoring canonical → recWithinCanon=1
expect(rows[0].routerRecommended).toBe(1);
expect(rows[0].recWithinCanon).toBe(1);
});
it('mixed: 3 release + 2 feature episodes sorted by count desc, counters correct', () => {
// 3 release, 2 feature (release > feature by count)
const eps = [
epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null }, outcome_reviewed: 'rework', timestamps: { started_at: '2026-05-19T10:00:00Z' } }),
epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#99', recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
epC({ primary_rationale: { node_chosen: '#37', task_classification: 'release', recommended_node: '#37', recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:02:00Z' } }),
epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:03:00Z' } }),
epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: null, recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:04:00Z' } }),
];
const rows = buildClassCanonCoverage(eps, testClassMap);
// Sorted by count desc: release=3, feature=2
expect(rows[0].classification).toBe('release');
expect(rows[0].count).toBe(3);
expect(rows[0].routerRecommended).toBe(3); // all 3 have recommended_node
expect(rows[0].claudeTook).toBe(1); // one has node_chosen='#37'
expect(rows[0].recWithinCanon).toBe(2); // '#37' in release canonical for ep1 and ep3; '#99' not in canonical for ep2
expect(rows[0].rework).toBe(1);
expect(rows[1].classification).toBe('feature');
expect(rows[1].count).toBe(2);
expect(rows[1].routerRecommended).toBe(1); // only 1 has recommended_node
expect(rows[1].claudeTook).toBe(0);
});
});
describe('buildRouterVsOpus', () => {
const epR = (overrides = {}) => ({
schema_version: 4,
task_id: 'session-abc-12345',
timestamps: { started_at: '2026-05-19T10:00:00Z' },
primary_rationale: {
node_chosen: 'direct',
task_classification: 'other',
recommended_node: null,
recommended_chain: null,
},
outcome_reviewed: 'unknown',
review: {
node_quality: 'correct',
chain_quality: 'n/a',
alternative_better: null,
error_root_cause: 'n/a',
reasoning: 'ok',
},
...overrides,
});
it('one episode in each of A/B/C → 1/1/1', () => {
const eps = [
// A: router gave a recommendation, Opus disagreed (wrong_node + alternative)
epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null },
review: { node_quality: 'wrong_node', chain_quality: 'n/a', alternative_better: '#37', error_root_cause: 'wrong_skill', reasoning: 'x' }, outcome_reviewed: 'rework' }),
// B: router silent, alternative_better set
epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'planning', recommended_node: null, recommended_chain: null },
review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: '#41', error_root_cause: 'n/a', reasoning: 'should have used planning' }, outcome_reviewed: 'soft_success',
timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
// C: router gave, node_quality=correct, no alternative
epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null },
review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'direct was fine' }, outcome_reviewed: 'success',
timestamps: { started_at: '2026-05-19T10:02:00Z' } }),
];
const result = buildRouterVsOpus(eps);
expect(result.sectionA).toHaveLength(1);
expect(result.sectionB).toHaveLength(1);
expect(result.sectionC).toHaveLength(1);
});
it('episode without review is excluded from all three sections', () => {
const eps = [
epR({ review: undefined, primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: '#19', recommended_chain: null } }),
];
const result = buildRouterVsOpus(eps);
expect(result.sectionA).toHaveLength(0);
expect(result.sectionB).toHaveLength(0);
expect(result.sectionC).toHaveLength(0);
});
it('A: episode with recommended_chain array of strings goes into A with routerRecommendation = the array', () => {
const eps = [
epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'planning', recommended_node: null, recommended_chain: ['#19', '#41'] },
review: { node_quality: 'wrong_node', chain_quality: 'missing_step', alternative_better: '#19', error_root_cause: 'wrong_chain_order', reasoning: 'chain needed' }, outcome_reviewed: 'rework' }),
];
const result = buildRouterVsOpus(eps);
expect(result.sectionA).toHaveLength(1);
expect(Array.isArray(result.sectionA[0].routerRecommendation)).toBe(true);
expect(result.sectionA[0].routerRecommendation).toEqual(['#19', '#41']);
});
it('B: router silent AND alternative_better truthy → in B; router silent AND alternative_better=null → not in B', () => {
const eps = [
epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: null },
review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: '#60', error_root_cause: 'n/a', reasoning: 'should use docs' }, outcome_reviewed: 'soft_success' }),
epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: null },
review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'fine' }, outcome_reviewed: 'success',
timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
];
const result = buildRouterVsOpus(eps);
expect(result.sectionB).toHaveLength(1);
expect(result.sectionB[0].opusSuggests).toBe('#60');
});
it('C: router gave + node_quality=correct + no alternative → in C; same but alternative_better truthy → NOT in C', () => {
const inC = epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null },
review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'fine' }, outcome_reviewed: 'success' });
const notInC = epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null },
review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: '#41', error_root_cause: 'n/a', reasoning: 'actually #41 better' }, outcome_reviewed: 'rework',
timestamps: { started_at: '2026-05-19T10:01:00Z' } });
const result = buildRouterVsOpus([inC, notInC]);
expect(result.sectionC).toHaveLength(1);
// The one NOT in C (has alternative_better) should be in A instead
expect(result.sectionA).toHaveLength(1);
});
it('sectionA item has all expected shape fields', () => {
const eps = [
// Must be wrong_node or have alternative to end up in A (not C)
epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null },
review: { node_quality: 'wrong_node', chain_quality: 'n/a', alternative_better: '#37', error_root_cause: 'wrong_skill', reasoning: 'should be #37' }, outcome_reviewed: 'rework' }),
];
const result = buildRouterVsOpus(eps);
const item = result.sectionA[0];
expect(item).toHaveProperty('time');
expect(item).toHaveProperty('taskId');
expect(item).toHaveProperty('classification');
expect(item).toHaveProperty('routerRecommendation');
expect(item).toHaveProperty('claudeChose');
expect(item).toHaveProperty('opusNodeQuality');
expect(item).toHaveProperty('opusChainQuality');
expect(item).toHaveProperty('outcomeReviewed');
expect(item).toHaveProperty('opusAlternative');
expect(item).toHaveProperty('opusRootCause');
expect(item.taskId).toHaveLength(8); // first 8 chars of task_id
});
});
describe('buildChainIgnoreBreakdown', () => {
it('returns all zeros for empty input', () => {
const result = buildChainIgnoreBreakdown([]);
expect(result.totalChainRecommendations).toBe(0);
expect(result.ignoredChainCount).toBe(0);
expect(result.ignoredChainRework).toBe(0);
expect(result.totalNodeOnlyRecommendations).toBe(0);
expect(result.ignoredNodeOnlyCount).toBe(0);
expect(result.ignoredNodeOnlyRework).toBe(0);
expect(result.breakdownByChainLength['1']).toEqual({ count: 0, ignored: 0, rework: 0 });
expect(result.breakdownByChainLength['2']).toEqual({ count: 0, ignored: 0, rework: 0 });
expect(result.breakdownByChainLength['3+']).toEqual({ count: 0, ignored: 0, rework: 0 });
});
it('chain-len-4 ep with node_chosen=direct and outcome=rework → ignoredChainCount=1, rework=1, 3+ bucket', () => {
const eps = [epC({
primary_rationale: { node_chosen: 'direct', task_classification: 'planning', recommended_node: null, recommended_chain: ['#19','#41','#42','#37'] },
outcome_reviewed: 'rework',
})];
const result = buildChainIgnoreBreakdown(eps);
expect(result.totalChainRecommendations).toBe(1);
expect(result.ignoredChainCount).toBe(1);
expect(result.ignoredChainRework).toBe(1);
expect(result.breakdownByChainLength['3+']).toEqual({ count: 1, ignored: 1, rework: 1 });
});
it('node-only rec ep with node_chosen=direct → ignoredNodeOnlyCount=1', () => {
const eps = [epC({
primary_rationale: { node_chosen: 'direct', task_classification: 'monitoring', recommended_node: '#34', recommended_chain: null },
outcome_reviewed: 'success',
})];
const result = buildChainIgnoreBreakdown(eps);
expect(result.totalNodeOnlyRecommendations).toBe(1);
expect(result.ignoredNodeOnlyCount).toBe(1);
expect(result.ignoredNodeOnlyRework).toBe(0);
expect(result.totalChainRecommendations).toBe(0);
});
it('chains of length 1, 2, 5 bucketed correctly into 1/2/3+', () => {
const eps = [
epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: ['#19'] }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:00:00Z' } }),
epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: ['#19','#34'] }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: ['#19','#34','#37','#41','#42'] }, outcome_reviewed: 'rework', timestamps: { started_at: '2026-05-19T10:02:00Z' } }),
];
const result = buildChainIgnoreBreakdown(eps);
expect(result.totalChainRecommendations).toBe(3);
expect(result.breakdownByChainLength['1']).toEqual({ count: 1, ignored: 1, rework: 0 });
expect(result.breakdownByChainLength['2']).toEqual({ count: 1, ignored: 1, rework: 0 });
expect(result.breakdownByChainLength['3+']).toEqual({ count: 1, ignored: 1, rework: 1 });
});
it('chain-rec ep where node_chosen != direct → in totalChainRecommendations but NOT in ignoredChainCount', () => {
const eps = [epC({
primary_rationale: { node_chosen: '#19', task_classification: 'feature', recommended_node: null, recommended_chain: ['#19', '#34'] },
outcome_reviewed: 'success',
})];
const result = buildChainIgnoreBreakdown(eps);
expect(result.totalChainRecommendations).toBe(1);
expect(result.ignoredChainCount).toBe(0);
expect(result.breakdownByChainLength['2']).toEqual({ count: 1, ignored: 0, rework: 0 });
});
});
describe('analyze — classCanonCoverage / routerVsOpus / chainIgnoreBreakdown integrated', () => {
it('analyze() result includes classCanonCoverage, routerVsOpus, chainIgnoreBreakdown keys', () => {
const eps = [
ep({ schema_version: 4,
primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null, triggers_matched: [], boundaries_applied: [], step: 1, candidates_considered: [], hard_floor: { invoked: false, rules: [] } },
review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'ok' },
outcome_reviewed: 'success' }),
];
const result = analyze(eps);
expect(result.classCanonCoverage).toBeDefined();
expect(result.routerVsOpus).toBeDefined();
expect(result.chainIgnoreBreakdown).toBeDefined();
expect(Array.isArray(result.classCanonCoverage)).toBe(true);
expect(result.routerVsOpus).toHaveProperty('sectionA');
expect(result.routerVsOpus).toHaveProperty('sectionB');
expect(result.routerVsOpus).toHaveProperty('sectionC');
expect(result.chainIgnoreBreakdown).toHaveProperty('totalChainRecommendations');
});
});
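
The commit message states that sections A and C are mutually exclusive by construction. A condensed restatement of the branching in `buildRouterVsOpus` makes that visible: every reviewed episode gets at most one label. The function `sectionFor` is a hypothetical helper written for this sketch, and the enumerated `node_quality` values are the ones named in the analyzer's comments:

```javascript
// Hypothetical condensed form of the branch logic in buildRouterVsOpus.
// Returns 'A', 'B', 'C', or null (episode appears in no section).
function sectionFor(hasRec, review) {
  if (!review || typeof review !== 'object' || review.reviewer_error) return null;
  if (hasRec) {
    const isCorrectNoAlt = review.node_quality === 'correct' && !review.alternative_better;
    return isCorrectNoAlt ? 'C' : 'A';
  }
  return review.alternative_better ? 'B' : null;
}

// Enumerate every combination; each yields exactly one label or null,
// so no episode can land in both A and C.
const combos = [];
for (const hasRec of [true, false]) {
  for (const node_quality of ['correct', 'wrong_node', 'disputable']) {
    for (const alternative_better of [null, '#41']) {
      combos.push({
        hasRec,
        node_quality,
        alternative_better,
        section: sectionFor(hasRec, { node_quality, alternative_better }),
      });
    }
  }
}
console.table(combos);
```

Because the label comes from a single if/else chain, exclusivity needs no extra bookkeeping. The one case that falls through is a silent router with no Opus alternative, which lands in no section.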