feat(brain-retro): extend mandatory digital analysis 7 → 10 cuts

SKILL.md MANDATORY DIGITAL ANALYSIS block grows by three cuts:

  8. Class × canon coverage (analyzer: buildClassCanonCoverage)
  9. Router vs Opus (analyzer: buildRouterVsOpus,
     sections A / B / C; A and C are
     mutually exclusive by construction)
  10. Chain-ignore breakdown (analyzer: buildChainIgnoreBreakdown,
      bucketed by chain length 1 / 2 / 3+)

All three are wired into the analyzer's analyze() output as
result.classCanonCoverage / result.routerVsOpus /
result.chainIgnoreBreakdown and are produced automatically on every
retro run (no manual step). +216 lines analyzer / +288 lines tests
covering the three functions in isolation and via analyze().

Driven by retro #8 manual analysis: the three cuts surface signals
the existing 7 cuts missed (router-vs-Opus disagreement, canon
coverage by classification, chain-vs-singleton ignore rate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -22,7 +22,8 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
## Procedure

> **MANDATORY DIGITAL ANALYSIS (added 2026-05-26 after retro #6 feedback).**
> Every /brain-retro run MUST include **quantitative cuts**, not just a causal narrative. At least 10 numeric tables:
>
> 1. **Path-type breakdown** (regulated vs improvised, with counts and %).
> 2. **node_chosen distribution** (top 15 nodes with count + %).
> 3. **recommended_node distribution** (what the classifier suggested, count + %).
@@ -30,11 +31,16 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
> 5. **outcome × node_chosen group**: 3 groups (skill_used / direct_no_rec / direct_ignored_rec) with counts + rework rate per group.
> 6. **classifier_output presence by source** (prefilter / llm / regex / cache / NULL): diagnoses the health of the classifier itself.
> 7. **Per-classification trigger-match + via-skill** (analysis / planning / bugfix / feature / refactor / security).
> 8. **Class × canon coverage**: a table of task class × canonical nodes from the brain (`observer-classification-map.json`) × how often the router recommended × how often I actually took a skill × how often the router's recommendation fell within the canon (plus rework count). Source: `result.classCanonCoverage` from the analyzer.
> 9. **Router vs Opus**: three sections. A (the router recommended and Opus disagreed: `node_quality` other than `correct`, or a better alternative named; the discrepancy is visible at a glance), B (the router stayed silent and Opus said «a skill was needed»), C (the router recommended and Opus rated the path taken as `correct` with no better alternative, e.g. agreeing that skipping the skill was fine). Source: `result.routerVsOpus`.
> 10. **Chain-ignore breakdown**: a separate cut showing how often the router recommended a chain vs a single node, what % of each I ignored (went `direct`), and the rework rate among the ignored ones; bucketed by chain length (1/2/3+). Source: `result.chainIgnoreBreakdown`.
>
> Without these 10 tables the retro counts as unfinished. Narrative conclusions must rest on their numbers, not on «general impressions». **If classifier_output=NULL in > 30% of episodes**, that signals a broken classifier; report the classifier's state in a separate block of the retro (timeouts/errors/source distribution).
>
> No jargon in the «Report to user» block: the numbers stay technical, but the verbal conclusions for the user use plain language (see memory `feedback_plain_language.md`).
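The 30% rule above is a plain share over the period's episodes. Here is a minimal sketch. It assumes episodes are already-parsed JSONL objects and that both a missing field and an explicit `null` count as NULL. `classifierNullShare` is a hypothetical name, not an analyzer export:

```javascript
// Hypothetical helper (not part of brain-retro-analyzer.mjs): share of episodes whose
// classifier_output is missing or null, flagged against the 30% threshold.
function classifierNullShare(episodes, threshold = 0.3) {
  const total = episodes.length;
  // `== null` matches both undefined (field absent) and explicit null.
  const nullCount = episodes.filter((ep) => ep.classifier_output == null).length;
  const share = total === 0 ? 0 : nullCount / total;
  return { total, nullCount, share, classifierBroken: share > threshold };
}

const episodesSample = [
  { classifier_output: { source: 'llm' } },
  { classifier_output: null },
  { classifier_output: { source: 'cache' } },
  {}, // no classifier_output field at all
];
console.log(classifierNullShare(episodesSample));
// → { total: 4, nullCount: 2, share: 0.5, classifierBroken: true }
```

The comparison is strict (`>`), so exactly 30% does not trip the flag. This matches the "> 30%" wording.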

<!-- markdownlint-disable MD029 MD032 -->

1. **Determine period**: ask the user «for what period» or default to «since last brain-retro» (find the latest `docs/observer/notes/YYYY-MM-DD-brain-retro-*.md`).
2. **Read evidence**: glob `docs/observer/episodes-YYYY-MM.jsonl` for the period; parse each line as JSON.
3. **Read optional notes**: glob `docs/observer/notes/*.md` filtered by date.
@@ -43,8 +49,8 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
5a. **[Phase 3] Sanity questions (spec §4.7)**: `node tools/brain-retro-sanity-generator.mjs` (called as a module from the analyzer-driven flow, OR directly via `import { generateCandidateQuestions } from '../../../tools/brain-retro-sanity-generator.mjs'`) returns up to 5 candidate questions. Pick 3-4 and ask them via AskUserQuestion (multiple-choice + free comment). **Questions to the customer must use plain language**: not «rework / wrong_skill / TDD pattern / self_assessment» but «redoing work / picking the wrong tool / self-check» (memory `feedback_plain_language.md`). If the first round contains jargon, rephrase and ask again. **Before persisting:** sanitize free comments with `tools/observer-pii-filter.mjs` (`sanitize` export, strips RU_PHONE / EMAIL / TOKEN). Write answers to `docs/observer/sanity-checks/YYYY-MM-DD.json` as `{schema_version: 1, questions: [...]}`.
5b. **Reviewer pass**, a pragmatic two-mode policy (added 2026-05-26 after brain-retro #6, replacing the original spec §4.6 «subagent only», which was unrealistic at retro scale):

- **Batch mode (default, fast)**: `node tools/brain-retro-batch-reviewer.mjs docs/observer/episodes-YYYY-MM.jsonl <cutoff-iso> [limit=30] [conc=5]`. Calls the Opus API directly via `reviewViaDirectApi` from `tools/brain-retro-opus-reviewer.mjs` with concurrency 5. Use for **N ≥ 20 unreviewed episodes**, the typical retro workload (retro #6 processed 132 episodes in 293 s, ≈2.2 s/episode, well under the per-subagent overhead).
- **Subagent mode (per spec §4.6, deeper context)**: `Task(subagent_type='reviewer-agent', prompt=<episode JSON + sanity-answers context>)`. Use for **N < 20 episodes** OR when the reviewer needs other tools (reading related files, grepping history). Wrap each episode in try/catch: on subagent crash/timeout, fall back to `reviewViaDirectApi`.

Both modes write the same payload back: `review.*` + `outcome_reviewed` + `outcome_reviewed_source` (`direct_api_batch` for batch, `subagent` for Task(), `direct_api_fallback` when the subagent fails). If both fail, leave `review.reviewer_error: <msg>` for the next retro.
6. **Aggregate** per `references/aggregation-template.md`: fill the Factor analysis matrix from the analyzer's `factorMatrix`, the task groups from `tasks`, the causal-chain candidates from `causalChains`, plus the new sections (sanity-check results, reviewer-agent outcome distribution, self-retrospect trigger status).
@@ -55,6 +61,8 @@ Aggregator over observer evidence. Reads JSONL + optional MD notes, surfaces can
10. **Cost report**: read `~/.claude/runtime/cost-daily.json`; include classifier + self_assessment + reviewer cost totals for the period in the retro note.
11. **Report to user**: high-signal summary including sanity highlights, reviewer outcome distribution, and any escalations.

<!-- markdownlint-enable MD029 MD032 -->
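Steps 1 and 2 amount to picking a cutoff date and filtering JSONL lines by it. The sketch below illustrates that under several assumptions the procedure does not state:

- It works on note basenames instead of globbing the filesystem.
- It treats the latest note's date prefix as the cutoff.
- It uses `timestamps.started_at` as the episode date.
- It skips malformed lines.

`latestRetroDate` and `parseEpisodesSince` are hypothetical names, not existing tools:

```javascript
// Hypothetical helpers (not from the repo) sketching steps 1-2.
// latestRetroDate: newest YYYY-MM-DD prefix among "<date>-brain-retro-*.md" basenames.
function latestRetroDate(filenames) {
  const re = /^(\d{4}-\d{2}-\d{2})-brain-retro-.*\.md$/;
  let latest = null;
  for (const name of filenames) {
    const m = re.exec(name);
    // ISO dates compare correctly as plain strings.
    if (m && (latest === null || m[1] > latest)) latest = m[1];
  }
  return latest;
}

// parseEpisodesSince: parse JSONL text and keep episodes whose
// timestamps.started_at is at or after the cutoff.
function parseEpisodesSince(jsonlText, cutoffIso) {
  const cutoff = Date.parse(cutoffIso);
  const episodes = [];
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    let ep;
    try {
      ep = JSON.parse(line);
    } catch {
      continue; // assumption: a malformed line is skipped, not fatal
    }
    const started = Date.parse((ep.timestamps || {}).started_at);
    if (!Number.isNaN(started) && started >= cutoff) episodes.push(ep);
  }
  return episodes;
}

const notes = ['2026-05-19-brain-retro-7.md', '2026-05-26-brain-retro-8.md', 'misc.md'];
const since = latestRetroDate(notes); // '2026-05-26'
const jsonl = [
  '{"task_id":"a","timestamps":{"started_at":"2026-05-25T09:00:00Z"}}',
  '{"task_id":"b","timestamps":{"started_at":"2026-05-27T09:00:00Z"}}',
  'not json',
].join('\n');
console.log(parseEpisodesSince(jsonl, since).map((ep) => ep.task_id)); // [ 'b' ]
```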

## Output anatomy

See `references/aggregation-template.md`.
@@ -7,6 +7,7 @@
 * Security Guidance #40: pure parsing, no exec/execSync.
 */
import { Buffer } from 'buffer';
import { resolve as pathResolve } from 'path';
import { readFileSync, existsSync } from 'fs';
import { detectMissedActivations } from './missed-activations.mjs';
import {
@@ -356,6 +357,204 @@ export function buildFactorMatrix(episodesWithOutcome) {
  return matrix;
}


// ────────────────────────────────────────────────────────────────
// Helpers for the new cuts. normalizeNodeId brings a recommended id into
// '#N' form so canon comparison works whether the source stored 19 or '#19'.
// Recommendation lookups give primary_rationale precedence over classifier_output.
// ────────────────────────────────────────────────────────────────
function normalizeNodeId(id) {
  if (id == null) return null;
  const s = String(id).trim();
  return s.startsWith('#') ? s : `#${s}`;
}

function hasRecommendation(ep) {
  const pr = ep.primary_rationale || {};
  const co = ep.classifier_output || {};
  const recNode = pr.recommended_node || co.recommended_node;
  const recChain = pr.recommended_chain || co.recommended_chain;
  return !!(recNode || (Array.isArray(recChain) && recChain.length > 0));
}

function getRecommendedNode(ep) {
  const pr = ep.primary_rationale || {};
  const co = ep.classifier_output || {};
  return pr.recommended_node || co.recommended_node || null;
}

function getRecommendedChain(ep) {
  const pr = ep.primary_rationale || {};
  const co = ep.classifier_output || {};
  const chain = pr.recommended_chain || co.recommended_chain;
  return Array.isArray(chain) ? chain : [];
}
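The lookup helpers above resolve `primary_rationale` first and fall back to `classifier_output` through `||`. That operator only falls through on falsy values, which matters when reading cuts 8 to 10: an empty `recommended_chain: []` in `primary_rationale` is truthy, so it hides a non-empty chain in `classifier_output`. The commit does not say whether that is intended. The standalone sketch below copies `normalizeNodeId` and `getRecommendedChain` (they are module-private) purely to demonstrate the behavior:

```javascript
// Copies of the module-private helpers, reproduced only to demonstrate behavior.
function normalizeNodeId(id) {
  if (id == null) return null;
  const s = String(id).trim();
  return s.startsWith('#') ? s : `#${s}`;
}

function getRecommendedChain(ep) {
  const pr = ep.primary_rationale || {};
  const co = ep.classifier_output || {};
  const chain = pr.recommended_chain || co.recommended_chain;
  return Array.isArray(chain) ? chain : [];
}

// null in primary_rationale falls through to classifier_output...
const fallsThrough = {
  primary_rationale: { recommended_chain: null },
  classifier_output: { recommended_chain: [19, '#34'] },
};
// ...but an empty array does not: [] is truthy, so the classifier chain is never read.
const shadowed = {
  primary_rationale: { recommended_chain: [] },
  classifier_output: { recommended_chain: [19, '#34'] },
};

console.log(getRecommendedChain(fallsThrough).map(normalizeNodeId)); // [ '#19', '#34' ]
console.log(getRecommendedChain(shadowed)); // []
```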

/**
 * Cut 8: Class × canon coverage.
 * Returns one row per task_classification present in the episodes (missing
 * classification → 'other'), sorted by count desc.
 * classificationMap shape: { [classification]: string[] }, canonical node IDs (e.g. '#34').
 * Row counters: routerRecommended = episodes with any recommended node or chain;
 * claudeTook = node_chosen set and not 'direct'; recWithinCanon = at least one
 * recommended id (after '#N' normalization) is canonical for the class;
 * rework = outcome_reviewed === 'rework'.
 */
export function buildClassCanonCoverage(episodes, classificationMap) {
  const map = classificationMap || {};
  const byClass = new Map();
  for (const ep of episodes) {
    const classification = (ep.primary_rationale || {}).task_classification || 'other';
    if (!byClass.has(classification)) {
      byClass.set(classification, {
        classification,
        count: 0,
        canonicalNodes: map[classification] ? [...map[classification]] : [],
        routerRecommended: 0,
        claudeTook: 0,
        recWithinCanon: 0,
        rework: 0,
      });
    }
    const row = byClass.get(classification);
    row.count += 1;

    const recNode = getRecommendedNode(ep);
    const recChain = getRecommendedChain(ep);
    const hasRec = !!(recNode || recChain.length > 0);
    if (hasRec) {
      row.routerRecommended += 1;
      // Check whether any recommended id falls within the canonical set
      const canonSet = new Set(row.canonicalNodes.map(normalizeNodeId));
      const allRecIds = [];
      if (recNode) allRecIds.push(normalizeNodeId(recNode));
      for (const id of recChain) allRecIds.push(normalizeNodeId(id));
      if (allRecIds.some((id) => id && canonSet.has(id))) {
        row.recWithinCanon += 1;
      }
    }

    const nodeChosen = (ep.primary_rationale || {}).node_chosen;
    if (nodeChosen && nodeChosen !== 'direct') {
      row.claudeTook += 1;
    }
    if (ep.outcome_reviewed === 'rework') {
      row.rework += 1;
    }
  }
  return [...byClass.values()].sort((a, b) => b.count - a.count);
}
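For the retro note, the rows from `buildClassCanonCoverage` still need percentages. Below is a minimal sketch of one possible rendering, assuming a markdown table is the target format. `renderClassCanonTable` and `pct` are hypothetical helpers. Taking the rec-within-canon share over `routerRecommended` (rather than `count`) is a design choice of this sketch:

```javascript
// Hypothetical renderer (not part of the analyzer) for buildClassCanonCoverage rows.
// "rec within canon" is a share of the router's recommendations; the other
// percentages are shares of the row's episode count.
function pct(part, whole) {
  return whole === 0 ? '0%' : `${Math.round((100 * part) / whole)}%`;
}

function renderClassCanonTable(rows) {
  const lines = [
    '| class | episodes | canon nodes | router recommended | took a skill | rec within canon | rework |',
    '|---|---|---|---|---|---|---|',
  ];
  for (const r of rows) {
    const canon = r.canonicalNodes.join(', ') || 'none';
    lines.push(
      `| ${r.classification} | ${r.count} | ${canon} | ` +
        `${r.routerRecommended} (${pct(r.routerRecommended, r.count)}) | ` +
        `${r.claudeTook} (${pct(r.claudeTook, r.count)}) | ` +
        `${r.recWithinCanon} (${pct(r.recWithinCanon, r.routerRecommended)}) | ` +
        `${r.rework} (${pct(r.rework, r.count)}) |`
    );
  }
  return lines.join('\n');
}

// Row shaped like the 'release' row expected by the "mixed" buildClassCanonCoverage test.
const releaseRow = {
  classification: 'release', count: 3, canonicalNodes: ['#37'],
  routerRecommended: 3, claudeTook: 1, recWithinCanon: 2, rework: 1,
};
console.log(renderClassCanonTable([releaseRow]).split('\n')[2]);
// → | release | 3 | #37 | 3 (100%) | 1 (33%) | 2 (67%) | 1 (33%) |
```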

/**
 * Cut 9: Router vs Opus three-section breakdown.
 * Returns { sectionA, sectionB, sectionC }, each an array of structured items.
 * Sections are disjoint: A and C split the episodes where the router made a
 * recommendation (C = Opus rated node_quality 'correct' with no better
 * alternative, A = everything else), while B only takes episodes without a
 * recommendation where Opus named a better node.
 * Episodes lacking a `review` object, or whose review carries `reviewer_error`,
 * are excluded from all sections, as are router-silent episodes with no alternative.
 */
export function buildRouterVsOpus(episodes) {
  const sectionA = [];
  const sectionB = [];
  const sectionC = [];

  for (const ep of episodes) {
    const rev = ep.review;
    if (!rev || typeof rev !== 'object' || rev.reviewer_error) continue;

    const pr = ep.primary_rationale || {};
    const hasRec = hasRecommendation(ep);
    const recNode = getRecommendedNode(ep);
    const recChain = getRecommendedChain(ep);
    const routerRecommendation = recChain.length > 0 ? recChain : recNode;
    const time = (ep.timestamps || {}).started_at || null;
    const taskId = String(ep.task_id || '').slice(0, 8);
    const classification = pr.task_classification || 'other';
    const nodeChosen = pr.node_chosen || 'direct';
    const outcomeReviewed = ep.outcome_reviewed || 'unknown';

    if (hasRec) {
      const isCorrectNoAlt = rev.node_quality === 'correct' && !rev.alternative_better;
      if (isCorrectNoAlt) {
        // Section C: router gave + Opus agreed it was fine (correct, no better alternative)
        sectionC.push({ time, taskId, classification, routerRecommendation, outcomeReviewed });
      } else {
        // Section A: router gave + some disagreement or uncertainty (wrong_node / disputable / has alternative)
        sectionA.push({
          time,
          taskId,
          classification,
          routerRecommendation,
          claudeChose: nodeChosen,
          opusNodeQuality: rev.node_quality || 'n/a',
          opusChainQuality: rev.chain_quality || 'n/a',
          outcomeReviewed,
          opusAlternative: rev.alternative_better || null,
          opusRootCause: rev.error_root_cause || 'n/a',
        });
      }
    } else if (rev.alternative_better) {
      // Section B: router silent (hasRec is false in this branch), Opus identified a better node
      sectionB.push({
        time,
        taskId,
        classification,
        opusSuggests: rev.alternative_better,
        outcomeReviewed,
        opusReasoning: String(rev.reasoning || '').slice(0, 200),
      });
    }
  }

  return { sectionA, sectionB, sectionC };
}
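The branching in `buildRouterVsOpus` can be condensed into one function of the episode's `hasRec` flag and its `review`. This makes the disjointness of the sections easy to check: A and C split the `hasRec` branch on a single predicate, and B exists only in the other branch. `sectionOf` is a hypothetical restatement written for review purposes, not something the analyzer exports:

```javascript
// Hypothetical restatement (not an analyzer export) of buildRouterVsOpus's routing.
// Returns 'A', 'B', 'C', or null when the episode lands in no section.
function sectionOf(hasRec, review) {
  // Same exclusion as the analyzer: no review object, or a failed review.
  if (!review || typeof review !== 'object' || review.reviewer_error) return null;
  if (hasRec) {
    const agreed = review.node_quality === 'correct' && !review.alternative_better;
    return agreed ? 'C' : 'A';
  }
  return review.alternative_better ? 'B' : null;
}

const cases = [
  [true, { node_quality: 'wrong_node', alternative_better: '#37' }], // A: disagreement
  [true, { node_quality: 'correct', alternative_better: '#41' }], // A: correct, yet a better node exists
  [true, { node_quality: 'correct', alternative_better: null }], // C: agreement
  [false, { node_quality: 'correct', alternative_better: '#60' }], // B: router silent, Opus names a node
  [false, { node_quality: 'correct', alternative_better: null }], // no section
  [true, { reviewer_error: 'timeout' }], // excluded: review failed
  [true, undefined], // excluded: no review
];
console.log(JSON.stringify(cases.map(([hasRec, review]) => sectionOf(hasRec, review))));
// → ["A","A","C","B",null,null,null]
```

Because every path returns exactly one value, an episode can never be counted in two sections.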

/**
 * Cut 10: Chain-ignore breakdown.
 * Distinguishes chain recommendations from node-only recommendations (a non-empty
 * recommended_chain wins over recommended_node) and reports raw counts: how many
 * were recommended, how many were ignored (node_chosen 'direct' or missing), and
 * how many of the ignored ones ended in rework. Rework is tallied only among
 * ignored episodes. Chain recommendations are also bucketed by length (1 / 2 / 3+).
 */
export function buildChainIgnoreBreakdown(episodes) {
  const result = {
    totalChainRecommendations: 0,
    ignoredChainCount: 0,
    ignoredChainRework: 0,
    totalNodeOnlyRecommendations: 0,
    ignoredNodeOnlyCount: 0,
    ignoredNodeOnlyRework: 0,
    breakdownByChainLength: {
      '1': { count: 0, ignored: 0, rework: 0 },
      '2': { count: 0, ignored: 0, rework: 0 },
      '3+': { count: 0, ignored: 0, rework: 0 },
    },
  };

  for (const ep of episodes) {
    const pr = ep.primary_rationale || {};
    const recNode = getRecommendedNode(ep);
    const recChain = getRecommendedChain(ep);
    const hasChain = recChain.length > 0;
    const hasNodeOnly = !hasChain && !!recNode;
    const nodeChosen = pr.node_chosen || 'direct';
    const isIgnored = nodeChosen === 'direct';
    const isRework = ep.outcome_reviewed === 'rework';

    if (hasChain) {
      result.totalChainRecommendations += 1;
      const lenBucket = recChain.length === 1 ? '1' : recChain.length === 2 ? '2' : '3+';
      result.breakdownByChainLength[lenBucket].count += 1;
      if (isIgnored) {
        result.ignoredChainCount += 1;
        result.breakdownByChainLength[lenBucket].ignored += 1;
        if (isRework) {
          result.ignoredChainRework += 1;
          result.breakdownByChainLength[lenBucket].rework += 1;
        }
      }
    } else if (hasNodeOnly) {
      result.totalNodeOnlyRecommendations += 1;
      if (isIgnored) {
        result.ignoredNodeOnlyCount += 1;
        if (isRework) result.ignoredNodeOnlyRework += 1;
      }
    }
  }

  return result;
}
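`buildChainIgnoreBreakdown` returns counts, while SKILL.md cut 10 asks for an ignore % and a rework rate. Rework is only tallied for ignored episodes, so the rework rate has to be taken over the ignored count. Below is a minimal sketch of that arithmetic. `chainIgnoreRates` and `rate` are hypothetical helpers, and the counts in `breakdownSample` are made up for illustration:

```javascript
// Hypothetical helpers (not part of the analyzer): turn the raw counts from
// buildChainIgnoreBreakdown into percentages for the cut-10 table.

// Percentage with one decimal; null when the denominator is 0 (no data).
function rate(part, whole) {
  return whole === 0 ? null : Math.round((1000 * part) / whole) / 10;
}

function chainIgnoreRates(b) {
  const buckets = {};
  for (const [len, v] of Object.entries(b.breakdownByChainLength)) {
    // Rework is only counted among ignored episodes, so divide by `ignored`.
    buckets[len] = { ignorePct: rate(v.ignored, v.count), reworkAmongIgnoredPct: rate(v.rework, v.ignored) };
  }
  return {
    chain: {
      ignorePct: rate(b.ignoredChainCount, b.totalChainRecommendations),
      reworkAmongIgnoredPct: rate(b.ignoredChainRework, b.ignoredChainCount),
    },
    nodeOnly: {
      ignorePct: rate(b.ignoredNodeOnlyCount, b.totalNodeOnlyRecommendations),
      reworkAmongIgnoredPct: rate(b.ignoredNodeOnlyRework, b.ignoredNodeOnlyCount),
    },
    buckets,
  };
}

// Illustrative counts only (not from a real retro).
const breakdownSample = {
  totalChainRecommendations: 8,
  ignoredChainCount: 6,
  ignoredChainRework: 3,
  totalNodeOnlyRecommendations: 12,
  ignoredNodeOnlyCount: 3,
  ignoredNodeOnlyRework: 0,
  breakdownByChainLength: {
    '1': { count: 2, ignored: 1, rework: 0 },
    '2': { count: 3, ignored: 2, rework: 1 },
    '3+': { count: 3, ignored: 3, rework: 2 },
  },
};

console.log(JSON.stringify(chainIgnoreRates(breakdownSample).chain));
// → {"ignorePct":75,"reworkAmongIgnoredPct":50}
```

Returning `null` for an empty denominator keeps "no data" distinct from a genuine 0%.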

/** Full deterministic aggregation: dedup → infer outcomes → group → chains → matrix → missed activations. */
export function analyze(episodes, options = {}) {
  const deduped = dedupeEpisodes(episodes);
@@ -441,6 +640,20 @@ export function analyze(episodes, options = {}) {
    }
  }

  // Cuts 8/9/10: when no classificationMap was passed via options (CLI invocation),
  // read it from the archived file. pathResolve() resolves against the process cwd,
  // so this lookup expects the retro to run from the repo root. Silent fallback to {}
  // on a missing or unparsable file.
  let canonMapForCuts = classificationMap || {};
  if (!Object.keys(canonMapForCuts).length) {
    try {
      const mapPath = pathResolve('docs/archive/llm-bootstrap-2026-05/routing-docs/observer-classification-map.json');
      const raw = readFileSync(mapPath, 'utf-8');
      const parsed = JSON.parse(raw);
      canonMapForCuts = parsed.map || {};
    } catch {
      canonMapForCuts = {};
    }
  }

  return {
    episodeCount: normal.length,
    v1SkippedCount,
@@ -457,6 +670,9 @@ export function analyze(episodes, options = {}) {
    reviewerCoverage,
    degradedCount,
    costTotals,
    classCanonCoverage: buildClassCanonCoverage(normal, canonMapForCuts),
    routerVsOpus: buildRouterVsOpus(normal),
    chainIgnoreBreakdown: buildChainIgnoreBreakdown(normal),
  };
}
@@ -6,6 +6,9 @@ import {
  findCausalChains,
  buildFactorMatrix,
  analyze,
  buildClassCanonCoverage,
  buildRouterVsOpus,
  buildChainIgnoreBreakdown,
} from './brain-retro-analyzer.mjs';

// Minimal v2 episode for tests.
@@ -717,3 +720,288 @@ describe('analyze — Pass 4 similar_past_outcome_majority axis (project-brain-f
    expect(result.factorMatrix.similar_past_outcome_majority.no_neighbors).toBeDefined();
  });
});

// ────────────────────────────────────────────────────────────────
// NEW CUTS: buildClassCanonCoverage, buildRouterVsOpus, buildChainIgnoreBreakdown
// ────────────────────────────────────────────────────────────────

// Shared classMap fixture (embedded, no external file dependency)
const testClassMap = {
  monitoring: ['#34', '#35'],
  bugfix: ['#18', '#34'],
  feature: ['#19'],
  release: ['#37'],
  planning: ['#19', '#41', '#42'],
  other: [],
};

// Helper: episode for the new cuts (minimal, no embeddings needed)
const epC = (overrides = {}) => ({
  schema_version: 2,
  task_id: 's1',
  timestamps: { started_at: '2026-05-19T10:00:00Z', ended_at: '2026-05-19T10:05:00Z' },
  primary_rationale: {
    node_chosen: 'direct',
    task_classification: 'other',
    recommended_node: null,
    recommended_chain: null,
  },
  outcome_reviewed: 'unknown',
  ...overrides,
});
describe('buildClassCanonCoverage', () => {
  it('returns [] for empty input', () => {
    expect(buildClassCanonCoverage([], testClassMap)).toEqual([]);
  });

  it('single monitoring episode with recommended_node=#34, node_chosen=direct, rework', () => {
    const eps = [epC({
      primary_rationale: { node_chosen: 'direct', task_classification: 'monitoring', recommended_node: '#34', recommended_chain: null },
      outcome_reviewed: 'rework',
    })];
    const rows = buildClassCanonCoverage(eps, testClassMap);
    expect(rows).toHaveLength(1);
    const row = rows[0];
    expect(row.classification).toBe('monitoring');
    expect(row.count).toBe(1);
    expect(row.canonicalNodes).toEqual(['#34', '#35']);
    expect(row.routerRecommended).toBe(1); // has recommended_node
    expect(row.claudeTook).toBe(0); // node_chosen === 'direct'
    expect(row.recWithinCanon).toBe(1); // '#34' is in canonical
    expect(row.rework).toBe(1);
  });

  it('classification not in map gets canonicalNodes=[]', () => {
    const eps = [epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: null }, outcome_reviewed: 'success' })];
    const rows = buildClassCanonCoverage(eps, {});
    expect(rows[0].canonicalNodes).toEqual([]);
  });

  it('recommended_chain with numeric ids normalized to #N for canon check', () => {
    const eps = [epC({
      primary_rationale: { node_chosen: 'direct', task_classification: 'monitoring', recommended_node: null, recommended_chain: [19, 34] },
      outcome_reviewed: 'success',
    })];
    const rows = buildClassCanonCoverage(eps, testClassMap);
    // chain [19,34] → normalized ['#19','#34']. '#34' is in monitoring canonical → recWithinCanon=1
    expect(rows[0].routerRecommended).toBe(1);
    expect(rows[0].recWithinCanon).toBe(1);
  });

  it('mixed: 3 release episodes sorted desc, counting correctly', () => {
    // 3 release, 2 feature (release > feature by count)
    const eps = [
      epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null }, outcome_reviewed: 'rework', timestamps: { started_at: '2026-05-19T10:00:00Z' } }),
      epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#99', recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
      epC({ primary_rationale: { node_chosen: '#37', task_classification: 'release', recommended_node: '#37', recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:02:00Z' } }),
      epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:03:00Z' } }),
      epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: null, recommended_chain: null }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:04:00Z' } }),
    ];
    const rows = buildClassCanonCoverage(eps, testClassMap);
    // Sorted by count desc: release=3, feature=2
    expect(rows[0].classification).toBe('release');
    expect(rows[0].count).toBe(3);
    expect(rows[0].routerRecommended).toBe(3); // all 3 have recommended_node
    expect(rows[0].claudeTook).toBe(1); // one has node_chosen='#37'
    expect(rows[0].recWithinCanon).toBe(2); // '#37' in release canonical for ep1 and ep3; '#99' not in canonical for ep2
    expect(rows[0].rework).toBe(1);
    expect(rows[1].classification).toBe('feature');
    expect(rows[1].count).toBe(2);
    expect(rows[1].routerRecommended).toBe(1); // only 1 has recommended_node
    expect(rows[1].claudeTook).toBe(0);
  });
});
describe('buildRouterVsOpus', () => {
  const epR = (overrides = {}) => ({
    schema_version: 4,
    task_id: 'session-abc-12345',
    timestamps: { started_at: '2026-05-19T10:00:00Z' },
    primary_rationale: {
      node_chosen: 'direct',
      task_classification: 'other',
      recommended_node: null,
      recommended_chain: null,
    },
    outcome_reviewed: 'unknown',
    review: {
      node_quality: 'correct',
      chain_quality: 'n/a',
      alternative_better: null,
      error_root_cause: 'n/a',
      reasoning: 'ok',
    },
    ...overrides,
  });

  it('one episode in each of A/B/C → 1/1/1', () => {
    const eps = [
      // A: router recommended #19, Opus rated wrong_node and named #37 as better
      epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null },
        review: { node_quality: 'wrong_node', chain_quality: 'n/a', alternative_better: '#37', error_root_cause: 'wrong_skill', reasoning: 'x' }, outcome_reviewed: 'rework' }),
      // B: router silent, alternative_better set
      epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'planning', recommended_node: null, recommended_chain: null },
        review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: '#41', error_root_cause: 'n/a', reasoning: 'should have used planning' }, outcome_reviewed: 'soft_success',
        timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
      // C: router gave, node_quality=correct, no alternative
      epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null },
        review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'direct was fine' }, outcome_reviewed: 'success',
        timestamps: { started_at: '2026-05-19T10:02:00Z' } }),
    ];
    const result = buildRouterVsOpus(eps);
    expect(result.sectionA).toHaveLength(1);
    expect(result.sectionB).toHaveLength(1);
    expect(result.sectionC).toHaveLength(1);
  });

  it('episode without review is excluded from all three sections', () => {
    const eps = [
      epR({ review: undefined, primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: '#19', recommended_chain: null } }),
    ];
    const result = buildRouterVsOpus(eps);
    expect(result.sectionA).toHaveLength(0);
    expect(result.sectionB).toHaveLength(0);
    expect(result.sectionC).toHaveLength(0);
  });

  it('A: episode with recommended_chain array of strings goes into A with routerRecommendation = the array', () => {
    const eps = [
      epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'planning', recommended_node: null, recommended_chain: ['#19', '#41'] },
        review: { node_quality: 'wrong_node', chain_quality: 'missing_step', alternative_better: '#19', error_root_cause: 'wrong_chain_order', reasoning: 'chain needed' }, outcome_reviewed: 'rework' }),
    ];
    const result = buildRouterVsOpus(eps);
    expect(result.sectionA).toHaveLength(1);
    expect(Array.isArray(result.sectionA[0].routerRecommendation)).toBe(true);
    expect(result.sectionA[0].routerRecommendation).toEqual(['#19', '#41']);
  });

  it('B: router silent AND alternative_better truthy → in B; router silent AND alternative_better=null → not in B', () => {
    const eps = [
      epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: null },
        review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: '#60', error_root_cause: 'n/a', reasoning: 'should use docs' }, outcome_reviewed: 'soft_success' }),
      epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: null },
        review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'fine' }, outcome_reviewed: 'success',
        timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
    ];
    const result = buildRouterVsOpus(eps);
    expect(result.sectionB).toHaveLength(1);
    expect(result.sectionB[0].opusSuggests).toBe('#60');
  });

  it('C: router gave + node_quality=correct + no alternative → in C; same but alternative_better truthy → NOT in C', () => {
    const inC = epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null },
      review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'fine' }, outcome_reviewed: 'success' });
    const notInC = epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'release', recommended_node: '#37', recommended_chain: null },
      review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: '#41', error_root_cause: 'n/a', reasoning: 'actually #41 better' }, outcome_reviewed: 'rework',
      timestamps: { started_at: '2026-05-19T10:01:00Z' } });
    const result = buildRouterVsOpus([inC, notInC]);
    expect(result.sectionC).toHaveLength(1);
    // The one NOT in C (has alternative_better) should be in A instead
    expect(result.sectionA).toHaveLength(1);
  });

  it('sectionA item has all expected shape fields', () => {
    const eps = [
      // Must be wrong_node or have an alternative to end up in A (not C)
      epR({ primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null },
        review: { node_quality: 'wrong_node', chain_quality: 'n/a', alternative_better: '#37', error_root_cause: 'wrong_skill', reasoning: 'should be #37' }, outcome_reviewed: 'rework' }),
    ];
    const result = buildRouterVsOpus(eps);
    const item = result.sectionA[0];
    expect(item).toHaveProperty('time');
    expect(item).toHaveProperty('taskId');
    expect(item).toHaveProperty('classification');
    expect(item).toHaveProperty('routerRecommendation');
    expect(item).toHaveProperty('claudeChose');
    expect(item).toHaveProperty('opusNodeQuality');
    expect(item).toHaveProperty('opusChainQuality');
    expect(item).toHaveProperty('outcomeReviewed');
    expect(item).toHaveProperty('opusAlternative');
    expect(item).toHaveProperty('opusRootCause');
    expect(item.taskId).toHaveLength(8); // first 8 chars of task_id
  });
});
describe('buildChainIgnoreBreakdown', () => {
  it('returns all zeros for empty input', () => {
    const result = buildChainIgnoreBreakdown([]);
    expect(result.totalChainRecommendations).toBe(0);
    expect(result.ignoredChainCount).toBe(0);
    expect(result.ignoredChainRework).toBe(0);
    expect(result.totalNodeOnlyRecommendations).toBe(0);
    expect(result.ignoredNodeOnlyCount).toBe(0);
    expect(result.ignoredNodeOnlyRework).toBe(0);
    expect(result.breakdownByChainLength['1']).toEqual({ count: 0, ignored: 0, rework: 0 });
    expect(result.breakdownByChainLength['2']).toEqual({ count: 0, ignored: 0, rework: 0 });
    expect(result.breakdownByChainLength['3+']).toEqual({ count: 0, ignored: 0, rework: 0 });
  });

  it('chain-len-4 ep with node_chosen=direct and outcome=rework → ignoredChainCount=1, rework=1, 3+ bucket', () => {
    const eps = [epC({
      primary_rationale: { node_chosen: 'direct', task_classification: 'planning', recommended_node: null, recommended_chain: ['#19','#41','#42','#37'] },
      outcome_reviewed: 'rework',
    })];
    const result = buildChainIgnoreBreakdown(eps);
    expect(result.totalChainRecommendations).toBe(1);
    expect(result.ignoredChainCount).toBe(1);
    expect(result.ignoredChainRework).toBe(1);
    expect(result.breakdownByChainLength['3+']).toEqual({ count: 1, ignored: 1, rework: 1 });
  });

  it('node-only rec ep with node_chosen=direct → ignoredNodeOnlyCount=1', () => {
    const eps = [epC({
      primary_rationale: { node_chosen: 'direct', task_classification: 'monitoring', recommended_node: '#34', recommended_chain: null },
      outcome_reviewed: 'success',
    })];
    const result = buildChainIgnoreBreakdown(eps);
    expect(result.totalNodeOnlyRecommendations).toBe(1);
    expect(result.ignoredNodeOnlyCount).toBe(1);
    expect(result.ignoredNodeOnlyRework).toBe(0);
    expect(result.totalChainRecommendations).toBe(0);
  });

  it('chains of length 1, 2, 5 bucketed correctly into 1/2/3+', () => {
    const eps = [
      epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: ['#19'] }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:00:00Z' } }),
      epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: ['#19','#34'] }, outcome_reviewed: 'success', timestamps: { started_at: '2026-05-19T10:01:00Z' } }),
      epC({ primary_rationale: { node_chosen: 'direct', task_classification: 'other', recommended_node: null, recommended_chain: ['#19','#34','#37','#41','#42'] }, outcome_reviewed: 'rework', timestamps: { started_at: '2026-05-19T10:02:00Z' } }),
    ];
    const result = buildChainIgnoreBreakdown(eps);
    expect(result.totalChainRecommendations).toBe(3);
    expect(result.breakdownByChainLength['1']).toEqual({ count: 1, ignored: 1, rework: 0 });
    expect(result.breakdownByChainLength['2']).toEqual({ count: 1, ignored: 1, rework: 0 });
    expect(result.breakdownByChainLength['3+']).toEqual({ count: 1, ignored: 1, rework: 1 });
  });

  it('chain-rec ep where node_chosen != direct → in totalChainRecommendations but NOT in ignoredChainCount', () => {
    const eps = [epC({
      primary_rationale: { node_chosen: '#19', task_classification: 'feature', recommended_node: null, recommended_chain: ['#19', '#34'] },
      outcome_reviewed: 'success',
    })];
    const result = buildChainIgnoreBreakdown(eps);
    expect(result.totalChainRecommendations).toBe(1);
    expect(result.ignoredChainCount).toBe(0);
    expect(result.breakdownByChainLength['2']).toEqual({ count: 1, ignored: 0, rework: 0 });
  });
});
describe('analyze — classCanonCoverage / routerVsOpus / chainIgnoreBreakdown integrated', () => {
  it('analyze() result includes classCanonCoverage, routerVsOpus, chainIgnoreBreakdown keys', () => {
    const eps = [
      ep({ schema_version: 4,
        primary_rationale: { node_chosen: 'direct', task_classification: 'feature', recommended_node: '#19', recommended_chain: null, triggers_matched: [], boundaries_applied: [], step: 1, candidates_considered: [], hard_floor: { invoked: false, rules: [] } },
        review: { node_quality: 'correct', chain_quality: 'n/a', alternative_better: null, error_root_cause: 'n/a', reasoning: 'ok' },
        outcome_reviewed: 'success' }),
    ];
    const result = analyze(eps);
    expect(result.classCanonCoverage).toBeDefined();
    expect(result.routerVsOpus).toBeDefined();
    expect(result.chainIgnoreBreakdown).toBeDefined();
    expect(Array.isArray(result.classCanonCoverage)).toBe(true);
    expect(result.routerVsOpus).toHaveProperty('sectionA');
    expect(result.routerVsOpus).toHaveProperty('sectionB');
    expect(result.routerVsOpus).toHaveProperty('sectionC');
    expect(result.chainIgnoreBreakdown).toHaveProperty('totalChainRecommendations');
  });
});