Closes brain-retro 2026-05-20 #7 — each tool_result.is_error now emits
{ kind:'error', tool:<name>, summary:<first 80 chars> }. Allows
aggregation by tool (Bash/Edit/Read) + cause prefix (ENOENT/timeout/
'String to replace not found').
Required updating existing 'emits error events for tool_result with
is_error' test assertion (old shape had bare 'message' field).
4 new vitest tests + 1 existing relaxed, 286/286 GREEN.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add extractAskUserQuestionEvents() — for each AskUserQuestion toolUseResult emits
one event per question with answer_kind: option|custom|no_answer and question_count.
Integrated into parseTranscript events pipeline. 7 new tests (263 total, 0 failed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- export extractTokenUsage(turn): sums input/output/cache/iterations/
web_search/web_fetch across all assistant messages in a turn
- parseTranscript now includes task_cost field (zero-filled when no usage)
- 7 new tests (5 unit + 2 integration); total 248/248 GREEN
- V2_FIELDS in observer-stop-hook.mjs NOT changed (backward compat)
Closes brain-retro 2026-05-20 #1 — analysis/memory-sync/regulatory-bump/
release/cleanup/monitoring/planning. Addresses '59% other' observation
from initial retro factor matrix.
Ordering: release before feature (merge feature-branch), planning before
refactor (план рефакторинга), memory-sync/regulatory-bump at top as most
specific. monitoring regex проверь состоян covers inflected forms.
9 new vitest tests, 241/241 GREEN in npm run test:tools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug: inferOutcome flagged `blocked` whenever errorCount > retryCount across
the turn's events. But the parser emits an `error` event for ANY tool_result
with is_error=true — including expected failures: TDD failing-test-first,
grep returning nothing, git commands with intentional non-zero exit. On
TDD-heavy turns (project's standard discipline) this systematically marked
turns as blocked even when they ended on a successful tool_use.
Fix:
- Parser (extractProcessEvents): walk turn from end, find the LAST
tool_result; if its is_error=true, emit a single `unrecovered_error`
event. Distinguishes "turn ended on failure" from "errors recovered
later". The original per-is_error `error` events remain (useful as raw
factor signals).
- Analyzer (inferOutcome): replace `errorCount > retryCount → blocked`
with `events.some(kind === 'unrecovered_error') → blocked`. Same
ordering preserved (interrupt > blocked > rework/success/unknown).
Tests:
- Parser: emits unrecovered_error when last tool_result is_error;
does NOT emit when turn ended on a successful tool_result;
does NOT emit for turns with no tool_results.
- Analyzer: blocked iff unrecovered_error event present (not raw count);
events=[error, error, retry] → success (no unrecovered_error).
142/142 vitest green (was 128).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: Claude Code's transcript JSONL file accumulates duplicated context-
rebuild snapshots — the same entry re-printed with the SAME `uuid`. Without
dedup, session_turn / task_size / events double-count, and session_turn
becomes non-monotonic across episodes parsed at different file-growth
states. Live evidence: episodes-2026-05.jsonl lines 14/15/16 of the same
session showed session_turn 139 → 140 → 91 (backwards in time). Probe
on transcript 553717ec: 22400 entries, only 6074 unique uuid (68% dup
rate); real user prompts 264 total vs 92 unique-uuid.
Fix: parseLines now tracks a `seenUuid` Set and skips entries whose uuid
has already been encountered (keep-first). Entries without `uuid`
(synthetic test fixtures) pass through unchanged. All downstream functions
(findTurnStart, extractEnvironment, extractTaskSize, etc.) operate on the
deduped entries array, so the fix is single-point and total.
Tests: new `parseTranscript — uuid-dedup` describe block covers
(1) duplicated-uuid prompts collapse → session_turn counts once,
(2) distinct-uuid entries preserved (no over-dedup),
(3) no-uuid entries pass through (synthetic-fixture safety),
(4) duplicated-uuid assistant turns → tool_calls / files_touched counted once.
110/110 parser tests green (was 106).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
extractEnvironment was scanning JSON.stringify(turn) for collision markers
(чужой staged / foreign git index / index.lock / another git process). Prose
mentions in user/assistant text flipped parallel_session=true. Live FP proven
on episodes-2026-05.jsonl line 20: my own analysis turn was non-parallel but
recorded parallel_session: true because the finding text mentioned the markers.
Fix: collectToolResultText(turn) — gather text only from tool_result blocks
(both string content and structured `[{type:text,text}]` arrays). Scan THAT
for collision markers; prose is no longer a signal.
Tests: rewrote `parallel_session narrowed` block — false on user/assistant
prose / no-tool-result turns; true on tool_result strings + structured form.
106/106 parser tests green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0.2: count session_turn from the last compaction. The transcript file
accumulates duplicated context-rebuild snapshots (quirk #101), so counting
real prompts from i=0 inflated it and made it non-monotonic. Now counts
"real prompts since the last compaction" — monotonic by construction.
P0.1a: widen the correction prompt_signal regex (не работает / сломал /
опять / откати / revert / still not / wrong / ...). The old regex was too
narrow, so rework outcomes were invisible to the factor analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1 — detectAskUserQuestionChoice: when a turn contains an AskUserQuestion
whose answer exactly matches an offered option label, classify as
user_chose_from_options. The answered entry carries a structured
toolUseResult (questions[].options[].label + answers map). A custom
"Other" free-text answer is NOT a pick — falls through. Wired into
parseTranscript after the text-list detector.
#3 — parallel_session: dropped broad word matches (параллельн /
"parallel session") that false-fired on any casual mention. Now only
strong collision evidence (foreign git index / чужой staged /
index.lock / another git process). Best-effort per spec R2 — prefer
false-negative over false-positive.
169/169 tools tests GREEN (+9 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause (systematic-debugging): isRealUserPrompt treated skill-content
("Base directory for this skill:"), local-command output
(<local-command-stdout>), and interrupt markers as genuine prompts.
findTurnStart then anchored a turn on the synthetic message — the turn
slice missed the genuine prompt's UserPromptSubmit hook_additional_context
attachment → economy_level: null, wrong prompt_signal/task_classification.
Same cause made extractLastUserPromptText return skill content, so the
Stop-hook routing-gate false-positive-blocked autonomous §12 skill
invocations (detectMethodDirected saw the node name in skill text).
Fix: SYNTHETIC_PROMPT_MARKERS + isSyntheticPrompt — isRealUserPrompt
returns false for synthetic messages. One fix closes both the
economy_level capture gap and the 2nd routing-gate FP class.
160/160 tools tests GREEN (+3 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
detectChoiceProvenance runs BEFORE parseRoutingTag; if last assistant
turn offered options and user prompt references one, decision_provenance
becomes user_chose_from_options. Otherwise falls back to existing
routing-tag / autonomous logic.
3 new parser tests GREEN; all existing tests still GREEN (43/43).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Stop-hook was writing empty-shell episodes (task_id "unknown-<ts>",
node_chosen "unknown", events []). Root cause: buildEpisodeFromContext
read fields from the Stop-event stdin that Claude Code never sends
(primary_rationale, node_chosen, ...) and the session field name was
wrong (ctx.sessionId camelCase vs Claude Code's session_id). The hook
never read transcript_path — the only real source of session data.
New tools/observer-transcript-parser.mjs — pure parseTranscript(text,
fallbackSessionId):
- Scopes to the last turn (from the last real user prompt to EOF) —
one episode == one prompt→response cycle. A tool_result-carrier user
message is not treated as a turn boundary.
- Extracts task_id (real sessionId), timestamps (real duration),
skill_invoked events, a tool_summary event with per-tool counts,
error events (tool_result is_error), node_chosen (first skill, else
"direct"), hard_floor (invoked when a superpowers:* skill is used),
path_type (regulated/improvised), task_classification (keyword
heuristic on the prompt).
- Reasoning fields triggers_matched/candidates_considered/
boundaries_applied stay [] — not recoverable from a transcript;
their capture is a separate ADR-011 follow-up.
observer-stop-hook.mjs: reads ctx.transcript_path + ctx.session_id
(camelCase fallback kept), readFileSync best-effort, delegates to
parseTranscript. No transcript → graceful fallback to ctx defaults.
Episode schema (5 mandatory + 7-field primary_rationale) unchanged —
no normative change. Stop-event is never blocked (exit 0 on any error).
TDD: 17 parseTranscript tests + 1 buildEpisodeFromContext transcript
test. Full tools Vitest 70/70 GREEN. CLI smoke against a real 575-entry
transcript: episode populated — real task_id, ~6.5 min duration,
tool_summary {Bash:5,Read:5,Grep:1,Edit:9,Write:1}, error event.
Refs: ADR-011 brain governance §6.2 (observer evidence loop).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>