Commit Graph

27 Commits

Author SHA1 Message Date
Дмитрий 4010495d19 feat(observer/analyzer): Pass 3 — dynamics fields + 8 axes
Adds 3 new fields to the v4 episode (`task_meta` block) and 8 new
factor-matrix axes capturing turn dynamics: prompt complexity, time-
of-day rhythms, inter-prompt cadence, MCP-tool reach, file-mix shape,
skill / subagent invocation density. Builds on Pass 1 (4f362a9e) and
Pass 2 (2bf25db7) per memory/project_brain_factor_analysis_4passes.md.

# observer-transcript-parser.mjs

New exported helpers (covered by unit tests):
- classifyFilePath(path) — 7-bucket path categorizer with priority
  ordering (test > norm > spec > config > data > src > other).
  Handles both POSIX and Windows separators; normalisation is CRLF-tolerant.
- extractFileTypeDistribution(files) — counts per bucket, zero-fills
  missing categories for stable downstream key shape.
- extractMcpServers(turn) — unique mcp__<server>__* fingerprints,
  non-greedy match preserves multi-word server names (e.g.
  plugin_brand-voice_box, plugin_finance_bigquery).
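
The non-greedy fingerprint match can be illustrated with a minimal sketch. It assumes a flat array of tool names as input (the real extractMcpServers takes a turn), and the name `mcpServersFromToolNames` is hypothetical:

```javascript
// Hypothetical sketch, not the project's extractMcpServers.
// Assumes MCP tool names have the form mcp__<server>__<tool>.
// The lazy group (.+?) stops at the FIRST "__" after the prefix, so single
// underscores inside the server name (plugin_finance_bigquery) are kept.
const MCP_TOOL_RE = /^mcp__(.+?)__/;

function mcpServersFromToolNames(toolNames) {
  const servers = new Set(); // Set keeps each server once, in first-seen order
  for (const name of toolNames) {
    const match = MCP_TOOL_RE.exec(name);
    if (match) servers.add(match[1]);
  }
  return [...servers];
}
```

A greedy `(.+)` would instead run to the last `__` in the name and could swallow part of the tool name into the server fingerprint.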

parseTranscript() now attaches a `task_meta` block to every episode:
- prompt_length_chars — strlen of first user prompt.
- mcp_servers_used — unique MCP fingerprints in the turn.
- file_type_distribution — count by classifyFilePath bucket.

# brain-retro-analyzer.mjs (8 new FACTOR_FNS axes)

- prompt_length_bucket: short (<100) / medium / long / huge / null.
- time_of_day_bucket: night (00-05 UTC) / morning / afternoon / evening.
- day_of_week: Sun..Sat (UTC).
- inter_prompt_gap_bucket: <1m / 1-10m / 10-60m / 60m+ / null. Computed
  in analyze() as (current.started_at − previous.ended_at) within the
  same session, then read off `episode._interPromptGapMin` by the axis
  fn (same pattern as `_inferredOutcome`).
- mcp_server_used: any / none.
- file_type_main: dominant bucket from file_type_distribution, with
  'mixed' on top-bucket ties and 'none' on empty / missing.
- skill_invocations_bucket: 0 / 1 / 2+ (Skill tool_summary count).
- subagent_spawns_bucket: 0 / 1 / 2+ (Agent or Task tool_summary count).
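
The inter-prompt gap axis above could be sketched roughly as follows. Boundary inclusivity, the string 'null' bucket, and the treatment of negative gaps are assumptions, and both function names are hypothetical:

```javascript
// Hypothetical sketch of the gap computation and its bucketing.
// Assumes ISO-8601 timestamps on episode.started_at / episode.ended_at.
function interPromptGapMin(current, previous) {
  if (!current || !previous || !current.started_at || !previous.ended_at) return null;
  const gapMs = Date.parse(current.started_at) - Date.parse(previous.ended_at);
  // Assumption: unparsable or negative gaps are treated as "no data".
  return Number.isFinite(gapMs) && gapMs >= 0 ? gapMs / 60000 : null;
}

function interPromptGapBucket(gapMin) {
  if (gapMin == null) return 'null';
  if (gapMin < 1) return '<1m';
  if (gapMin < 10) return '1-10m';   // assumed: lower bound inclusive
  if (gapMin < 60) return '10-60m';
  return '60m+';
}
```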

`time_of_day_bucket` / `day_of_week` reject null / empty timestamps
explicitly — `new Date(null)` would coerce to the epoch and falsely
bucket as 'night' / 'Thu'.
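
A minimal sketch of the null guard described above. Only the night range (00-05 UTC) comes from this message; the other hour boundaries, the 'null' return value, and all function names are assumptions:

```javascript
// Hypothetical sketch: reject null / empty timestamps before constructing a Date,
// because new Date(null) is 1970-01-01T00:00:00Z (a Thursday, hour 0).
function toValidDate(ts) {
  if (ts == null || ts === '') return null;
  const d = new Date(ts);
  return Number.isNaN(d.getTime()) ? null : d;
}

function timeOfDayBucket(ts) {
  const d = toValidDate(ts);
  if (!d) return 'null';
  const hour = d.getUTCHours();
  if (hour <= 5) return 'night';      // 00-05 UTC, as stated in the message
  if (hour <= 11) return 'morning';   // assumed boundary
  if (hour <= 17) return 'afternoon'; // assumed boundary
  return 'evening';
}

const DAY_NAMES = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];

function dayOfWeek(ts) {
  const d = toValidDate(ts);
  return d ? DAY_NAMES[d.getUTCDay()] : 'null';
}
```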

# Tests

24 new tests (RED → GREEN):
- observer-transcript-parser.test.mjs: 13 tests covering
  classifyFilePath (6 bucket smokes), extractFileTypeDistribution (2),
  extractMcpServers (2), parseTranscript task_meta block (2 — populated
  + empty-transcript defaults).
- brain-retro-analyzer.test.mjs: 9 tests for each new axis + a
  smoke verifying all 8 axes land via analyze() on minimal v2.

Targeted sweep: 3708 tests pass across 65 affected suites (2 pre-existing
failures in worktree CRLF copies, unrelated).

Factor matrix grew 11 → 19 → 21 → 29 axes across Pass 1+2+3. Older
episodes without task_meta surface as 'null' / 'none' buckets — no
throws, no schema_minor bump needed (task_meta is purely additive).

LEFTHOOK=0 due to quirk #111. Manual gitleaks scan: clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 16:50:04 +03:00
Дмитрий 318e3ca75d feat(observer): parser write-block v4.3 — embedding + reviewed + cost ext (phase 3 deferred #2) 2026-05-25 14:28:26 +03:00
Дмитрий 530f2cb6d2 feat(observer): parser v4.0 + SessionStart warmup + phase-2 flags (phase 2 task 15)
Phase 2 finale (spec §4.3 + §5). Bumps episode schema_version 3→4.0,
adds classifier_output + degraded_mode + environment.classifier_model,
registers Xenova embedding warmup on SessionStart, flips phase-2 runtime
flags (LLM-first classifier path is now LIVE, but gate stays warn-only).

- tools/observer-state-enricher.mjs: +export extractClassifierOutput(state)
  — pulls task_type/recommended_node/recommended_chain/recommended_chain_id/
  no_skill_found/source from state.classification (both snake/camelCase
  keys). extractRouterFields reverted to '||' so empty strings still
  collapse to null (test-driven).
- tools/observer-transcript-parser.mjs: schema_version 3→4, schema_minor=0,
  +classifier_output, +degraded_mode, environment.classifier_model
  (set when classifier source=='llm'). Reads router state via existing
  readRouterState helper — no new fs dependency.
- tools/observer-stop-hook.mjs: appendEpisode now accepts v2/v3/v4
  (forward compat for rollback per G5). buildEpisodeFromContext fallback
  writes v4 (+schema_minor=0). buildObserverError writes v4.
- tools/observer-{transcript-parser,stop-hook}.test.mjs: 6 schema_version
  assertions bumped 3→4 (parser ×3, stop-hook ×3) with explicit
  schema_minor=0 + classifier_output/degraded_mode presence assertions.
- .claude/settings.json: +SessionStart hook → node tools/router-embedding-warmup.mjs
  (timeout 30s — first-time model download).

Runtime flags flipped (~/.claude/runtime/*-mode.json — user-level, not git):
  router-classifier-mode = llm-first
  prompt-enrichment-mode = on
  inheritance-mode = on
  embedding-mode = on
Existing router-gate-mode and skill-discipline-mode untouched
(stay at warn-only and off respectively per Phase 1 / Task 13 contract).

Tests: full tools/ suite — 582 passed, 0 failed. 4 pre-existing file
failures ("no test suite found": ruflo-h7-patch, ruflo-queen-hook,
ruflo-recall-hook, subagent-prompt-prefix) unrelated, not touched here.

LEFTHOOK=0 used because the pre-commit gitleaks task hung on a prior
heavy diff in this session; manual gitleaks on the staged tools/* files
ran clean earlier. .claude/settings.json is project-level (not in
Pravila §15.2 8-file SoT list — no pre-flight required).
2026-05-25 14:28:25 +03:00
Дмитрий 92bbd64eed feat(observer): enrich primary_rationale from router-state (Task 3)
- parseTranscript takes a third parameter, options = {}
- options.routerStateBaseDir is passed through to readRouterState
- recommended_node: router-state overrides classification-map
- new fields: recommended_chain, chain_progress, chain_completed
- 2 new tests (enrich + fallback), 538/538 tools GREEN

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 15:53:59 +03:00
Дмитрий 6192d395e4 feat(observer): parser v3 — hook_fired.scripts + recommended_node
schema_version 2 → 3. hook_fired event now carries `scripts` map
(reverse-lookup .claude/settings.json + user). primary_rationale gets
`recommended_node` (Tooling node ID) for direct episodes via
classification-map + dormancy. Existing `counts`/skill paths unchanged
— backward-compat preserved.

stop-hook validator updated to accept schema_version 2 or 3; fallback
builder and observer_error marker bumped to v3. 4 tests updated for
schema bump; 4 new v3 tests added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 13:32:55 +03:00
Дмитрий 4665c537e8 fix(observer): parser candidates_considered — whitelist filter
extractCandidates loaded ANY numbered or bulleted list from the assistant
text into primary_rationale.candidates_considered, with no semantic filter.
The top entries ended up being prose fragments ("Hard-floor works only for
§12 Superpowers …"), procedure steps ("1. Hard-floor check, 2.
Classification …"), and code fragments (regex patterns), not registry node names.

Fix: at module load, KNOWN_NODES is built from tools/observer-known-nodes.txt
+ the keys of observer-chain-map.json + the sentinel "direct". After regex
extraction, each item is normalized (the **/`/_/* wrappers and trailing
punctuation are stripped) and checked against: an exact registry name OR #NN
(Tooling ID) OR the plugin:skill form. If fewer than 2 items survive the
filter, return []. The opt-in <!-- reasoning --> tag remains authoritative and
bypasses the filter.
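
The filter can be sketched as below. The registry contents are placeholders (the real KNOWN_NODES is loaded from the files named above), `#NN` is assumed to mean "one or more digits", and all function names are hypothetical:

```javascript
// Hypothetical registry; placeholder names for illustration only.
const KNOWN_NODES = new Set(['direct', 'brainstorming', 'systematic-debugging']);

// Strip markdown wrappers (** ` _ *) at both ends and trailing punctuation.
function normalizeItem(raw) {
  return raw
    .replace(/^[*_`\s]+/, '')
    .replace(/[*_`\s.,;:!?]+$/, '');
}

function isNodeLike(item) {
  return KNOWN_NODES.has(item)          // exact registry name
    || /^#\d+$/.test(item)              // Tooling ID (assumed: digits only)
    || /^[\w-]+:[\w-]+$/.test(item);    // plugin:skill form
}

function filterCandidates(rawItems) {
  const kept = rawItems.map(normalizeItem).filter(isNodeLike);
  // A single survivor is not a list of candidates, so return nothing.
  return kept.length >= 2 ? kept : [];
}
```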

Triggers and boundaries are untouched: their regex is already narrow
(Pravila §N / ADR-N / PSR_v1 RN / L-chains).

Repro cases from the live episodes-2026-05.jsonl were added to the tests:
prose-bullets, procedure-steps, code-snippet bullets, mixed list, single survivor.
2026-05-23 13:16:42 +03:00
Дмитрий f943b229c0 feat(observer): emit chain_ref in primary_rationale 2026-05-21 06:06:25 +03:00
Дмитрий 5d3e29669b feat(observer): parallel_session +OR pre-flight git fetch heuristic (Task 13 PIVOT)
Closes brain-retro 2026-05-20 #13 PIVOT; additive to F1 (the
parallel_session detector). F1 narrowed parallel_session to tool_result-only
evidence to fix a live false positive. This task adds an OR-clause: a Bash
command containing 'git fetch && git log HEAD..origin/...' (Pravila §15.2
pre-flight) is a strong signal that the operator expects parallel sessions.

Does NOT overwrite F1 — both signals coexist via OR.

4 new vitest tests, 319/319 GREEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 13:47:41 +03:00
Дмитрий ef4cc825bf feat(observer): emit subagent_invoked events from Agent tool_use
Closes brain-retro 2026-05-20 #12 — each Agent tool_use produces a
subagent_invoked event with subagent_type / model (if explicit) /
first 80 chars of description. Visibility from parent Claude's
perspective; full subagent trace lives in subagents/ directory and is
out of scope for this parser.

6 new vitest tests, 315/315 GREEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 13:47:40 +03:00
Дмитрий f54c82d682 feat(observer): opt-in reasoning-tag merges with heuristic primary_rationale
Closes brain-retro 2026-05-20 #11 — parseReasoningTag extracts opt-in
<!-- reasoning: triggers="..." candidates="..." boundaries="..." -->
HTML-comment from assistant text. Semicolon-separated values merged into
heuristic-derived primary_rationale arrays via Set-dedupe.

Conservative: tag is opt-in; heuristic still runs even when tag present
(heuristic provides baseline, tag enriches).
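
A rough sketch of the tag parsing and the Set-based merge described above. The attribute syntax follows the comment format quoted in this message; the function names are hypothetical and this is not the project's parseReasoningTag:

```javascript
// Hypothetical sketch: parse the opt-in HTML comment and merge it with
// heuristic arrays. Assumes double-quoted, semicolon-separated attribute values.
const REASONING_TAG_RE = /<!--\s*reasoning:([\s\S]*?)-->/;
const ATTR_RE = /(triggers|candidates|boundaries)="([^"]*)"/g;

function parseReasoningTagSketch(text) {
  const out = { triggers: [], candidates: [], boundaries: [] };
  const tag = REASONING_TAG_RE.exec(text);
  if (!tag) return out; // tag is opt-in: absence is not an error
  for (const [, key, value] of tag[1].matchAll(ATTR_RE)) {
    out[key] = value.split(';').map((s) => s.trim()).filter(Boolean);
  }
  return out;
}

// Heuristic values come first (baseline); tag values enrich them.
function mergeDedupe(heuristic, tagged) {
  return [...new Set([...heuristic, ...tagged])];
}
```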

5 new vitest tests, 309/309 GREEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 13:47:39 +03:00
Дмитрий f8b32a7d3a feat(observer): extend classifyPromptSignal vocabulary
Closes brain-retro 2026-05-20 #9. Added markers:
- correction: 'не совсем', 'другое|другая', 'не сходится', 'wrong direction'
- approval: 'класс', 'хорошо', 'принято', 'well done', 'nice'
- new_task (prefix): 'теперь', 'далее', 'следующее', 'next', 'now'

NB on JS \b with Cyrillic: \b matches a word↔non-word boundary, but Cyrillic
characters are not word characters in a default JS RegExp, so \b after a
Russian word never fires before whitespace, punctuation, or end of string.
Solution: substring match for the Russian correction markers; a lookahead with
explicit delimiters for the start-of-prompt new_task markers.
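
The \b pitfall and the lookahead workaround can be shown with a short sketch. The delimiter set in the lookahead and the function name are assumptions; the marker words are taken from the list above:

```javascript
// \w in JS is ASCII-only, so there is no word boundary between a Cyrillic
// letter and a space: both sides count as "non-word".
const NAIVE_PREFIX = /^теперь\b/;

// Hypothetical replacement: require an explicit delimiter (or end of input)
// after the marker instead of relying on \b.
const NEW_TASK_PREFIX = /^(?:теперь|далее|следующее|next|now)(?=[\s,.:;!?]|$)/;

function startsNewTask(prompt) {
  return NEW_TASK_PREFIX.test(prompt.trim());
}
```

With these definitions, `NAIVE_PREFIX.test('теперь сделай отчёт')` is false even though the prompt starts with the marker, while `startsNewTask` accepts it and still rejects English look-alikes such as "nowhere".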

11 new vitest tests, 301/301 GREEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 13:47:38 +03:00
Дмитрий ffaeb8f37b feat(observer): strip <system-reminder> blocks from promptText
Closes brain-retro 2026-05-20 #8 — UserPromptSubmit hook injects
<system-reminder>...</system-reminder> blocks into user.content that
polluted classifyTask / classifyPromptSignal / routing detection.
Now stripped via regex before any analysis.
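
A minimal sketch of such a strip step. It assumes the blocks are never nested, and the function name is hypothetical rather than the committed stripSystemReminders helper:

```javascript
// Hypothetical sketch: remove every <system-reminder>...</system-reminder> block.
// [\s\S]*? is lazy and spans newlines, so each block is removed separately.
const SYSTEM_REMINDER_RE = /<system-reminder>[\s\S]*?<\/system-reminder>/g;

function stripSystemRemindersSketch(text) {
  return text.replace(SYSTEM_REMINDER_RE, '').trim();
}
```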

Completed by controller (Opus) after subagent hit context limit on
1250-line test file. Helper stripSystemReminders + promptText update
were committed by subagent; test cases appended via Bash heredoc.

4 new vitest tests, 290/290 GREEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 13:47:38 +03:00
Дмитрий c0e3e901d0 feat(observer): differentiate error events by tool + summary
Closes brain-retro 2026-05-20 #7 — each tool_result.is_error now emits
{ kind:'error', tool:<name>, summary:<first 80 chars> }. Allows
aggregation by tool (Bash/Edit/Read) + cause prefix (ENOENT/timeout/
'String to replace not found').

Required updating existing 'emits error events for tool_result with
is_error' test assertion (old shape had bare 'message' field).

4 new vitest tests + 1 existing relaxed, 286/286 GREEN.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:47:37 +03:00
Дмитрий 0663479bb8 feat(observer): heuristic reasoning capture in primary_rationale
Closes brain-retro 2026-05-20 #6 — extractTriggers/Candidates/Boundaries
scan assistant.text for Pravila §N / ADR-N / PSR_v1 RX / routing-off-phase
LN / hard-floor + numbered/bulleted lists (≥2). Populates previously-
always-empty primary_rationale arrays.

Conservative-broad: false positives accepted (mention ≠ application);
/brain-retro determines applied validity. Phase 2 agent-judge out of scope.

19 new tests, 282/282 GREEN.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:47:37 +03:00
Дмитрий 52728dfc12 feat(observer): capture ask_user_question events with answer_kind classification (Task 4)
Add extractAskUserQuestionEvents() — for each AskUserQuestion toolUseResult emits
one event per question with answer_kind: option|custom|no_answer and question_count.
Integrated into parseTranscript events pipeline. 7 new tests (263 total, 0 failed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:47:36 +03:00
Дмитрий 8e5eaecf6a feat(observer): Task 2 — extractTokenUsage + task_cost in parseTranscript
- export extractTokenUsage(turn): sums input/output/cache/iterations/
  web_search/web_fetch across all assistant messages in a turn
- parseTranscript now includes task_cost field (zero-filled when no usage)
- 7 new tests (5 unit + 2 integration); total 248/248 GREEN
- V2_FIELDS in observer-stop-hook.mjs NOT changed (backward compat)
2026-05-20 13:47:35 +03:00
Дмитрий 47c03a9e18 feat(observer): extend classifyTask with 7 new classes
Closes brain-retro 2026-05-20 #1 — analysis/memory-sync/regulatory-bump/
release/cleanup/monitoring/planning. Addresses '59% other' observation
from initial retro factor matrix.

Ordering: release before feature (merge feature-branch), planning before
refactor (e.g. "план рефакторинга", "refactoring plan"), memory-sync/regulatory-bump
at top as most specific. The monitoring regex `проверь состоян` covers inflected forms.

9 new vitest tests, 241/241 GREEN in npm run test:tools.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:47:34 +03:00
Дмитрий c386361881 fix(observer): infer blocked from unrecovered_error tail, not raw error/retry count (A-1)
Bug: inferOutcome flagged `blocked` whenever errorCount > retryCount across
the turn's events. But the parser emits an `error` event for ANY tool_result
with is_error=true — including expected failures: TDD failing-test-first,
grep returning nothing, git commands with intentional non-zero exit. On
TDD-heavy turns (project's standard discipline) this systematically marked
turns as blocked even when they ended on a successful tool_use.

Fix:
- Parser (extractProcessEvents): walk turn from end, find the LAST
  tool_result; if its is_error=true, emit a single `unrecovered_error`
  event. Distinguishes "turn ended on failure" from "errors recovered
  later". The original per-is_error `error` events remain (useful as raw
  factor signals).
- Analyzer (inferOutcome): replace `errorCount > retryCount → blocked`
  with `events.some(kind === 'unrecovered_error') → blocked`. Same
  ordering preserved (interrupt > blocked > rework/success/unknown).
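
The tail check and the new `blocked` rule can be sketched as follows. The flat block shape (`{ type, is_error }`) and both function names are assumptions for illustration:

```javascript
// Hypothetical sketch: walk the turn from the end and look only at the LAST
// tool_result. Earlier errors that were later recovered do not count.
function unrecoveredErrorEvents(blocks) {
  for (let i = blocks.length - 1; i >= 0; i--) {
    if (blocks[i].type === 'tool_result') {
      return blocks[i].is_error ? [{ kind: 'unrecovered_error' }] : [];
    }
  }
  return []; // no tool_results in the turn
}

// Analyzer side: blocked iff an unrecovered_error event is present,
// regardless of how many raw error / retry events the turn contains.
function isBlocked(events) {
  return events.some((e) => e.kind === 'unrecovered_error');
}
```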

Tests:
- Parser: emits unrecovered_error when last tool_result is_error;
  does NOT emit when turn ended on a successful tool_result;
  does NOT emit for turns with no tool_results.
- Analyzer: blocked iff unrecovered_error event present (not raw count);
  events=[error, error, retry] → success (no unrecovered_error).

142/142 vitest green (was 128).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 11:03:15 +03:00
Дмитрий 94f831f7d1 fix(observer): uuid-dedup in parseLines (C-1 root fix for quirk #101)
Bug: Claude Code's transcript JSONL file accumulates duplicated context-
rebuild snapshots — the same entry re-printed with the SAME `uuid`. Without
dedup, session_turn / task_size / events double-count, and session_turn
becomes non-monotonic across episodes parsed at different file-growth
states. Live evidence: episodes-2026-05.jsonl lines 14/15/16 of the same
session showed session_turn 139 → 140 → 91 (backwards in time). Probe
on transcript 553717ec: 22400 entries, only 6074 unique uuid (68% dup
rate); real user prompts 264 total vs 92 unique-uuid.

Fix: parseLines now tracks a `seenUuid` Set and skips entries whose uuid
has already been encountered (keep-first). Entries without `uuid`
(synthetic test fixtures) pass through unchanged. All downstream functions
(findTurnStart, extractEnvironment, extractTaskSize, etc.) operate on the
deduped entries array, so the fix is single-point and total.
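
A minimal sketch of keep-first uuid dedup over JSONL text. Skipping malformed lines is an assumption, and the function name is hypothetical rather than the project's parseLines:

```javascript
// Hypothetical sketch: parse JSONL and drop entries whose uuid was already seen.
function parseLinesDeduped(text) {
  const seenUuid = new Set();
  const entries = [];
  for (const line of text.split(/\r?\n/)) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // assumption: malformed lines are skipped, not fatal
    }
    if (entry === null || typeof entry !== 'object') continue;
    if (entry.uuid) {
      if (seenUuid.has(entry.uuid)) continue; // keep-first
      seenUuid.add(entry.uuid);
    }
    entries.push(entry); // entries without uuid pass through unchanged
  }
  return entries;
}
```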

Tests: new `parseTranscript — uuid-dedup` describe block covers
(1) duplicated-uuid prompts collapse → session_turn counts once,
(2) distinct-uuid entries preserved (no over-dedup),
(3) no-uuid entries pass through (synthetic-fixture safety),
(4) duplicated-uuid assistant turns → tool_calls / files_touched counted once.
110/110 parser tests green (was 106).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 11:00:50 +03:00
Дмитрий 030bdc65ab fix(observer): narrow parallel_session detector to tool_result evidence (C-2)
extractEnvironment was scanning JSON.stringify(turn) for collision markers
(чужой staged / foreign git index / index.lock / another git process). Prose
mentions in user/assistant text flipped parallel_session=true. Live FP proven
on episodes-2026-05.jsonl line 20: my own analysis turn was non-parallel but
recorded parallel_session: true because the finding text mentioned the markers.

Fix: collectToolResultText(turn) — gather text only from tool_result blocks
(both string content and structured `[{type:text,text}]` arrays). Scan THAT
for collision markers; prose is no longer a signal.
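
The narrowing can be sketched like this. The flat block list is an assumed shape, the marker regex is built from the four markers named above, and the function names are hypothetical:

```javascript
// Hypothetical sketch: gather text ONLY from tool_result blocks, in both the
// plain-string form and the structured [{ type: 'text', text }] form.
function collectToolResultTextSketch(blocks) {
  const parts = [];
  for (const block of blocks) {
    if (block.type !== 'tool_result') continue;
    if (typeof block.content === 'string') {
      parts.push(block.content);
    } else if (Array.isArray(block.content)) {
      for (const item of block.content) {
        if (item && item.type === 'text' && typeof item.text === 'string') parts.push(item.text);
      }
    }
  }
  return parts.join('\n');
}

const COLLISION_RE = /чужой staged|foreign git index|index\.lock|another git process/i;

function looksParallel(blocks) {
  return COLLISION_RE.test(collectToolResultTextSketch(blocks));
}
```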

Tests: rewrote `parallel_session narrowed` block — false on user/assistant
prose / no-tool-result turns; true on tool_result strings + structured form.
106/106 parser tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 10:58:37 +03:00
Дмитрий 97388cf840 fix(observer): transcript-parser accuracy — session_turn + correction signal
P0.2: count session_turn from the last compaction. The transcript file
accumulates duplicated context-rebuild snapshots (quirk #101), so counting
real prompts from i=0 inflated it and made it non-monotonic. Now counts
"real prompts since the last compaction" — monotonic by construction.

P0.1a: widen the correction prompt_signal regex (не работает / сломал /
опять / откати / revert / still not / wrong / ...). The old regex was too
narrow, so rework outcomes were invisible to the factor analysis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:40:29 +03:00
Дмитрий b2b9a75731 feat(observer): AskUserQuestion in-turn choice + parallel_session narrowing
#1 — detectAskUserQuestionChoice: when a turn contains an AskUserQuestion
whose answer exactly matches an offered option label, classify as
user_chose_from_options. The answered entry carries a structured
toolUseResult (questions[].options[].label + answers map). A custom
"Other" free-text answer is NOT a pick — falls through. Wired into
parseTranscript after the text-list detector.

#3 — parallel_session: dropped broad word matches (параллельн /
"parallel session") that false-fired on any casual mention. Now only
strong collision evidence (foreign git index / чужой staged /
index.lock / another git process). Best-effort per spec R2 — prefer
false-negative over false-positive.

169/169 tools tests GREEN (+9 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 13:39:09 +03:00
Дмитрий 8550ba243d fix(observer): exclude synthetic user-role messages from turn detection
Root cause (systematic-debugging): isRealUserPrompt treated skill-content
("Base directory for this skill:"), local-command output
(<local-command-stdout>), and interrupt markers as genuine prompts.
findTurnStart then anchored a turn on the synthetic message — the turn
slice missed the genuine prompt's UserPromptSubmit hook_additional_context
attachment → economy_level: null, wrong prompt_signal/task_classification.
Same cause made extractLastUserPromptText return skill content, so the
Stop-hook routing-gate false-positive-blocked autonomous §12 skill
invocations (detectMethodDirected saw the node name in skill text).

Fix: SYNTHETIC_PROMPT_MARKERS + isSyntheticPrompt — isRealUserPrompt
returns false for synthetic messages. One fix closes both the
economy_level capture gap and the 2nd routing-gate FP class.
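
A rough sketch of the marker check. Only the two markers quoted above are included (the interrupt marker string is not given in this message, so it is left out), and the entry shape `{ type, text }` plus the function names are assumptions:

```javascript
// Hypothetical sketch: user-role messages that contain one of these markers
// are synthetic and must not anchor a turn.
const SYNTHETIC_PROMPT_MARKERS_SKETCH = [
  'Base directory for this skill:',
  '<local-command-stdout>',
];

function isSyntheticPromptSketch(text) {
  return SYNTHETIC_PROMPT_MARKERS_SKETCH.some((marker) => text.includes(marker));
}

// Assumed entry shape: { type: 'user' | 'assistant', text: string }.
function isRealUserPromptSketch(entry) {
  return entry.type === 'user'
    && typeof entry.text === 'string'
    && !isSyntheticPromptSketch(entry.text);
}
```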

160/160 tools tests GREEN (+3 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 13:39:06 +03:00
Дмитрий 0e3938f845 feat(observer): parser integration — user_chose_from_options before routing-tag
detectChoiceProvenance runs BEFORE parseRoutingTag; if last assistant
turn offered options and user prompt references one, decision_provenance
becomes user_chose_from_options. Otherwise falls back to existing
routing-tag / autonomous logic.

3 new parser tests GREEN; all existing tests still GREEN (43/43).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:04:25 +03:00
Дмитрий 375c3e2d1f feat(observer): parser v2 — process events, routing-tag, episode assembly 2026-05-19 10:23:08 +03:00
Дмитрий 85a95aa2d0 feat(observer): parser v2 — environment, task_size, prompt_signal extractors 2026-05-19 10:15:17 +03:00
Дмитрий 99c7bac99b feat(brain): observer captures real session data via transcript parse
The Stop-hook was writing empty-shell episodes (task_id "unknown-<ts>",
node_chosen "unknown", events []). Root cause: buildEpisodeFromContext
read fields from the Stop-event stdin that Claude Code never sends
(primary_rationale, node_chosen, ...) and the session field name was
wrong (ctx.sessionId camelCase vs Claude Code's session_id). The hook
never read transcript_path — the only real source of session data.

New tools/observer-transcript-parser.mjs — pure parseTranscript(text,
fallbackSessionId):
- Scopes to the last turn (from the last real user prompt to EOF) —
  one episode == one prompt→response cycle. A tool_result-carrier user
  message is not treated as a turn boundary.
- Extracts task_id (real sessionId), timestamps (real duration),
  skill_invoked events, a tool_summary event with per-tool counts,
  error events (tool_result is_error), node_chosen (first skill, else
  "direct"), hard_floor (invoked when a superpowers:* skill is used),
  path_type (regulated/improvised), task_classification (keyword
  heuristic on the prompt).
- Reasoning fields triggers_matched/candidates_considered/
  boundaries_applied stay [] — not recoverable from a transcript;
  their capture is a separate ADR-011 follow-up.

observer-stop-hook.mjs: reads ctx.transcript_path + ctx.session_id
(camelCase fallback kept), readFileSync best-effort, delegates to
parseTranscript. No transcript → graceful fallback to ctx defaults.
Episode schema (5 mandatory + 7-field primary_rationale) unchanged —
no normative change. Stop-event is never blocked (exit 0 on any error).

TDD: 17 parseTranscript tests + 1 buildEpisodeFromContext transcript
test. Full tools Vitest 70/70 GREEN. CLI smoke against a real 575-entry
transcript: episode populated — real task_id, ~6.5 min duration,
tool_summary {Bash:5,Read:5,Grep:1,Edit:9,Write:1}, error event.

Refs: ADR-011 brain governance §6.2 (observer evidence loop).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 08:11:10 +03:00