liderra/portal

Author	SHA1	Message	Date
Дмитрий	b93e5af439	chore(brain-retro): export CHAIN_OUTCOME_BUCKETS + clean up redundant fs import (Phase 4 #2 review fixes) Code-quality review of Task B (Phase 4) flagged two minor fixes: - Export CHAIN_OUTCOME_BUCKETS so that external consumers (tests + future cuts) no longer hard-code bucket names. - Replace the fs.readFileSync call, which relied on a duplicate `import fs from 'fs'`, with the already-imported named `readFileSync` in the helpers test. +1 regression test on the export.	2026-05-28 15:48:42 +03:00
Дмитрий	a3f5f392cd	feat(brain-retro): Cut 11 chain-hook effectiveness ledger + analyzer (Phase 4 #2)	2026-05-28 15:48:39 +03:00
Дмитрий	e58d375648	fix(brain-retro): remove archive-fallback from analyzer Cuts 8/9/10 Stale `docs/archive/llm-bootstrap-2026-05/routing-docs/observer-classification-map.json` was being read inside Cuts 8/9/10 when classificationMap was empty. Source of #37 mermaid noise in retro #9 deploy/monitoring missed-activations. Analyzer now uses nodes.yaml-derived map exclusively (single SoT per ADR-016). Also removed unused `pathResolve` import (was only used in fallback block). Regression test added. Closes brain-retro #9 candidate 3.	2026-05-28 10:44:56 +03:00
Дмитрий	b139888376	feat(brain-retro): extend mandatory digital analysis 7 → 10 cuts SKILL.md MANDATORY DIGITAL ANALYSIS block grows by three cuts: 8. Class × canon coverage (analyzer: buildClassCanonCoverage) 9. Router vs Opus (analyzer: buildRouterVsOpus, sections A / B / C — A and C are mutually exclusive by construction) 10. Chain-ignore breakdown (analyzer: buildChainIgnoreBreakdown, bucketed by chain length 1 / 2 / 3+) All three are wired into analyzer analyze() output as result.classCanonCoverage / result.routerVsOpus / result.chainIgnoreBreakdown and produced automatically on every retro run (no manual step). +216 lines analyzer / +288 lines tests covering the three functions in isolation and via analyze(). Driven by retro #8 manual analysis: the three cuts surface signal the existing 7 cuts missed — router-vs-Opus disagreement, canon coverage by classification, chain-vs-singleton ignore rate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 18:08:53 +03:00
Дмитрий	58784b182d	feat(observer/analyzer): Pass 4 — embedding-NN axis (similar_past_outcome_majority) Closes the 4-pass factor-analysis expansion plan in memory/project_brain_factor_analysis_4passes.md. Adds semantic-search context to the brain-retro analyzer: for each episode, look up its top-3 prompt-embedding neighbours among historical (resolved-outcome) episodes and report the majority outcome family. Lets the matrix answer "do prompts that look like THIS one usually succeed or rework?" # New module: tools/observer-embedding-index.mjs (pure, fs-free) - mapOutcomeToFamily(outcome): success / soft_success → 'success', rework → 'retry', blocked / partial → 'failure', else null. - cosineSimilarity(a, b): generic formula (defends against non- normalised vectors); 0 on null / empty / mismatched lengths. - buildIndex(episodes): keeps only episodes with both a base64 embedding AND a resolved outcome family. Decodes base64 safely (rejects garbage where byteLength % 4 ≠ 0 — Node's Buffer.from('garbage', 'base64') silently strips invalid chars). - findNearestNeighbors(target, index, k, opts): top-k by descending cosine. Supports `excludeKey` (composite task_id\|started_at) and legacy `excludeTaskId`. - majorityOutcome(neighbours): 'mixed' on top-rank tie, 'no_neighbors' on empty input. - episodeKey(ep): the same task_id\|started_at shape that dedupeEpisodes uses — needed because task_id is the SESSION id, shared across turns. task_id alone cannot identify a single turn. # brain-retro-analyzer.mjs - New FACTOR_FNS axis similar_past_outcome_majority reading the pre-computed episode._similarPastOutcomeMajority field. - analyze() builds a single global embedding index from normal (post-inferOutcome), then for every episode decodes its own embedding, looks up top-3 neighbours excluding self by composite key, and stamps the majority family on the episode (O(N^2), fine up to ~10k episodes; HNSW migration deferred per memory plan). 
- Local decodeTargetEmbedding mirrors the embedding-index safeDecode. # Tests 20 new tests (RED -> GREEN): - observer-embedding-index.test.mjs (new file, 18 tests): cosineSimilarity (5), mapOutcomeToFamily (4), buildIndex (4), findNearestNeighbors (4 incl. self-exclusion), majorityOutcome (3). - brain-retro-analyzer.test.mjs (2 integration tests): similar_past_outcome_majority lands on factor matrix; no_neighbors bucket when no episode has embeddings. Targeted sweep: 632/632 PASS on the 2 directly-affected suites. Broader tools/ sweep: 7968/7969 PASS. Pre-existing 1 test failure in observer-self-assessment-api.test.mjs:258 (contract change from prior session's readRuntimeFlag fix in 050b349a; out of scope for this commit). 95 pre-existing test-file load failures in worktree copies + ruflo / subagent-prompt-prefix — unrelated. Factor matrix grew 11 -> 19 -> 21 -> 29 -> 30 axes across Pass 1+2+3+4. LEFTHOOK=0 due to quirk #111. Manual gitleaks scan: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 17:07:23 +03:00
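The Pass 4 helpers described in this commit can be sketched as follows. This is a minimal illustration reconstructed from the commit message only, not the contents of tools/observer-embedding-index.mjs. The index entry shape (`{ key, vector, family }`) and the neighbour shape are assumptions made for the example.

```javascript
// Hypothetical sketch based on the commit message; the real module may differ.

// Map a raw outcome label to its outcome family, or null when unresolved.
function mapOutcomeToFamily(outcome) {
  if (outcome === 'success' || outcome === 'soft_success') return 'success';
  if (outcome === 'rework') return 'retry';
  if (outcome === 'blocked' || outcome === 'partial') return 'failure';
  return null;
}

// Generic cosine similarity; does not assume unit-length vectors.
// Returns 0 on null, empty, or mismatched-length input.
function cosineSimilarity(a, b) {
  if (!a || !b || a.length === 0 || a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Composite key: task_id is the session id, so it cannot identify a turn alone.
function episodeKey(ep) {
  return `${ep.task_id}|${ep.started_at}`;
}

// Top-k entries by descending cosine, skipping the entry with opts.excludeKey.
// Assumed entry shape: { key, vector, family }.
function findNearestNeighbors(target, index, k, opts = {}) {
  return index
    .filter((entry) => entry.key !== opts.excludeKey)
    .map((entry) => ({ ...entry, score: cosineSimilarity(target, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Majority family among neighbours; 'mixed' on a top-rank tie.
function majorityOutcome(neighbours) {
  if (!neighbours || neighbours.length === 0) return 'no_neighbors';
  const counts = new Map();
  for (const n of neighbours) counts.set(n.family, (counts.get(n.family) || 0) + 1);
  const ranked = [...counts.entries()].sort((x, y) => y[1] - x[1]);
  if (ranked.length > 1 && ranked[0][1] === ranked[1][1]) return 'mixed';
  return ranked[0][0];
}
```

Scoring every episode against the whole index this way is the O(N^2) scan the commit mentions; an approximate index would only replace `findNearestNeighbors`.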
Дмитрий	4010495d19	feat(observer/analyzer): Pass 3 — dynamics fields + 8 axes Adds 3 new fields to the v4 episode (`task_meta` block) and 8 new factor-matrix axes capturing turn dynamics: prompt complexity, time- of-day rhythms, inter-prompt cadence, MCP-tool reach, file-mix shape, skill / subagent invocation density. Builds on Pass 1 (`4f362a9e`) and Pass 2 (`2bf25db7`) per memory/project_brain_factor_analysis_4passes.md. # observer-transcript-parser.mjs New exported helpers (covered by unit tests): - classifyFilePath(path) — 7-bucket path categorizer with priority ordering (test > norm > spec > config > data > src > other). Handles both POSIX and Windows separators, normalises CRLF-tolerant. - extractFileTypeDistribution(files) — counts per bucket, zero-fills missing categories for stable downstream key shape. - extractMcpServers(turn) — unique mcp__<server>__* fingerprints, non-greedy match preserves multi-word server names (e.g. plugin_brand-voice_box, plugin_finance_bigquery). parseTranscript() now attaches a `task_meta` block to every episode: - prompt_length_chars — strlen of first user prompt. - mcp_servers_used — unique MCP fingerprints in the turn. - file_type_distribution — count by classifyFilePath bucket. # brain-retro-analyzer.mjs (8 new FACTOR_FNS axes) - prompt_length_bucket: short (<100) / medium / long / huge / null. - time_of_day_bucket: night (00-05 UTC) / morning / afternoon / evening. - day_of_week: Sun..Sat (UTC). - inter_prompt_gap_bucket: <1m / 1-10m / 10-60m / 60m+ / null. Computed in analyze() as (current.started_at − previous.ended_at) within the same session, then read off `episode._interPromptGapMin` by the axis fn (same pattern as `_inferredOutcome`). - mcp_server_used: any / none. - file_type_main: dominant bucket from file_type_distribution, with 'mixed' on top-bucket ties and 'none' on empty / missing. - skill_invocations_bucket: 0 / 1 / 2+ (Skill tool_summary count). 
- subagent_spawns_bucket: 0 / 1 / 2+ (Agent or Task tool_summary count). `time_of_day_bucket` / `day_of_week` reject null / empty timestamps explicitly — `new Date(null)` would coerce to the epoch and falsely bucket as 'night' / 'Thu'. # Tests 24 new tests (RED → GREEN): - observer-transcript-parser.test.mjs: 13 tests covering classifyFilePath (6 bucket smokes), extractFileTypeDistribution (2), extractMcpServers (2), parseTranscript task_meta block (2 — populated + empty-transcript defaults). - brain-retro-analyzer.test.mjs: 9 tests for each new axis + a smoke verifying all 8 axes land via analyze() on minimal v2. Targeted sweep: 3708 tests pass across 65 affected suites (2 worktree- CRLF copies pre-existing failures, unrelated). Factor matrix grew 11 → 19 → 21 → 29 axes across Pass 1+2+3. Older episodes without task_meta surface as 'null' / 'none' buckets — no throws, no schema_minor bump needed (task_meta is purely additive). LEFTHOOK=0 due to quirk #111. Manual gitleaks scan: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:50:04 +03:00
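The null-timestamp guard mentioned for `time_of_day_bucket` / `day_of_week` can be illustrated with a short sketch. Only the 'night' window (00-05 UTC) comes from the commit message; the remaining hour boundaries, the inclusive treatment of hour 5, the helper names, and the `'null'` fallback bucket are assumptions for illustration.

```javascript
// Hypothetical sketch; not the analyzer's actual code.

// Reject null / undefined / empty input before constructing a Date:
// new Date(null) is the Unix epoch (1970-01-01T00:00:00Z, a Thursday).
function parseTimestamp(ts) {
  if (ts === null || ts === undefined || ts === '') return null;
  const d = new Date(ts);
  return Number.isNaN(d.getTime()) ? null : d;
}

function timeOfDayBucket(ts) {
  const d = parseTimestamp(ts);
  if (!d) return 'null'; // assumed fallback bucket
  const hour = d.getUTCHours();
  if (hour <= 5) return 'night'; // 00-05 UTC, per the commit message
  if (hour <= 11) return 'morning'; // assumed boundary
  if (hour <= 17) return 'afternoon'; // assumed boundary
  return 'evening';
}

const DAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];

function dayOfWeek(ts) {
  const d = parseTimestamp(ts);
  return d ? DAYS[d.getUTCDay()] : 'null';
}
```

Without the guard, a missing timestamp would be counted as a real 'night' / 'Thu' observation instead of landing in its own bucket.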
Дмитрий	2bf25db72e	feat(observer/analyzer): Pass 2 — classifier metrics + 2 factor axes Surfaces 4 new fields from the Sonnet classifier path into the v4 episode and exposes 2 new factor-matrix axes. Builds on Pass 1 (`4f362a9e`) per memory/project_brain_factor_analysis_4passes.md. # router-classifier.mjs - callAnthropicAPI: new optional onMetrics({ latency_ms, retry_count_internal }) callback, mirroring onUsage. Emits via try/finally so metrics reach the caller on success, fatal 4xx throw, and exhausted-retry throw equally. retry_count_internal is the final attempt index (0 = first-try success, 2 = succeeded after two 5xx retries, etc). - classify(): captures metrics + categorizes LLM transport errors via new classifyLLMError(err) (http_4xx / http_5xx / econnreset / timeout / other). Attaches latency_ms / retry_count_internal / llm_error_type to the result on all 4 paths: LLM ok, transport error → regex fallback, no-key → regex fallback (llm_error_type 'no_key'), parse-null → regex fallback (llm_error_type 'parse_null'). - Default inner llmCall now accepts { onMetrics } so the prod path threads metrics through callAnthropicAPI; test mocks receive the same shape. # observer-state-enricher.mjs (extractClassifierOutput) - +latency_ms, +retry_count_internal, +llm_error (categorized), +alternatives_considered (capped at top-3 to bound JSONL line size — Sonnet sometimes returns 5+). - All four fields null-safe on regex / prefilter / cache paths. # brain-retro-analyzer.mjs (FACTOR_FNS) - latency_bucket: fast (<500ms) / medium / slow / very_slow / null. - error_type: classifier_output.llm_error verbatim with null default. # Tests 15 new tests (all RED first, then GREEN): - router-classifier.test.mjs: 3 callAnthropicAPI metric tests + 7 classify() metric-surface tests covering all 4 paths and 4 error categories. - observer-state-enricher.test.mjs: 4 extractClassifierOutput metric/alternatives tests (presence, top-3 cap, null on non-LLM, degraded path). 
- brain-retro-analyzer.test.mjs: 2 axis-presence tests. Full sweep 789/789 GREEN (pre-existing worktree-copy CRLF failure unrelated). Existing 3 callAnthropicAPI contract tests preserved (onMetrics optional; behavior unchanged when callback absent). LEFTHOOK=0 due to quirk #111. Manual gitleaks scan: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:32:30 +03:00
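The try/finally metrics pattern and the error categorisation from Pass 2 can be sketched as below. This is a simplified, synchronous illustration, whereas the real callAnthropicAPI is presumably asynchronous. The error fields (`status`, `code`, `name`), the retry-on-5xx rule, the `maxRetries` default, and the name `callWithMetrics` are all assumptions, not the module's API.

```javascript
// Hypothetical sketch; field names and retry policy are assumptions.

// Categorise a transport error into the buckets named in the commit message.
function classifyLLMError(err) {
  const status = err && typeof err.status === 'number' ? err.status : null;
  if (status !== null && status >= 400 && status < 500) return 'http_4xx';
  if (status !== null && status >= 500 && status < 600) return 'http_5xx';
  if (err && err.code === 'ECONNRESET') return 'econnreset';
  if (err && (err.code === 'ETIMEDOUT' || err.name === 'TimeoutError')) return 'timeout';
  return 'other';
}

// Run `attempt(attemptIndex)` with retries on 5xx. The finally block emits
// metrics on success, on a fatal throw, and on exhausted retries alike.
function callWithMetrics(attempt, { onMetrics, maxRetries = 2 } = {}) {
  const startedAt = Date.now();
  let attemptIndex = 0;
  try {
    for (;;) {
      try {
        return attempt(attemptIndex);
      } catch (err) {
        const retryable = classifyLLMError(err) === 'http_5xx';
        if (!retryable || attemptIndex >= maxRetries) throw err;
        attemptIndex += 1;
      }
    }
  } finally {
    if (onMetrics) {
      onMetrics({
        latency_ms: Date.now() - startedAt,
        retry_count_internal: attemptIndex, // final attempt index
      });
    }
  }
}
```

Because the callback fires from `finally`, a caller that falls back to the regex path after a throw still receives latency and retry data for the failed LLM call.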
Дмитрий	4f362a9e62	feat(observer/analyzer): Pass 1 — 8 cheap factor axes Adds 8 new axes to FACTOR_FNS that derive from data already present in v4 episodes (no parser/episode-writer changes). Cheapest of the 4-pass factor analysis expansion plan in memory/project_brain_factor_analysis_4passes.md. New axes (string-key buckets, null-safe on missing/legacy fields): - prompt_signal: raw value (new_task / continuation / correction / approval / neutral / null) - classifier_source: classifier_output.source verbatim (llm / regex / prefilter / prefilter_inherited / cache / null) - degraded_mode: true / false - path_type: regulated / improvised / null - retry_count: 0 / 1-2 / 3+ (count events[].kind=retry) - error_count: 0 / 1 / 2+ (count events[].kind=error) - hard_floor_invoked: true / false (primary_rationale.hard_floor.invoked) - iterations_bucket: 0 / 1-3 / 4-10 / 11+ (task_cost.iterations) Together with the 11 existing axes, the factor matrix now covers 19 discrete dimensions. Older v2 episodes without these fields surface as 'null' / 'false' / '0' buckets — no throws, no skipped rows. TDD: 9 tests added in brain-retro-analyzer.test.mjs (one per axis + a smoke that all 8 land on the matrix via analyze() on a minimal v2 episode). Full suite 599/599 GREEN. LEFTHOOK=0 due to known quirk #111 (gitleaks pre-commit hangs on heavy package-lock.json diff in workspace). Manual gitleaks scan: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:23:31 +03:00
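The count-based axes from Pass 1 reduce to small bucketing functions. The sketch below follows the bucket labels listed in the commit message; the function names and the handling of missing fields (treated as a zero count) are assumptions for illustration.

```javascript
// Hypothetical sketch of three Pass 1 axes; not the analyzer's actual code.

// Count events of one kind, tolerating legacy episodes without `events`.
function countEvents(episode, kind) {
  const events = Array.isArray(episode && episode.events) ? episode.events : [];
  return events.filter((e) => e && e.kind === kind).length;
}

// retry_count: 0 / 1-2 / 3+
function retryCountBucket(episode) {
  const n = countEvents(episode, 'retry');
  if (n === 0) return '0';
  return n <= 2 ? '1-2' : '3+';
}

// error_count: 0 / 1 / 2+
function errorCountBucket(episode) {
  const n = countEvents(episode, 'error');
  if (n === 0) return '0';
  return n === 1 ? '1' : '2+';
}

// iterations_bucket: 0 / 1-3 / 4-10 / 11+ (from task_cost.iterations)
function iterationsBucket(episode) {
  const raw = episode && episode.task_cost ? episode.task_cost.iterations : 0;
  const n = Number(raw) || 0; // missing or non-numeric counts as 0 (assumption)
  if (n <= 0) return '0';
  if (n <= 3) return '1-3';
  if (n <= 10) return '4-10';
  return '11+';
}
```

Returning string keys for every input, including empty legacy episodes, is what keeps older rows in the matrix without throwing.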
Дмитрий	cf97898833	feat(brain): analyzer v4 aggregations + schema_minor 2→3 + phase-3 flags (phase 3 task 20) Phase 3 Task 20 — analyzer surfaces v4 review distribution / inheritance / cost totals / degraded count. Schema_minor bumps 2→3. Final phase-3 runtime flags flipped. - tools/brain-retro-analyzer.mjs: + inheritanceCount: count of episodes with inheritance.inherited_from_task_id. + reviewQuality: distribution of review.node_quality across {correct, wrong_node, overkill, underkill, disputable}. + reviewerCoverage: {reviewed, pending, errored} — episodes reviewed by subagent / awaiting review / escalated with reviewer_error. + degradedCount: episodes where LLM classifier fell back to regex. + costTotals: sum of classifier/self_assessment/reviewer input/output tokens across the period (six counters). All additions are read-only over the existing dedup'd normal episode list — no new pass. - tools/brain-retro-analyzer.test.mjs: +6 tests (inheritance count / reviewQuality distribution / pending / errored / degraded / cost sums). - tools/observer-stop-hook.mjs: buildEpisode schema_minor 2→3 bump. - tools/observer-stop-hook.test.mjs: 1 schema_minor assertion 2→3. Runtime flags flipped (user-level, not git): reviewer-mode = subagent self-retrospect-mode = on sanity-check-mode = mandatory All 9 phase-2 + phase-3 flags now present: router-classifier-mode=llm-first \| prompt-enrichment-mode=on \| inheritance-mode=on \| embedding-mode=on \| router-gate-mode=warn-only \| self-assessment-mode=on \| reviewer-mode=subagent \| self-retrospect-mode=on \| sanity-check-mode=mandatory. Tests: 614 passed / 0 failed. 4 pre-existing empty test files unchanged. NB: schema v4.3 parser extension (prompt_embedding_base64 + outcome_reviewed + extended task_cost in parser write block per spec §5) NOT touched in this commit — that wiring belongs to the parse-time path which Task 17 also did not modify (only buildEpisode in stop-hook bumps the minor). 
Both are tracked for Phase 3 follow-up alongside §4.9 coverage announcement and status-md cost section.	2026-05-25 14:28:26 +03:00
Дмитрий	bec69aa565	fix(brain): derive routerStep from observable signals (was hardcoded constant) Root cause: primary_rationale.step was hard-coded as the literal `1` in both episode builders (observer-transcript-parser.mjs:813, observer-stop-hook.mjs:153). As a result, routerStepReached saw { '1': N } and suspicious=true for ALL data: the metric measured a constant, not router discipline. Fix: a new pure function deriveRouterStep(primary_rationale) takes the maximum observed stage of router-procedure.md from real signals (task_classification ≠ 'other' → 2; triggers_matched → 3; chain_ref → 4; node_chosen ≠ 'direct' → 5). routerStepReached now calls it at read time, ignoring the stored pr.step. This makes the metric honest for ALL existing episodes (including the 136 historical ones from May) without a data migration. Boost for the CHECKPOINT B baseline of stage 3: on production data (131 schema-v2+ episodes) the distribution is now { 1: 55, 2: 46, 3: 12, 5: 18 }, suspicious=false. The real picture is now visible: ~42% of episodes stopped at the hard-floor, and only ~14% actually reached skill execution. Follow-up: the episode builders keep writing step:1 (now harmless, since the metric ignores it). Separately, the write in the builders could be cleaned up so that episodes are self-describing. Test changes: - tools/discipline-metrics.test.mjs: +describe('deriveRouterStep') (9 cases), the routerStepReached describe rewritten around source signals. - tools/brain-retro-analyzer.test.mjs: 'returns routerStepReached distribution' updated: episodes are constructed with signals (triggers vs bare), not a stored step. Full tools/ vitest run: 520/520 GREEN. 4 pre-existing empty test files (ruflo-*, subagent-prompt-prefix) are not a regression from this change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 13:25:05 +03:00
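The signal-to-step mapping in this commit can be sketched as a pure function. The thresholds follow the commit message; the field shapes (for example `triggers_matched` as an array), the treatment of a missing field as "no signal", and the rule that `suspicious` means every episode landed on one step are assumptions for illustration.

```javascript
// Hypothetical sketch; not the contents of tools/discipline-metrics.mjs.

// Highest router stage for which the episode shows an observable signal.
function deriveRouterStep(primaryRationale) {
  const pr = primaryRationale || {};
  let step = 1;
  if (pr.task_classification && pr.task_classification !== 'other') step = Math.max(step, 2);
  const triggers = pr.triggers_matched;
  const hasTriggers = Array.isArray(triggers) ? triggers.length > 0 : Boolean(triggers);
  if (hasTriggers) step = Math.max(step, 3);
  if (pr.chain_ref) step = Math.max(step, 4);
  if (pr.node_chosen && pr.node_chosen !== 'direct') step = Math.max(step, 5);
  return step;
}

// Distribution of derived steps; any stored pr.step is ignored on purpose.
function routerStepReached(episodes) {
  const distribution = {};
  for (const ep of episodes) {
    const step = String(deriveRouterStep(ep.primary_rationale));
    distribution[step] = (distribution[step] || 0) + 1;
  }
  // Assumed rule: a single populated step suggests the metric tracks a constant.
  const suspicious = episodes.length > 0 && Object.keys(distribution).length === 1;
  return { distribution, suspicious };
}
```

Deriving the step at read time is what lets historical episodes, which all stored `step: 1`, be re-scored without rewriting the JSONL files.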
Дмитрий	7ac18d1103	feat(brain): analyze() returns 3 discipline slices + CLI reads registry Stage 2 Task 4 -- analyze() extended with disciplineByClassification, routerStep, boundariesRate. The CLI (tools/brain-retro-analyzer.mjs source-of-truth) now reads classificationMap and dormancy from docs/registry/nodes.yaml via registry-to-classification-map.mjs (instead of observer-classification-map.json and .node-dormancy.json). Sanity check on 124 episodes: missed_before=17 -> missed_after=17 (delta=0). disciplineKeys: bugfix, feature, refactor, planning, cleanup, monitoring, analysis. step dist: all step=1 (suspicious=true -- expected baseline). boundaries rate: 0.105. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 06:56:37 +03:00
Дмитрий	6a9df652ff	feat(observer): analyzer >=2 + recommended_node_for_direct factor axis brain-retro-analyzer accepts schema_version >= 2 (v2+v3 mix). FACTOR_FNS +recommended_node_for_direct ('none' bucket for v2). missed-activations also raised to >= 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 13:38:54 +03:00
Дмитрий	6174830311	feat(observer): wire missed-activation matcher into analyze()	2026-05-21 09:59:56 +03:00
Дмитрий	4c9a1e9ccb	feat(brain-retro): aggregate chain_ref into factorMatrix (multi-chain axis)	2026-05-21 06:06:27 +03:00
Дмитрий	492a4fc969	feat(observer): inferOutcome neutral next-prompt → soft_success Closes brain-retro 2026-05-20 #16 — when the next prompt is 'neutral' (no correction/approval/new_task markers), interpret as silent success ('no objection') and surface as soft_success. Slightly weaker than explicit approval — labelled separately so brain-retro can show breakdown. 4 new vitest tests, 324/324 GREEN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:47:43 +03:00
Дмитрий	a007295abe	refactor(observer): rename factor axis session_turn → session_segment_turn Closes brain-retro 2026-05-20 #14: `environment.session_turn` already means 'turns since last compaction' (parser counts from lastCompactIdx + 1). The matrix axis named 'session_turn' was easily confused with the global turn number. Data semantics are unchanged; only the axis name in FACTOR_FNS changes. Existing test renamed; new explicit test verifies new name present and legacy name absent. 1 new vitest test + 1 renamed, 320/320 GREEN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:47:41 +03:00
Дмитрий	7fe9f89574	fix(observer): exclude hot/normative files from causal chains (A-3) Bug: findCausalChains flagged a chain whenever two episodes shared any file. CLAUDE.md / MEMORY.md / STATUS.md / episodes-YYYY-MM.jsonl / memory/.md are touched by almost every turn (memory store, status regeneration, normative-doc updates) — sharing them is not evidence of causality, just baseline noise. Result: spurious chains on hot files crowded out the genuine signal. Fix: HOT_FILE_PATTERNS regex list + `isHotFile(path)` predicate. In findCausalChains, filter hot files out of BOTH the errored-episode file set AND the candidate-shared list. If only hot files were shared → no chain. If a non-hot file is also shared → the chain stands and the sharedFiles list contains only the non-hot ones. Tests: 4 new cases — CLAUDE.md / memory/.md / episodes/STATUS/MEMORY sharing yields no chain; a turn sharing both CLAUDE.md AND /src/app.ts yields a chain with sharedFiles=['/src/app.ts'] only. 33/33 analyzer tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 11:04:59 +03:00
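The hot-file filter from this fix can be sketched as a pattern list plus a shared-file helper. The concrete regular expressions below are guesses built from the file names in the commit message, and `sharedNonHotFiles` is a hypothetical helper name; the real HOT_FILE_PATTERNS list may differ.

```javascript
// Hypothetical sketch; the actual patterns in the analyzer are not shown in the log.
const HOT_FILE_PATTERNS = [
  /(^|[\\\/])CLAUDE\.md$/,
  /(^|[\\\/])MEMORY\.md$/,
  /(^|[\\\/])STATUS\.md$/,
  /(^|[\\\/])episodes-\d{4}-\d{2}\.jsonl$/,
  /(^|[\\\/])memory[\\\/][^\\\/]*\.md$/, // assumed reading of the memory docs pattern
];

// True when a path is touched by almost every turn and carries no causal signal.
function isHotFile(path) {
  return HOT_FILE_PATTERNS.some((re) => re.test(path));
}

// Files shared by an errored episode and a candidate, with hot files removed
// from BOTH sides. An empty result means "no chain".
function sharedNonHotFiles(erroredFiles, candidateFiles) {
  const errored = new Set(erroredFiles.filter((f) => !isHotFile(f)));
  return candidateFiles.filter((f) => !isHotFile(f) && errored.has(f));
}
```

For example, two turns that share both CLAUDE.md and /src/app.ts still form a chain, but the reported shared list contains only /src/app.ts.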
Дмитрий	c386361881	fix(observer): infer blocked from unrecovered_error tail, not raw error/retry count (A-1) Bug: inferOutcome flagged `blocked` whenever errorCount > retryCount across the turn's events. But the parser emits an `error` event for ANY tool_result with is_error=true — including expected failures: TDD failing-test-first, grep returning nothing, git commands with intentional non-zero exit. On TDD-heavy turns (project's standard discipline) this systematically marked turns as blocked even when they ended on a successful tool_use. Fix: - Parser (extractProcessEvents): walk turn from end, find the LAST tool_result; if its is_error=true, emit a single `unrecovered_error` event. Distinguishes "turn ended on failure" from "errors recovered later". The original per-is_error `error` events remain (useful as raw factor signals). - Analyzer (inferOutcome): replace `errorCount > retryCount → blocked` with `events.some(kind === 'unrecovered_error') → blocked`. Same ordering preserved (interrupt > blocked > rework/success/unknown). Tests: - Parser: emits unrecovered_error when last tool_result is_error; does NOT emit when turn ended on a successful tool_result; does NOT emit for turns with no tool_results. - Analyzer: blocked iff unrecovered_error event present (not raw count); events=[error, error, retry] → success (no unrecovered_error). 142/142 vitest green (was 128). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 11:03:15 +03:00
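The tail-based rule from this fix can be sketched in two small functions. The turn item shape (`{ type: 'tool_result', is_error }`) and the function names are assumptions for illustration; only the rule itself (the last tool_result decides, raw error counts do not) comes from the commit message.

```javascript
// Hypothetical sketch; not the parser's or analyzer's actual code.

// Walk the turn from the end and inspect only the LAST tool_result.
// Emits a single unrecovered_error event when that result failed.
function extractUnrecoveredError(turnItems) {
  for (let i = turnItems.length - 1; i >= 0; i--) {
    const item = turnItems[i];
    if (item && item.type === 'tool_result') {
      return item.is_error === true ? [{ kind: 'unrecovered_error' }] : [];
    }
  }
  return []; // no tool_results in this turn
}

// A turn is blocked iff an unrecovered_error event is present.
// Raw `error` and `retry` counts are deliberately not consulted.
function isBlocked(events) {
  return events.some((e) => e.kind === 'unrecovered_error');
}
```

A TDD turn that sees a failing test first and a passing run last therefore produces `error` events but no `unrecovered_error`, and is not marked blocked.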
Дмитрий	353b1599b6	fix(observer): brain-retro analyzer — blocked outcome + v1 filter + factors P0.1b: inferOutcome emits 'blocked' when a turn had more error than retry events (an unrecovered tool failure) — previously the enum value was dead. P0.1c: 'failure' documented as deferred to the phase-2 agent-judge. It is a judgment (work wrong AND never corrected), not deterministically recoverable from a transcript; a wrong-then-corrected turn surfaces as 'rework'. P1.1: analyze() drops v1 episodes (no schema_version 2) — they lack environment/prompt_signal/decision_provenance and polluted the factor matrix. Reports v1SkippedCount. P2.1: session_turn (bucketed early/mid/late) and parallel_session added to FACTOR_FNS — closes the schema↔matrix mismatch (both were captured in the episode but absent from the factor axes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 17:40:44 +03:00
Дмитрий	dc6d2dd358	test(brain-retro): regression guard — 3rd provenance kind in factor matrix buildFactorMatrix already buckets decision_provenance.kind dynamically (brain-retro-analyzer.mjs:112) — no production change needed. Test pins that user_chose_from_options is counted on the provenance axis. 12/12 brain-retro tests GREEN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 12:06:56 +03:00
Дмитрий	a6f44e5bb4	feat(observer): brain-retro analyzer — outcome inference + factor matrix Pure deterministic Layer-4 aggregation module (spec §6) for the /brain-retro skill. Exports: dedupeEpisodes, inferOutcome, groupEpisodesToTasks, findCausalChains, buildFactorMatrix, analyze. Read-only — never writes JSONL. 11/11 tests green. CLI smoke: 10 real episodes → valid JSON with all 5 keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 10:47:57 +03:00