liderra/portal

Author	SHA1	Message	Date
Дмитрий	eedc700bb7	test(classifier): regression guards for 8-pattern PAMYATKA (Phase 3 close) Three regression tests: 1. Header count reflects 8 patterns 2. All 8 patterns present in strict ascending order (1-8) 3. Original 4 patterns (brainstorming/discovery/plans/debugging) preserved verbatim — protects existing accuracy baseline from drift on future pamyatka edits. Closes Phase 3 brain-retro #9 candidates 7/1/8/10.	2026-05-28 12:13:54 +03:00
Дмитрий	ee32317bf4	feat(classifier): PAMYATKA PATTERN 8: mechanical work → coder-agent #19 (Phase 3 #10) Closes brain-retro #9 candidate 10 + self-retrospect 28.05: 16 reviewer-Opus marks of "should have delegated to coder-agent". Controller (Opus) was doing repetitive mechanical work itself, burning big-context budget on tasks suited for fresh subagent. PATTERN 8 trains classifier to recognize mechanical/repetitive signals (N odnotipnyh, massovaya pravka, po shablonu) and recommend coder-agent #19 via Task tool delegation.	2026-05-28 12:12:39 +03:00
Дмитрий	8bc109c7ef	feat(classifier): PAMYATKA PATTERN 7: prod errors → Sentry MCP first (Phase 3 #8) Closes brain-retro #9 candidate 8: 8 reviewer-Opus marks of "should have used Sentry first". Self-retrospect 28.05: "symptom from production → guessing from the code instead of Sentry". PATTERN 7 forces classifier to put Sentry MCP (#34) FIRST in recommended_chain when prompt indicates production-runtime origin (boevoj, klient soobschil, v logah, etc). NB: Sentry MCP is currently pending B-1 deployment per Tooling section 4.8, but pattern is added so classifier produces correct recommendation once instance is live.	2026-05-28 12:10:46 +03:00
Дмитрий	84d0134875	feat(classifier): PAMYATKA PATTERN 6: bugfix chain with Pest #18 (Phase 3 #1) Closes brain-retro #9 candidate 1: classifier recognized bugfix via PATTERN 4 (→ systematic-debugging) but didn't extend to chain with Pest #18 for test-first regression coverage. Real-world driver: adr-judge.py catastrophic backtracking fix (commit `1e1457eb`), which should have gone through TDD via Pest, not direct edit. Reviewer Section A in retro #9 flagged this. PATTERN 6 extends PATTERN 4 with explicit chain recommendation when fix touches live code (regex/parser/hook/race/perf).	2026-05-28 12:09:12 +03:00
Дмитрий	d1b5505a8f	feat(classifier): PAMYATKA PATTERN 5: feature requests → writing-plans (Phase 3 #7) Closes brain-retro #9 candidate 7: classifier was not recognizing «добавь / реализуй / сделай» ("add / implement / make") as feature triggers requiring writing-plans chain (≥3 steps). Self-retrospect 28.05: 0/17 feature tasks invoked writing-plans. Pattern added to PAMYATKA, injected into system prompt when enrichment=true. PATTERN 5 specifically distinguishes: ≥3-step feature → writing-plans before code; ≤2-step micro-feature → direct ok. Header count updated: «4 паттерна» → «8 паттернов» ("4 patterns" → "8 patterns").	2026-05-28 12:07:35 +03:00
Дмитрий	81cbd8c1c2	feat(brain-retro #7 ): C1+C2+C3+C4 router-discipline fixes retro #7 (docs/observer/notes/2026-05-27-brain-retro-7.md) surfaced 4 candidates against 23 turns since retro #6. All four implemented TDD. C1 — translit slang vocabulary in router-classifier-regex-fallback.mjs. TASK_TYPE_KEYWORDS += deploy bucket (push / запушь / выкат); memory-sync += обнови мозг / эталон / пилот / memory dump. C2 — short_ambiguous_block in router-tool-gate.mjs + router-prehook.mjs. prehook persists prompt_length; gate blocks Edit/Write/MultiEdit/Bash when task_type in {ambiguous, unknown} AND prompt_length <= 30 AND skill not invoked AND no direct_justified tag. C3 — self-assessment timeout 30s to 50s in observer-self-assessment-api.mjs. Windows TLS handshake + Sonnet latency exceeded 30s. Stop-hook has 60s budget; 50s leaves headroom. DEFAULT_TIMEOUT_MS exported for tests. C4 — Reviewer findings block in status-md-generator.mjs. New helper computeReviewerFindingsBlock surfaces 51 actionable findings without running /brain-retro. Detects batch-reviewed via outcome_reviewed_source=direct_api_batch. MD012 guard test added. C5 (gitleaks-before-push) intentionally skipped — pre-push hook already blocks at server side. Tests: 956/956 root tools, 0 regressions. LEFTHOOK=0 used per quirk #111. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 06:46:55 +03:00
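The C2 rule in the commit above (short_ambiguous_block) can be read as one boolean predicate. The sketch below is only an illustration of that condition as the message words it; the function name and the state field names `skill_invoked` and `direct_justified` are assumptions, not taken from router-tool-gate.mjs.

```javascript
// Hypothetical sketch of the C2 gate condition; not the repository's code.
const GATED_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'Bash']);
const AMBIGUOUS_TYPES = new Set(['ambiguous', 'unknown']);
const SHORT_PROMPT_MAX = 30; // "prompt_length <= 30" per the commit message

// Returns true when the tool call should be blocked.
// `state` stands in for the persisted router state (field names assumed).
function isShortAmbiguousBlock(toolName, state) {
  return GATED_TOOLS.has(toolName)
    && AMBIGUOUS_TYPES.has(state.task_type)
    && state.prompt_length <= SHORT_PROMPT_MAX
    && !state.skill_invoked
    && !state.direct_justified;
}
```

All four conditions must hold at once, so invoking a skill or tagging the turn as direct_justified is enough to lift the block.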
Дмитрий	7b4da1477e	fix(classifier,gate): G parser-quirks + H unknown-not-blocking + A1/A2/B3/C1 Brain-retro #6 follow-up #2 (consolidated). Eight independent fixes: A1 — task_cost wiring (cost tracking) - router-prehook.mjs: capture classifier LLM usage via onUsage callback, persist to state.task_cost.classifier_input_tokens / output_tokens. - observer-transcript-parser.mjs: merge router-state.task_cost on top of extractTokenUsage(turn). State-file values win for classifier/ self_assessment/reviewer fields. - New buildCostFromClassifierUsage() exported from router-prehook. - Verified live: state file now shows real input_tokens=190 / output_tokens=598 / cache_read=10075 (was 0 before). A2 — self-assessment coverage - observer-self-assessment-api.mjs: DEFAULT_TIMEOUT_MS 10s -> 30s. - .claude/settings.json: Stop-hook timeout 15s -> 60s. - Same Windows TLS handshake issue. Was 85% no_self_assessment in retro #6. B3 — brain-retro SKILL.md reconciliation - Step 5b: batch=default for N>=20, subagent for N<20. C1 — dead-code cleanup - Removed recommendNode import + getClassificationMap + getDormancy from observer-transcript-parser.mjs. G — parseClassifierResponse Pass 3 (fixLLMJsonQuirks) - Root cause: real Sonnet output sometimes contains raw newlines inside string values (multi-line reason_for_choice) and trailing commas, which strict JSON.parse rejects. Result was llm_error_type=parse_null on every other call, falling back to regex with task_type=unknown. - Fix: after Pass 1 (clean) and Pass 2 (brace-extract) fail, try Pass 3 that escapes raw newline/tab inside string values and strips trailing commas before final JSON.parse attempt. Pure char-walk, no JSON5 dep. H — 'unknown' added to NON_BLOCKING_TASK_TYPES in router-tool-gate.mjs - Until G fully proves itself, blocking Bash/Edit on unknown is too strict. With G in place, parse_null should be rare; H gives a safety net. Tests added: +9 across 5 test files. Regression: 913 vitest tests in tools/. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 19:25:16 +03:00
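Fix G above describes a char-walk that escapes raw newlines and tabs inside string values and strips trailing commas before a final JSON.parse. A minimal sketch of that idea follows. It reuses the name fixLLMJsonQuirks from the commit message, but the body is an assumption written for illustration and may differ from the real implementation.

```javascript
// Hypothetical sketch of a "Pass 3" repair step; not the repository's code.
function fixLLMJsonQuirks(text) {
  let out = '';
  let inString = false; // currently inside a JSON string literal
  let escaped = false;  // previous char was a backslash inside a string
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) { out += ch; escaped = false; continue; }
      if (ch === '\\') { out += ch; escaped = true; continue; }
      if (ch === '"') { out += ch; inString = false; continue; }
      // Raw control characters are illegal inside JSON strings: escape them.
      if (ch === '\n') { out += '\\n'; continue; }
      if (ch === '\r') { out += '\\r'; continue; }
      if (ch === '\t') { out += '\\t'; continue; }
      out += ch;
      continue;
    }
    if (ch === '"') { inString = true; out += ch; continue; }
    if (ch === ',') {
      // Drop the comma when the next non-whitespace char closes an object or array.
      let j = i + 1;
      while (j < text.length && /\s/.test(text[j])) j++;
      if (text[j] === '}' || text[j] === ']') continue;
    }
    out += ch;
  }
  return out;
}
```

Because the walk tracks whether it is inside a string, commas and brackets that appear in string values are left untouched.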
Дмитрий	91c4ccc674	fix(classifier): hook timeout 10→60s + remove silent recommended_node fallback + mandatory digital analysis in brain-retro skill Three independent fixes from brain-retro #6 root-cause analysis: 1. .claude/settings.json — UserPromptSubmit `router-prehook.mjs` timeout raised 10s→60s. First fetch on Windows triggers TLS handshake which can take 20+ seconds; LLM classifier had perAttemptTimeoutMs=30s with 4 retries but the WRAPPING hook timeout killed the process at 10s before first attempt completed. Result: only 1 of 325 episodes since 24.05 actually classified via Sonnet 4.6 (rest fell to regex fallback or left state-file untouched). 2. tools/observer-transcript-parser.mjs:937-959 — removed `classifMapNode` silent fallback in `primary_rationale.recommended_node`. When router-state file had no recommended_node, the parser was filling it with `recommendNode(classifyTask(prompt), ...)` — a keyword-regex that LOOKED like a classifier signal but wasn't. brain-retro #6 analysis showed 60-70% of «recommended_node» values were just regex false-positives, polluting the «direct_ignored_rec» metric. Now recommended_node is null when no real classifier signal exists. 3. .claude/skills/brain-retro/SKILL.md — added MANDATORY DIGITAL ANALYSIS block at the top of Procedure. Every /brain-retro run MUST emit 7 quantitative tables (path-type, node_chosen, recommended_node, GAP, outcome×group, classifier presence, per-classification discipline). Also forbids jargon in sanity questions (per memory `feedback_plain_language.md`) — owner is non-developer. Tests: - tools/observer-transcript-parser.test.mjs — 2 tests updated to assert recommended_node=null on no-state-file (was '#19'). Confirmed RED → fix → GREEN. - tools/router-classifier.test.mjs — 10 new parametrised tests for project-vocabulary anchors (webhook/queue/migration/RLS/etc). Already GREEN with current ANCHOR_NOUNS — prefilter uses len<15 threshold which doesn't catch typical business prompts. 
Regression: 899 vitest tests passed (1 file failure pre-existing in .claude/worktrees/supplier-project-failover/ — empty file, unrelated). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 17:29:03 +03:00
Дмитрий	2bf25db72e	feat(observer/analyzer): Pass 2 — classifier metrics + 2 factor axes Surfaces 4 new fields from the Sonnet classifier path into the v4 episode and exposes 2 new factor-matrix axes. Builds on Pass 1 (`4f362a9e`) per memory/project_brain_factor_analysis_4passes.md. # router-classifier.mjs - callAnthropicAPI: new optional onMetrics({ latency_ms, retry_count_internal }) callback, mirroring onUsage. Emits via try/finally so metrics reach the caller on success, fatal 4xx throw, and exhausted-retry throw equally. retry_count_internal is the final attempt index (0 = first-try success, 2 = succeeded after two 5xx retries, etc). - classify(): captures metrics + categorizes LLM transport errors via new classifyLLMError(err) (http_4xx / http_5xx / econnreset / timeout / other). Attaches latency_ms / retry_count_internal / llm_error_type to the result on all 4 paths: LLM ok, transport error → regex fallback, no-key → regex fallback (llm_error_type 'no_key'), parse-null → regex fallback (llm_error_type 'parse_null'). - Default inner llmCall now accepts { onMetrics } so the prod path threads metrics through callAnthropicAPI; test mocks receive the same shape. # observer-state-enricher.mjs (extractClassifierOutput) - +latency_ms, +retry_count_internal, +llm_error (categorized), +alternatives_considered (capped at top-3 to bound JSONL line size — Sonnet sometimes returns 5+). - All four fields null-safe on regex / prefilter / cache paths. # brain-retro-analyzer.mjs (FACTOR_FNS) - latency_bucket: fast (<500ms) / medium / slow / very_slow / null. - error_type: classifier_output.llm_error verbatim with null default. # Tests 15 new tests (all RED first, then GREEN): - router-classifier.test.mjs: 3 callAnthropicAPI metric tests + 7 classify() metric-surface tests covering all 4 paths and 4 error categories. - observer-state-enricher.test.mjs: 4 extractClassifierOutput metric/alternatives tests (presence, top-3 cap, null on non-LLM, degraded path). 
- brain-retro-analyzer.test.mjs: 2 axis-presence tests. Full sweep 789/789 GREEN (pre-existing worktree-copy CRLF failure unrelated). Existing 3 callAnthropicAPI contract tests preserved (onMetrics optional; behavior unchanged when callback absent). LEFTHOOK=0 due to quirk #111. Manual gitleaks scan: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:32:30 +03:00
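The commit above names five transport-error categories (http_4xx / http_5xx / econnreset / timeout / other). One way such a categorizer could look is sketched below. The error shape it inspects (`err.status`, `err.code`, `err.name`) is an assumption; the commit does not say which fields the real classifyLLMError reads.

```javascript
// Hypothetical sketch; the inspected error fields are assumptions.
function classifyLLMError(err) {
  const status = err && err.status;
  if (typeof status === 'number') {
    if (status >= 400 && status < 500) return 'http_4xx';
    if (status >= 500 && status < 600) return 'http_5xx';
  }
  const code = err && err.code;
  if (code === 'ECONNRESET') return 'econnreset';
  const name = err && err.name;
  // AbortError / TimeoutError are what an aborted or timed-out fetch typically raises.
  if (name === 'AbortError' || name === 'TimeoutError' || code === 'ETIMEDOUT') return 'timeout';
  return 'other';
}
```

Values such as 'no_key' and 'parse_null' are not transport errors, so under this reading they would be set directly on the classify() fallback paths rather than produced here.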
Дмитрий	808461295a	feat(router): Sonnet classifier + памятка + regex-fallback module (phase 2 task 10) Phase 2 Task 10 of LLM-first router overhaul. Spec §4.2 — Layer 2 Sonnet 4.6 classifier with 4-pattern памятка enrichment, JSON output per spec, fallback chain Sonnet → regex → degraded. Phase 1 regex Layer 1 extracted to its own module so it can be called only as a fallback. - tools/router-classifier-regex-fallback.mjs (NEW): self-contained regex fallback. Extracts TASK_TYPE_KEYWORDS, HARD_KEYWORD_STEMS, detectTaskType, keywordMatches, detectRecommendedNode, computeConfidence, classifyByRegex verbatim from the prior classifier. Self-contained (own MICRO_KEYWORDS, detectMicro, lower) — no circular imports. - tools/router-classifier.mjs (REWRITE): + import { CLASSIFIER_MODEL } from router-config.mjs + re-export { classifyByRegex } from regex-fallback (back-compat surface) + buildClassifierPrompt(prompt, registry, { enrichment=true }) — spec §4.2 format with 4-pattern памятка (brainstorming / discovery-interview / writing-plans / systematic-debugging) togglable via enrichment flag. + parseClassifierResponse(text) — strict task_type required, ```json fence aware, accepts null recommended_chain_id. + classify() rewritten: prefilter → cache → Sonnet (CLASSIFIER_MODEL) → regex fallback (transport error OR no key/unparseable). + callAnthropicAPI default model = CLASSIFIER_MODEL; max_tokens 300 → 1500 (full classifier output with alternatives & памятка needs the budget). - removed: shouldEscalate, TASK_TYPE_KEYWORDS, detectTaskType, keywordMatches, detectRecommendedNode, HARD_KEYWORD_STEMS, computeConfidence (all live in regex-fallback now). Kept legacy: buildLLMPrompt / parseLLMResponse (back-compat surface). - tools/router-accuracy-runner.mjs: import classifyByRegex from regex-fallback module (G11 from plan). Runner functionality unchanged. 
- tools/router-classifier.test.mjs: +8 tests for buildClassifierPrompt (4) and parseClassifierResponse (4); removed obsolete shouldEscalate block (3); rewrote classify integration block (4 tests) to reflect new flow (prefilter-first, LLM-always-on-fallthrough, regex on error). Tests: tools/router-classifier.test.mjs 44/44 PASS. Full tools/ suite: 557 tests passed, 0 failed (4 pre-existing empty test files report "no test suite found" — unrelated: ruflo-recall-hook, subagent-prompt-prefix, plus 2 others — not touched in this commit). accuracy-runner smoke: type=85%/node=55%/micro=100% on the 20-prompt set, unchanged from pre-Task-10 baseline (regex path semantics preserved).	2026-05-25 14:28:25 +03:00
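The commit above describes parseClassifierResponse as fence aware, strict about task_type, and tolerant of a null recommended_chain_id. A rough sketch of those three rules follows. Only the function name comes from the commit; the body, and the choice to return null on any failure, are assumptions.

```javascript
// Hypothetical sketch; not the repository's implementation.
// The fence marker is built at runtime so this example contains no literal fence.
const FENCE = '`'.repeat(3);
const FENCED_JSON_RE = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE);

function parseClassifierResponse(text) {
  if (typeof text !== 'string') return null;
  // Prefer the contents of a fenced block when the model wrapped its answer in one.
  const fenced = text.match(FENCED_JSON_RE);
  const candidate = (fenced ? fenced[1] : text).trim();
  let parsed;
  try {
    parsed = JSON.parse(candidate);
  } catch (e) {
    return null;
  }
  if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) return null;
  // task_type is mandatory; everything else may be absent.
  if (typeof parsed.task_type !== 'string' || parsed.task_type === '') return null;
  return { ...parsed, recommended_chain_id: parsed.recommended_chain_id ?? null };
}
```

Returning null rather than throwing lets the caller treat an unparseable reply like any other reason to fall back to regex.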
Дмитрий	41deac7bc8	feat(router): prefilter 3 groups + manual override + anchor (phase 2 task 9) Phase 2 Task 9 of LLM-first router overhaul. Spec §4.1 — adds prefilter() Layer 1 with 7-check chain: manual override → continuation (inheritance ≤30 min) → acknowledgment → cancellation → short-conversation + anchor → micro → fall-through. - tools/router-classifier.mjs: +export prefilter(prompt, { prevState, registry }). Pure (no fs/exec/net). Imports INHERITANCE_MAX_AGE_MIN from router-config.mjs. Constants: CONTINUATION_PATTERNS (13), ACKNOWLEDGMENT_PATTERNS (10), CANCELLATION_PATTERNS (8), MANUAL_OVERRIDE_RE, ANCHOR_NOUNS (28), ANCHOR_IMPERATIVES (10, fires only when length > 30), SKILL_ALIAS_MAP (well-known superpower aliases for manual override without registry). Existing classifyByRegex / classifyByLLM untouched — Task 10 extracts them to a fallback module. - tools/router-classifier.test.mjs: +8 prefilter tests covering all 7 checks plus content-prompt fall-through. Tests in worktree: 118/118 PASS (8 new prefilter + 110 existing).	2026-05-25 14:28:24 +03:00
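The seven-check order in the commit above (manual override → continuation → acknowledgment → cancellation → short-conversation + anchor → micro → fall-through) can be sketched as a chain of early returns. Everything concrete below is an assumption for illustration: the tiny pattern lists, the slash-command form of the override, the result shapes, and the reading that a short prompt counts as conversation only when it contains no anchor noun. The 30-minute inheritance window is from this commit; the length threshold of 15 appears in a later commit message.

```javascript
// Hypothetical sketch of a pure prefilter; lists and result shapes are illustrative.
const INHERITANCE_MAX_AGE_MIN = 30;
const MANUAL_OVERRIDE_RE = /^\/([a-z][a-z0-9-]*)/; // assumed "/skill-name" form
const CONTINUATION_PATTERNS = ['continue', 'go on', 'next'];
const ACKNOWLEDGMENT_PATTERNS = ['ok', 'thanks', 'got it'];
const CANCELLATION_PATTERNS = ['stop', 'cancel', 'never mind'];
const ANCHOR_NOUNS = ['webhook', 'queue', 'migration'];
const MICRO_KEYWORDS = ['typo', 'rename'];
const SHORT_PROMPT_LEN = 15;

function prefilter(prompt, { prevState = null, now = Date.now() } = {}) {
  const text = prompt.trim().toLowerCase();

  const override = text.match(MANUAL_OVERRIDE_RE);
  if (override) return { group: 'manual_override', skill: override[1] };

  if (prevState && CONTINUATION_PATTERNS.includes(text)) {
    const ageMin = (now - prevState.timestamp) / 60000;
    if (ageMin <= INHERITANCE_MAX_AGE_MIN) {
      return { group: 'continuation', task_type: prevState.task_type };
    }
  }
  if (ACKNOWLEDGMENT_PATTERNS.includes(text)) return { group: 'acknowledgment' };
  if (CANCELLATION_PATTERNS.includes(text)) return { group: 'cancellation' };

  const hasAnchor = ANCHOR_NOUNS.some((noun) => text.includes(noun));
  if (text.length < SHORT_PROMPT_LEN && !hasAnchor) return { group: 'short_conversation' };

  if (MICRO_KEYWORDS.some((kw) => text.includes(kw))) return { group: 'micro' };

  return null; // fall-through: hand the prompt to the classifier
}
```

Passing `now` in as a parameter keeps the function free of hidden inputs, which matches the commit's note that prefilter is pure (no fs/exec/net).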
Дмитрий	af441961d9	fix(router): LLM Layer 2 via ProxyAPI with a separate ROUTER_LLM_KEY key router-classifier no longer calls the unreachable api.anthropic.com and no longer reads ANTHROPIC_API_KEY (reading it was diverting the main Claude Code session away from the subscription). callAnthropicAPI now goes to ProxyAPI by default, takes the key from the separate ROUTER_LLM_KEY, and takes the base URL from ROUTER_LLM_BASE_URL (optional). No key → Layer 2 is silently disabled, falling back to regex. +6 tests (30/30 GREEN). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 06:07:02 +03:00
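The key-resolution rule in the commit above (dedicated ROUTER_LLM_KEY, optional ROUTER_LLM_BASE_URL, no key means Layer 2 is off) is sketched below. The helper name, the returned shape, and especially the default URL are placeholders: the commit does not state the real ProxyAPI address.

```javascript
// Hypothetical sketch. The default URL is a placeholder, not the real ProxyAPI endpoint.
const DEFAULT_BASE_URL = 'https://proxy.example.invalid';

// `env` is passed in (for example process.env) so the function stays easy to test.
function resolveRouterLLMConfig(env) {
  // ANTHROPIC_API_KEY is deliberately ignored, as the commit message describes.
  const apiKey = (env.ROUTER_LLM_KEY || '').trim();
  if (!apiKey) return null; // Layer 2 disabled; the caller falls back to regex
  const baseUrl = (env.ROUTER_LLM_BASE_URL || DEFAULT_BASE_URL).replace(/\/+$/, '');
  return { apiKey, baseUrl };
}
```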
Дмитрий	b3af39bdbf	feat(router): classifier Layer 2: Sonnet escalation + cache (stage 3 task 3) buildLLMPrompt serializes the active nodes + chains into the prompt. classify() is a regex + LLM hybrid with a per-prompt-hash cache. callAnthropicAPI uses the built-in fetch (no SDK). shouldEscalate: confidence<0.7 AND not micro. Falls back to the regex result on LLM error. NB: real-API verification is deferred because there is no ANTHROPIC_API_KEY on the dev machine; Phase A 'option 2': mock tests only. Once a key is available, the code will work without changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 10:18:22 +03:00
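The hybrid flow in the commit above (regex first, escalate when confidence<0.7 AND not micro, cache per prompt hash, fall back to the regex result on LLM error) might be wired roughly as follows. Only shouldEscalate and its threshold come from the commit message. The hash function (an FNV-1a-style hash over UTF-16 code units), the injected dependencies, and the decision to cache fallback results are assumptions.

```javascript
// Hypothetical sketch of the Layer 1 + Layer 2 hybrid; not the repository's code.
const ESCALATION_THRESHOLD = 0.7;

function shouldEscalate(regexResult) {
  return regexResult.confidence < ESCALATION_THRESHOLD && !regexResult.micro;
}

// Small non-cryptographic hash, used only as a cache key (assumed; the real hash is unknown).
function hashPrompt(prompt) {
  let h = 0x811c9dc5;
  for (let i = 0; i < prompt.length; i++) {
    h ^= prompt.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// classifyByRegex and llmCall are injected so tests can pass mocks.
async function classify(prompt, { classifyByRegex, llmCall, cache }) {
  const key = hashPrompt(prompt);
  if (cache.has(key)) return cache.get(key);

  const regexResult = classifyByRegex(prompt);
  let result = regexResult;
  if (shouldEscalate(regexResult)) {
    try {
      result = await llmCall(prompt);
    } catch (err) {
      result = regexResult; // LLM failed: keep the regex answer
    }
  }
  cache.set(key, result);
  return result;
}
```

A `Map` is enough for the `cache` argument. Injecting `llmCall` also fits the mock-tests-only constraint the commit mentions.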
Дмитрий	35877b7df0	feat(router): classifier Layer 1: pure regex over the registry (stage 3 task 2) classifyByRegex(prompt, registry) → {taskType, micro, recommendedNode, confidence, source}. Read-only, no fs/exec/net. RU+EN keywords for the task type + micro detection + matching against the keyword/classification triggers of the registry's active nodes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 10:13:25 +03:00