Дмитрий
9704c539b4
docs(observer): brain-retro #10 + self-retrospect #2 notes from 28.05
...
Brain-retro #10 (10:47 MSK → ~16:30 MSK period, 27 episodes after retro #9):
- All 11 mandatory cuts including chain-hook effectiveness
- Batch reviewer pass on 27 episodes (~$2 Opus 4.7)
- Found 4 rework cases, all on ambiguous short prompts
- 4 candidates for owner review (self-retrospect counter quirk,
enforce-clarify-short-prompts hook, cost-aggregator reviewer
cost gap, factor-matrix low-signal marker)
Self-retrospect #2 (evening, after retro #10):
- 67 episodes since previous self-retrospect (~07:30 UTC)
- 88 override events in 6 hours (recovery 31, no skills 57)
- 5 commitments from morning self-retrospect: 2 of 5 broken
- Conclusion: habits without enforcement do not hold
- 3 hook proposals documented for future work
Sanity-check answers persisted for retro #10 audit trail.
cspell-words.txt += триггернулась / triggerов / флагнутые /
ambig / deplo / обнулился / Ревьюер (Russian/English mixed
project terminology from observer notes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 06:50:19 +03:00
Дмитрий
4d7e9e338b
docs(session 2026-05-28): brain-retro 8/9, self-retrospect, sanity, Phase 1-3 plans
...
Groups documentation produced during 2026-05-28 brain-retro session:
retro notes 8 (carryover) and 9, self-retrospect 1, sanity check JSON,
three Phase plans for router-hooks fixes. All implementation already
pushed in earlier commits — this commit groups artifact metadata.
Plus typo fixes in self-retrospect (agregatov, seryj) and cspell vocab
extensions for session-specific terms (PAMYATKA / procs / Russian verbs).
Pure documentation. No code, no normative drift.
2026-05-28 12:26:05 +03:00
Дмитрий
81cbd8c1c2
feat(brain-retro #7): C1+C2+C3+C4 router-discipline fixes
...
retro #7 (docs/observer/notes/2026-05-27-brain-retro-7.md) surfaced 4
candidates against 23 turns since retro #6. All four implemented with TDD.
C1 — translit slang vocabulary in router-classifier-regex-fallback.mjs.
TASK_TYPE_KEYWORDS += deploy bucket (push / запушь / выкат);
memory-sync += обнови мозг / эталон / пилот / memory dump.
C2 — short_ambiguous_block in router-tool-gate.mjs + router-prehook.mjs.
prehook persists prompt_length; gate blocks Edit/Write/MultiEdit/Bash
when task_type in {ambiguous, unknown} AND prompt_length <= 30 AND
skill not invoked AND no direct_justified tag.
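The C2 rule above can be sketched as a single predicate. This is a minimal illustration, not the code in router-tool-gate.mjs: the function name `shouldBlockShortAmbiguous` and the state field names (`taskType`, `promptLength`, `skillInvoked`, `directJustified`) are assumptions made for the example.

```javascript
// Hypothetical sketch of the short_ambiguous_block rule.
// All identifiers below are illustrative assumptions, not the real gate API.
const GATED_TOOLS = new Set(["Edit", "Write", "MultiEdit", "Bash"]);
const AMBIGUOUS_TYPES = new Set(["ambiguous", "unknown"]);
const SHORT_PROMPT_MAX = 30; // prompt_length <= 30 counts as "short"

function shouldBlockShortAmbiguous(state, toolName) {
  return (
    GATED_TOOLS.has(toolName) &&              // only mutating tools are gated
    AMBIGUOUS_TYPES.has(state.taskType) &&    // classifier could not decide
    state.promptLength <= SHORT_PROMPT_MAX && // persisted by the prehook
    !state.skillInvoked &&                    // no skill was invoked
    !state.directJustified                    // no direct_justified tag
  );
}
```

All four conditions must hold at once, so a 31-character prompt, an invoked skill, or a direct_justified tag each lets the tool call through.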
C3 — self-assessment timeout 30s to 50s in observer-self-assessment-api.mjs.
Windows TLS handshake + Sonnet latency exceeded 30s. Stop-hook has 60s
budget; 50s leaves headroom. DEFAULT_TIMEOUT_MS exported for tests.
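The timeout budget in C3 can be sketched as follows. Only the name DEFAULT_TIMEOUT_MS and the 50s/60s figures come from the commit; `STOP_HOOK_BUDGET_MS`, `headroomMs`, and `withTimeout` are hypothetical helpers, and the real observer-self-assessment-api.mjs may enforce the limit differently.

```javascript
// Hypothetical sketch: a 50s call timeout inside a 60s stop-hook budget.
const DEFAULT_TIMEOUT_MS = 50_000;  // value from the commit message
const STOP_HOOK_BUDGET_MS = 60_000; // assumed constant name for the 60s budget

// Time left for the hook to finish after the API call gives up.
function headroomMs(timeoutMs = DEFAULT_TIMEOUT_MS) {
  return STOP_HOOK_BUDGET_MS - timeoutMs;
}

// Runs task(signal) and aborts the signal once timeoutMs has elapsed.
// The task itself must honor the signal (as fetch does, for example).
async function withTimeout(task, timeoutMs = DEFAULT_TIMEOUT_MS) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer); // never leave the timer pending
  }
}
```

With the default the headroom is 10 seconds, whereas the previous 30s limit left 30 seconds of budget unused while the call itself was timing out.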
C4 — Reviewer findings block in status-md-generator.mjs. New helper
computeReviewerFindingsBlock surfaces 51 actionable findings without
running /brain-retro. Detects batch-reviewed via
outcome_reviewed_source=direct_api_batch. MD012 guard test added.
C5 (gitleaks-before-push) intentionally skipped — pre-push hook already
blocks at server side.
Tests: 956/956 root tools, 0 regressions. LEFTHOOK=0 used per quirk #111.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:46:55 +03:00
Дмитрий
8f9ffc387d
chore(observer): brain-retro #6 — full reviewer pass (316/316), digital analysis
...
Period 2026-05-24T00:00Z..2026-05-26T13:18Z (~61h, 317 episodes).
Processed 132 unreviewed episodes via brain-retro-batch-reviewer.mjs
(Opus 4.7 / ProxyAPI, 293.6s, 0 errors). Coverage 100% (316/316), up from
91% in retro #5.
Findings:
- rework 10.4% (33/316), stable vs retro #5 (11.4%)
- 132 episodes (41.6%) with gap «recommended, picked direct» — but
60-70% turned out to be silent regex-fallback false-positives (fixed
in follow-up commit).
- rework by group: skill_used 12.0% | direct_no_rec 2.5% |
direct_ignored_rec 22.7% (delta 20.2 pp vs direct_no_rec).
- user_chose_from_options: 0% rework / 0% blocked on 55 episodes —
brainstorm-pattern is the strongest quality mechanism.
- 85% of episodes without self_assessment; owner confirmed "was running
too fast without stopping" (material signal).
Artefacts:
- docs/observer/notes/2026-05-26-brain-retro-6.md (25KB)
- docs/observer/sanity-checks/2026-05-26-brain-retro-6.json
- STATUS.md regen (C5 488 episodes, missed_activations=21)
- read-counter + self-retrospect-counter bumped (519 since last)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 17:28:26 +03:00
Дмитрий
659f2b0757
feat(brain-retro): retro #5 — first reviewer pass (184/202) + batch-reviewer tool
...
Brain-retro #5 for the period 2026-05-24T13:18Z .. 2026-05-26T05:09Z (202 episodes).
First non-zero reviewer pass in brain-governance history (previously 0/414).
Key findings:
• 184 episodes reviewed via Opus 4.7 ProxyAPI, 18 errors (~$9 cost)
• outcome_reviewed: success 24.5% / soft_success 64.1% / rework 11.4%
• node_quality: correct 30% / disputable 59% / wrong_node 9% / over+under 1.6%
• 93.5% no_self_assessment — confirms self-assessment bug fixed in 752d80af
• Top ignored nodes (wrong_node): #19 Superpowers (5), #18 Pest (3),
#33 claude-md-management (2), #25 Semgrep (2)
• Discipline regressed in long session: regulated 19% → 4.5%
Artifacts:
• tools/brain-retro-batch-reviewer.mjs (new) — direct API batch driver
for retros >50 episodes (canonical Task() spawn impractical at scale).
• docs/observer/notes/2026-05-26-brain-retro.md (new) — full retro note
with 4 candidates A/B/C/D for owner review.
• docs/observer/sanity-checks/2026-05-26.json (new) — sanity Q&A.
• docs/observer/episodes-2026-05.jsonl — 184 episodes mutated with
review.* / outcome_reviewed / outcome_reviewed_source fields.
• docs/observer/STATUS.md — refreshed.
• docs/observer/.pii-counters.json / .read-counter.json / .self-retrospect-counter.json
— bumped by procedure.
Spec: brain-retro skill .claude/skills/brain-retro/SKILL.md.
2026-05-26 10:49:28 +03:00
Дмитрий
12f88f32c1
feat(brain): sanity-generator + brain-retro v2 + self-retrospect stub (phase 3 task 19)
...
Phase 3 Task 19 partial — coverage announcement §4.9 deferred to a
separate commit (touches Pravila §17, requires §15.2 pre-flight sync).
- tools/brain-retro-sanity-generator.mjs (NEW, pure):
generateCandidateQuestions(episodes) returns ≤5 sanity questions
derived from per-classification volume (>10 episodes per task type
triggers a themed question: bugfix/feature/planning/refactor/security/
marketing) plus 2 meta questions about missed activations / direct
bypass. Reads task_type from classifier_output (v4) with fallback
to primary_rationale.task_classification (v2/v3). Spec §4.7.
- tools/brain-retro-sanity-generator.test.mjs (NEW): 6 tests
(bugfix >10 / feature >10 / max 5 / empty / legacy v2/v3 / strings).
- .claude/skills/brain-retro/SKILL.md:
+ description rewritten: "once every 1-2 weeks OR sanity-check threshold"
(cadence change per spec §4.7).
+ procedure +steps 5a (sanity questions via AskUserQuestion +
PII filter + sanity-checks/YYYY-MM-DD.json), 5b (reviewer-agent
Task() spawn + fallback to brain-retro-opus-reviewer.mjs), 9
(self-retrospect threshold check), 10 (cost report from
~/.claude/runtime/cost-daily.json), 11 (richer summary).
- .claude/skills/self-retrospect/SKILL.md (NEW) — stub skill;
full procedure wired in Task 20 (analyzer + STATUS.md surface the
threshold).
- docs/observer/.self-retrospect-counter.json (NEW): initial state
{last_run_at: null, episodes_since_last: 0}.
- docs/observer/sanity-checks/.gitkeep (NEW): directory placeholder
for sanity-answers JSON files.
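The behavior described for generateCandidateQuestions can be sketched as below. This is an illustration under stated assumptions, not the contents of tools/brain-retro-sanity-generator.mjs: the question wording, the choice to list themed questions before meta ones, returning no questions for an empty input, and treating task_classification as a plain string are all guesses made for the example.

```javascript
// Hypothetical sketch of generateCandidateQuestions(episodes).
const THEMED_TYPES = ["bugfix", "feature", "planning", "refactor", "security", "marketing"];
const VOLUME_THRESHOLD = 10; // a themed question needs more than 10 episodes
const MAX_QUESTIONS = 5;

// v4 episodes carry classifier_output.task_type; v2/v3 fall back to
// primary_rationale.task_classification (assumed here to be a string).
function taskTypeOf(episode) {
  return (
    episode?.classifier_output?.task_type ??
    episode?.primary_rationale?.task_classification ??
    null
  );
}

function generateCandidateQuestions(episodes) {
  const counts = new Map();
  for (const episode of episodes) {
    const type = taskTypeOf(episode);
    if (type) counts.set(type, (counts.get(type) ?? 0) + 1);
  }
  const themed = THEMED_TYPES
    .filter((type) => (counts.get(type) ?? 0) > VOLUME_THRESHOLD)
    .map((type) => `Did the ${counts.get(type)} ${type} episodes go as you expected?`);
  // Assumption: the two meta questions are skipped when there are no episodes.
  const meta = episodes.length > 0
    ? ["Were any skill activations missed?", "Was any direct bypass unjustified?"]
    : [];
  return [...themed, ...meta].slice(0, MAX_QUESTIONS); // never more than 5
}
```

Capping with a single slice keeps the function pure and makes the "max 5" case trivial to test, at the cost of dropping meta questions when many task types cross the threshold.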
Tests: 608 passed / 0 failed (+15 from Task 19 + prior). 4 pre-existing
file failures unchanged. Coverage announcement §4.9 (economy-mode.py +
Pravila §17 subsection + feedback memory + coverage-annotation-mode
flag) — deferred: touches Pravila which is in the §15.2 8-file SoT
list and needs pre-flight `git fetch origin && git log HEAD..origin/main`
before edit; flagging as Phase 3 follow-up commit.
2026-05-25 14:28:26 +03:00