docs(brain): baseline pre-enforcement snapshot (stage 2 task 6)
Records the router-discipline figures as of 2026-05-24, before the stage 3 enforcement hook is launched. Sanity check passed: missed_before=17 == missed_after=17 (delta=0) after switching the source of truth to the registry. observer-classification-map.json is marked deprecated and is scheduled for removal in stage 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,96 @@
# Router discipline baseline: pre-enforcement snapshot

**Date:** 2026-05-24
**Data source:** `docs/observer/episodes-2026-05.jsonl`
**Stage:** Router discipline overhaul, Stage 2 (Measurements). Recorded for comparison with the post-enforcement figures from stage 3.
**Spec:** `docs/superpowers/specs/2026-05-23-router-discipline-overhaul-design.md`
**Plan:** `docs/superpowers/plans/2026-05-24-router-overhaul-stage-2-measurements.md`
**Commit:** 30b795c

## Data volume

- Episodes total: 129 (124 v2+ and 5 v1)
- v2+ episodes (analyzed): 124
- v1 episodes skipped: 5
- Observer-error markers: 0

## Figures

### Discipline by task type

| Task type | Episodes | % with trigger match | % via skill |
|---|---|---|---|
| bugfix | 6 | 33.3% | 33.3% |
| analysis | 4 | 0% | 25.0% |
| feature | 5 | 0% | 0% |
| planning | 2 | 0% | 0% |
| refactor | 1 | 0% | 0% |
| cleanup | 1 | 0% | 0% |
| monitoring | 1 | 0% | 0% |
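
The percentages in the table are plain per-type ratios (for example, 2 of 6 bugfix episodes gives 33.3%). A minimal sketch of that arithmetic follows. It assumes a hypothetical episode shape: `task_classification` is named in the classification-map description, while `trigger_matched` and `via_skill` are illustrative field names, not the analyzer's real schema.

```javascript
// Hypothetical episode shape: { task_classification, trigger_matched, via_skill }.
// The two boolean fields are assumptions made for this sketch.
function disciplineByType(episodes) {
  const stats = new Map();
  for (const ep of episodes) {
    const key = ep.task_classification;
    const s = stats.get(key) ?? { episodes: 0, triggerMatched: 0, viaSkill: 0 };
    s.episodes += 1;
    if (ep.trigger_matched) s.triggerMatched += 1;
    if (ep.via_skill) s.viaSkill += 1;
    stats.set(key, s);
  }
  // Percentage rounded to one decimal place, as in the table.
  const pct = (n, d) => Math.round((n / d) * 1000) / 10;
  return [...stats].map(([type, s]) => ({
    type,
    episodes: s.episodes,
    triggerMatchPct: pct(s.triggerMatched, s.episodes),
    viaSkillPct: pct(s.viaSkill, s.episodes),
  }));
}

// Synthetic data shaped like the bugfix row: 6 episodes, 2 matched and routed via a skill.
const sample = Array.from({ length: 6 }, (_, i) => ({
  task_classification: "bugfix",
  trigger_matched: i < 2,
  via_skill: i < 2,
}));
console.log(disciplineByType(sample));
```

With the synthetic sample, the single resulting row reports 6 episodes and 33.3 for both percentages.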

### Distribution by router step

- distribution: `{"1": 124}`
- total: 124
- **suspicious: true** (more than 90% of episodes stopped at step=1; this is a sentinel bug in the parser and needs investigation in stage 3)
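
One way such a flag could be derived is sketched below. It assumes the flag fires whenever any single step accounts for more than 90% of episodes; the threshold comes from the bullet above, while the function name and the "any single step" generalization are assumptions, not the analyzer's actual code.

```javascript
// Sketch only: stepDistribution is a hypothetical helper, not part of the repo.
function stepDistribution(steps, threshold = 0.9) {
  const distribution = {};
  for (const step of steps) {
    const key = String(step);
    distribution[key] = (distribution[key] ?? 0) + 1;
  }
  const total = steps.length;
  // Largest bucket; 0 when there are no episodes at all.
  const maxCount = Math.max(0, ...Object.values(distribution));
  return {
    distribution,
    total,
    suspicious: total > 0 && maxCount / total > threshold,
  };
}

// All 124 analyzed episodes report step=1, as in this snapshot.
console.log(JSON.stringify(stepDistribution(Array(124).fill(1))));
// prints {"distribution":{"1":124},"total":124,"suspicious":true}
```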

### Boundary application (ADR)

- Total: 124
- With boundaries: 13
- Rate: 10.5%
- By path_type:
  - `improvised`: 112 episodes, 11 with boundaries, 9.8%
  - `regulated`: 12 episodes, 2 with boundaries, 16.7%
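
The per-group counts above can be cross-checked against the totals with a few lines of arithmetic. This is a sketch using only the numbers listed in this section; `boundaryReport` and the object layout are hypothetical names chosen for illustration.

```javascript
// Counts copied from the list above.
const byPathType = {
  improvised: { episodes: 112, withBoundaries: 11 },
  regulated: { episodes: 12, withBoundaries: 2 },
};

// Percentage rounded to one decimal place; 0 for an empty group.
const pct = (n, d) => (d === 0 ? 0 : Math.round((n / d) * 1000) / 10);

function boundaryReport(groups) {
  let total = 0;
  let withBoundaries = 0;
  const rates = {};
  for (const [name, g] of Object.entries(groups)) {
    total += g.episodes;
    withBoundaries += g.withBoundaries;
    rates[name] = pct(g.withBoundaries, g.episodes);
  }
  return { total, withBoundaries, rate: pct(withBoundaries, total), rates };
}

console.log(boundaryReport(byPathType));
```

The groups sum to 124 episodes and 13 with boundaries, and the recomputed rates (10.5, 9.8, 16.7) match the reported ones.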

### Missed activations

- Total: 17

By classification:

```json
{
  "bugfix": 4,
  "feature": 5,
  "refactor": 1,
  "planning": 2,
  "cleanup": 1,
  "monitoring": 1,
  "analysis": 3
}
```

By node (top 5 by count):

```json
{
  "#19": 12,
  "#34": 5,
  "#18": 4,
  "#41": 2,
  "#42": 2
}
```
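
For orientation, here is a rough sketch of how missed activations might be counted from a classification map. The map entries are the two visible in the `observer-classification-map.json` hunk of this commit, plus the deliberately empty `other`. Everything else is an assumption made for illustration: the `nodes_used` field, the rule that an episode is missed when it used none of its recommended non-dormant nodes, and the choice to credit every recommended node in the per-node tally.

```javascript
// Subset of the map visible in the diff; "other" is deliberately empty.
const classificationMap = {
  refactor: ["#11", "#12", "#43", "#64", "#65"],
  bugfix: ["#18", "#34"],
  other: [],
};

// Assumed rule: missed = classification recommends at least one non-dormant node
// and the episode used none of them. `nodes_used` is a hypothetical field.
function countMissed(episodes, map, dormant = new Set()) {
  const byClassification = {};
  const byNode = {};
  let total = 0;
  for (const ep of episodes) {
    const cls = ep.task_classification;
    const recommended = (map[cls] ?? []).filter((n) => !dormant.has(n));
    if (recommended.length === 0) continue; // nothing recommended: never missed
    const used = new Set(ep.nodes_used ?? []);
    if (recommended.some((n) => used.has(n))) continue; // a recommended node was used
    total += 1;
    byClassification[cls] = (byClassification[cls] ?? 0) + 1;
    for (const n of recommended) byNode[n] = (byNode[n] ?? 0) + 1;
  }
  return { total, byClassification, byNode };
}

const demo = [
  { task_classification: "bugfix", nodes_used: [] }, // missed
  { task_classification: "bugfix", nodes_used: ["#18"] }, // recommended node used
  { task_classification: "other", nodes_used: [] }, // empty mapping, not missed
];
console.log(countMissed(demo, classificationMap));
```

Under these assumptions one missed episode can increment several nodes at once, so per-node counts need not sum to the total.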

## Context

This is the "before" point, taken prior to enabling the stage 3 enforcement hook. After the hook has run for a week, we will take these figures again and compare.

**Overhaul goals (from the spec, §acceptance criteria):**

- Discipline (% of episodes with a matched trigger on classified tasks): **≥75%** (baseline recorded above: currently 33.3% for bugfix only, 0% for every other type).
- Missed activations: **≤5/week** (baseline: 17 for the month).
- % of feature/planning without a skill: **≤10%** (baseline: 0% of feature and 0% of planning episodes went through a skill, i.e. 100% without one; both categories miss the target).
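
The comparison against two of these targets can be sketched as a small check over the baseline table rows. Applying the discipline threshold per task type is an assumption made here for illustration (the spec may define it in aggregate), and `checkTargets` is a hypothetical helper.

```javascript
// Rows copied from the "Discipline by task type" table in this document.
const baseline = [
  { type: "bugfix", triggerMatchPct: 33.3, viaSkillPct: 33.3 },
  { type: "analysis", triggerMatchPct: 0, viaSkillPct: 25.0 },
  { type: "feature", triggerMatchPct: 0, viaSkillPct: 0 },
  { type: "planning", triggerMatchPct: 0, viaSkillPct: 0 },
  { type: "refactor", triggerMatchPct: 0, viaSkillPct: 0 },
  { type: "cleanup", triggerMatchPct: 0, viaSkillPct: 0 },
  { type: "monitoring", triggerMatchPct: 0, viaSkillPct: 0 },
];

// Thresholds from the goals list: discipline >= 75%, feature/planning without skill <= 10%.
const targets = { minTriggerMatchPct: 75, maxWithoutSkillPct: 10 };

function checkTargets(rows, t) {
  return rows.map((r) => {
    const withoutSkillPct = 100 - r.viaSkillPct;
    // The "without skill" goal is stated only for feature and planning tasks.
    const skillRuleApplies = r.type === "feature" || r.type === "planning";
    return {
      type: r.type,
      disciplineOk: r.triggerMatchPct >= t.minTriggerMatchPct,
      withoutSkillPct,
      withoutSkillOk: skillRuleApplies ? withoutSkillPct <= t.maxWithoutSkillPct : null,
    };
  });
}

console.log(checkTargets(baseline, targets));
```

On the baseline rows no task type reaches the discipline threshold, and both feature and planning sit at 100% without a skill.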

## Note on the suspicious flag

`suspicious: true` in `routerStep` indicates that **all 124 v2+ episodes have `step=1`**. This means the parser `tools/observer-transcript-parser.mjs` does not yet enrich episodes with the actual router step: the `primary_rationale.step` field is currently always `1` (the sentinel default). This gap in the observer's own instrumentation is a separate work item for stage 3 (either extend the parser so that it distinguishes steps, or compute the step explicitly from context). Until then, the breakdown by router_step is **not informative**.

## Reproducibility

```bash
node tools/brain-retro-analyzer.mjs docs/observer/episodes-2026-05.jsonl
```

The source for classificationMap and dormancy is `docs/registry/nodes.yaml` (via `tools/registry-to-classification-map.mjs`).

@@ -1,6 +1,6 @@
{
"$schema_version": 1,
"description": "Mapping from observer transcript-parser task_classification values to recommended Tooling Прил.Н node IDs. Source of truth for missed-activation detection (Pravila §16.4 conditional rule). 'other' deliberately empty — no recommendation, never counts as missed. DEFERRED-узлы filtered out by .node-dormancy.json at runtime. Classifier vocabulary is Claude's free judgment when writing the episode (no hardcoded enum) — adding a key here makes it 'blessed'. 'security' added 22.05.2026 (A8 follow-up): use when the PURPOSE of the task is verifying or improving security (scans, hardening, audits, threat modeling, go-live gates); NOT for bug-fixes that happen to be in security-relevant code (those stay 'bugfix'). 'marketing' added 22.05.2026 (C1 follow-up): use when the PURPOSE of the task is Лидерра's own marketing/lead-generation (content, SEO, campaigns, RU-channels, landing conversion, marketing-side 152-FZ); NOT for product features, billing flows, or PII-code audits. 'question' emptied 23.05.2026 (brain-retro #3 A1): conversational Russian Q&A («делай», «а», уточнения) was producing 17/40 false-positive missed-activations against #60 context7 — context7 is for library-docs lookup, not chat. 'memory-sync' emptied 23.05.2026 (brain-retro #3 A2): #33 claude-md-management is the channel for CLAUDE.md edits (Pravila §5 п.10), NOT for memory/*.md (auto-memory writes natively); was producing 8/40 false-positive missed-activations.",
"description": "DEPRECATED (2026-05-24): source of truth migrated to docs/registry/nodes.yaml + tools/registry-to-classification-map.mjs. This file is retained ONLY for historic v2-episode replay in tests; new code MUST consume the registry. Removal scheduled for stage 4 of router-discipline-overhaul. Original description follows. — Mapping from observer transcript-parser task_classification values to recommended Tooling Прил.Н node IDs. Source of truth for missed-activation detection (Pravila §16.4 conditional rule). 'other' deliberately empty — no recommendation, never counts as missed. DEFERRED-узлы filtered out by .node-dormancy.json at runtime. Classifier vocabulary is Claude's free judgment when writing the episode (no hardcoded enum) — adding a key here makes it 'blessed'. 'security' added 22.05.2026 (A8 follow-up): use when the PURPOSE of the task is verifying or improving security (scans, hardening, audits, threat modeling, go-live gates); NOT for bug-fixes that happen to be in security-relevant code (those stay 'bugfix'). 'marketing' added 22.05.2026 (C1 follow-up): use when the PURPOSE of the task is Лидерра's own marketing/lead-generation (content, SEO, campaigns, RU-channels, landing conversion, marketing-side 152-FZ); NOT for product features, billing flows, or PII-code audits. 'question' emptied 23.05.2026 (brain-retro #3 A1): conversational Russian Q&A («делай», «а», уточнения) was producing 17/40 false-positive missed-activations against #60 context7 — context7 is for library-docs lookup, not chat. 'memory-sync' emptied 23.05.2026 (brain-retro #3 A2): #33 claude-md-management is the channel for CLAUDE.md edits (Pravila §5 п.10), NOT for memory/*.md (auto-memory writes natively); was producing 8/40 false-positive missed-activations.",
"map": {
"refactor": ["#11", "#12", "#43", "#64", "#65"],
"bugfix": ["#18", "#34"],