Compare commits

33 commits

| SHA1 |
|---|
| 6ce2f0058d |
| d35fefddd9 |
| e56ddd6a1b |
| 53407a77cd |
| 6577c04a1f |
| 7a469dc913 |
| be4e1a6123 |
| b0cd18d797 |
| 30b79c7228 |
| 63100decce |
| f6421fd61c |
| d647bf1858 |
| 1f9b51bc39 |
| 8a7144892c |
| 722f4bb189 |
| 417cfcbc37 |
| c9b9efd6e4 |
| dfae9f760b |
| a8996896a8 |
| f82c878c60 |
| 3c5266c022 |
| 9280c48025 |
| 84dcf4aab3 |
| 80e514f5bb |
| f740f6124a |
| c86fdfc9eb |
| 9f84d9ef09 |
| 6d512f5cf3 |
| ca52d354f9 |
| c805988085 |
| 6ac4b1c1b1 |
| f172e2a580 |
| 4686b36571 |

@@ -21,10 +21,10 @@ jobs:
extensions: pdo, pdo_pgsql, redis, mbstring, intl, bcmath
coverage: none

- name: Setup Node 20
- name: Setup Node 22
uses: actions/setup-node@v4
with:
node-version: '20'
node-version: '22'
cache: 'npm'

- name: Install root JS deps

@@ -54,32 +54,7 @@
},
"comment": "A3 integration-tooling #47: OpenAPI MCP (ivo-toby/mcp-openapi-server, @ivotoby/openapi-mcp-server v1.14.0, MIT). Exposes Лидерра REST API endpoints (docs/api/openapi.yaml) as MCP tools. Config via env vars API_BASE_URL + OPENAPI_SPEC_PATH (stdio transport by default). READ scope: API discovery/introspection for Claude Code. Formalized in Tooling §4.22, PSR_v1 R10.1 block 3, Pravila §13.2."
},
"marketing-metrika": {
"command": "npx",
"args": ["-y", "github:atomkraft/yandex-metrika-mcp"],
"env": {
"YANDEX_OAUTH_TOKEN": "${YANDEX_OAUTH_TOKEN}"
},
"comment": "C1 marketing-tooling #78: Yandex Metrika MCP (vetted source: github:atomkraft/yandex-metrika-mcp, MIT; chosen by the IS9 vet out of 3 candidates, see docs/security/marketing-vet.md). READ-ONLY analytics: traffic, traffic sources, conversions. Env: YANDEX_OAUTH_TOKEN, an OAuth token with read-only rights. IS9 posture: READ-ONLY; Metrika API mutations are not used. Tooling §4.53. docs/marketing/README.md."
},
"marketing-wordstat": {
"command": "npx",
"args": ["-y", "github:SvechaPVL/yandex-mcp"],
"env": {
"YANDEX_OAUTH_TOKEN": "${YANDEX_OAUTH_TOKEN}"
},
"comment": "C1 marketing-tooling #79: Yandex Direct+Wordstat MCP (vetted source: github:SvechaPVL/yandex-mcp, MIT; chosen by the IS9 vet, see docs/security/marketing-vet.md). The repository exposes 128 tools (Direct + Wordstat + Metrika); under the IS9 condition ONLY the Wordstat tools are used, for keyword research and demand estimation. Direct mutations (creating/editing campaigns, changing bids) are behaviorally forbidden via marketing-ru #77 and MKT8 (no automatic spending of the ad budget). Env: YANDEX_OAUTH_TOKEN with minimal scope. Tooling §4.54. docs/marketing/README.md."
},
"marketing-telegram": {
"command": "npx",
"args": ["-y", "github:chigwell/telegram-mcp"],
"env": {
"TELEGRAM_API_ID": "${TELEGRAM_API_ID}",
"TELEGRAM_API_HASH": "${TELEGRAM_API_HASH}",
"TELEGRAM_SESSION_STRING": "${TELEGRAM_SESSION_STRING}"
},
"comment": "C1 marketing-tooling #80: Telegram MCP (chigwell/telegram-mcp, Apache-2.0, GitHub-only, not on npm). Works with Лидерра's Telegram channels and chats: publishing, scheduling, analytics. Env: TELEGRAM_API_ID + TELEGRAM_API_HASH (obtain at https://my.telegram.org/apps) + TELEGRAM_SESSION_STRING (generated once via GramJS/Telethon; store in the gitignored .env.local). MANDATORY: a dedicated Telegram account for Лидерра, not a personal one (IS9 posture, MKT8). Tooling §4.51. docs/marketing/README.md."
},
"_disabled_marketing_servers_note": "DISABLED 2026-05-31 (owner: 'cut the marketing'). Reason: their auto-generated schemas (especially wordstat: 128 tools from Yandex.Direct) are the prime suspect for the API 400 at tools.110/113 that crashed subagents during bulk-load of all tools (subagent-driven-development). The servers are off-phase and would not have started without OAuth tokens anyway. The full config is in git before this commit. To restore, bring back three mcpServers blocks: marketing-metrika (npx -y github:atomkraft/yandex-metrika-mcp; env YANDEX_OAUTH_TOKEN; READ-ONLY; Tooling §4.53), marketing-wordstat (npx -y github:SvechaPVL/yandex-mcp; env YANDEX_OAUTH_TOKEN; Wordstat ONLY per IS9/MKT8; Tooling §4.54), marketing-telegram (npx -y github:chigwell/telegram-mcp; env TELEGRAM_API_ID/API_HASH/SESSION_STRING; dedicated IS9 account; Tooling §4.51). See docs/security/marketing-vet.md and docs/marketing/README.md.",
"_comment_postiz_skeleton": "TODO: C1 marketing-tooling #81: Postiz MCP (gitroomhq/postiz-app self-host + antoniolg/postiz-mcp). Activate AFTER: 1) deploying self-hosted Postiz (git clone https://github.com/gitroomhq/postiz-app + docker-compose, AGPL-3.0: internal-only, no modifications); 2) vetting the license of antoniolg/postiz-mcp (NOT YET VERIFIED; see docs/marketing/README.md Open vet notes); 3) connecting the social networks in the Postiz UI. Future entry: \"marketing-postiz\": { \"command\": \"npx\", \"args\": [\"-y\", \"postiz-mcp\"], \"env\": { \"POSTIZ_API_URL\": \"${POSTIZ_API_URL}\", \"POSTIZ_API_KEY\": \"${POSTIZ_API_KEY}\" }, \"comment\": \"C1 #81 post-activation\" }. Tooling §4.52. docs/marketing/README.md."
}
}

+32
-26
@@ -1,6 +1,6 @@
# Brain Status (auto-generated)

Last updated: 2026-05-30T03:11:28.244Z
Last updated: 2026-05-30T13:11:39.164Z

| Controller | State | Details |
|---|---|---|
@@ -8,14 +8,14 @@ Last updated: 2026-05-30T03:11:28.244Z
| C2 Cross-ref consistency | ✅ | [cross-ref-checker] OK — 0 drift in 4 files |
| C3 Observer-of-observer | ✅ | [observer-of-observer] OK — last read 0 week(s) ago |
| C4 Signal status | ✅ | This file (self-reference) |
| C5 Observer-coverage | ⚠️ | 639 episode(s) this month · Stop-hook + post-commit OK · 20 missed activation(s) — see /brain-retro |
| C5 Observer-coverage | ⚠️ | 752 episode(s) this month · Stop-hook + post-commit OK · 20 missed activation(s) — see /brain-retro |
| C6 Chain map sync | ✅ | [chain-map-checker] OK — 16 chains in sync |

## Metrics (informational, not alerts)

- Observer evidence: 639 episodes this month, 0 observer_error markers, 129 PII matches before filter
- Legacy v1 episodes (not in factor analysis): 500
- Last /brain-retro: 3 day(s) ago
- Observer evidence: 752 episodes this month, 0 observer_error markers, 186 PII matches before filter
- Legacy v1 episodes (not in factor analysis): 613
- Last /brain-retro: 0 day(s) ago
- Node usage: see `/brain-retro` (once per sprint). missed_activations: 20. **Unused nodes are not an alert if there was no matching task** (Pravila §16.4 v1.36; capability-readiness; see memory `feedback_brain_unused_tools_not_problem`, an outside-repo memory store).

## Discipline metrics
@@ -24,16 +24,16 @@ Router discipline baseline (stage 2 router discipline overh

| Task type | Episodes | % with trigger match | % via skill |
|---|---|---|---|
| analysis | 26 | 30.8% | 15.4% |
| bugfix | 19 | 26.3% | 26.3% |
| planning | 16 | 18.8% | 18.8% |
| feature | 15 | 13.3% | 0.0% |
| analysis | 34 | 23.5% | 14.7% |
| planning | 25 | 12.0% | 16.0% |
| bugfix | 25 | 24.0% | 20.0% |
| feature | 19 | 10.5% | 0.0% |
| cleanup | 6 | 0.0% | 0.0% |
| refactor | 1 | 0.0% | 0.0% |

Router step distribution: 1: 281, 2: 227, 3: 63, 5: 61
Router step distribution: 1: 330, 2: 279, 3: 67, 5: 67

Boundaries applied (ADR / boundaries): 72 of 632 episodes (11.4%).
Boundaries applied (ADR / boundaries): 76 of 743 episodes (10.2%).

## Active multi-stage projects

@@ -45,16 +45,22 @@ Boundaries applied (ADR / boundaries): 72 of 632 episodes (11.4%).

## Long sessions

No sessions with >50 turns today (UTC). ✅
⚠️ Today (2026-05-30 UTC) there are sessions with ≥50 turns; this correlates with a drop in routing discipline (retro #5 candidate B).

| session_id | max turn | % regulated | last episode |
|---|---|---|---|
| `52b2b52d` | 75 | 3% | 2026-05-30T11:45:39.213Z |

Long sessions correlate with discipline drift. If % regulated has dropped in the current session, consider restarting it.

## Monthly cost

| Component | Tokens (in/out) | USD |
|---|---|---|
| Classifier (Sonnet 4.6) | 3237/42293 | $0.64 |
| Classifier (Sonnet 4.6) | 12550/86494 | $1.34 |
| Self-assessment (Sonnet 4.6) | 0/0 | $0.00 |
| Reviewer (Opus 4.7 + fallback) | 0/0 | $0.00 |
| **Total** | | **$0.64** |
| **Total** | | **$1.34** |

## Classifier anomalies

@@ -67,40 +73,40 @@ Episodes since last run: 542 / threshold: 10

## Reviewer: subagent vs fallback

0 of 639 episodes reviewed.
0 of 752 episodes reviewed.

## Reviewer findings

Reviewed: 339 episodes. **51 actionable** (wrong_skill + wrong_chain_order).
Reviewed: 372 episodes. **69 actionable** (wrong_skill + wrong_chain_order).

### error_root_cause

| cause | count |
|---|---:|
| n/a | 261 |
| wrong_skill | 41 |
| external_failure | 23 |
| wrong_chain_order | 10 |
| n/a | 271 |
| wrong_skill | 55 |
| external_failure | 28 |
| wrong_chain_order | 14 |
| wrong_tool | 4 |

### Top alternative_better

| recommended | count |
|---|---:|
| #19 | 16 |
| #19 | 18 |
| #25 | 15 |
| #34 | 8 |
| #18 | 6 |
| #18 | 8 |
| #33 | 3 |

### node_quality

| judgment | count |
|---|---:|
| disputable | 191 |
| correct | 113 |
| wrong_node | 31 |
| underkill | 2 |
| disputable | 207 |
| correct | 120 |
| wrong_node | 40 |
| underkill | 3 |
| overkill | 2 |

## Use of override phrases

@@ -0,0 +1,94 @@
# Router-gate v4: remaining holes (a "for later" checklist)

**Date:** 2026-05-30
**Context:** after closing inconsistency #1 (2 redundant judge entries were removed from `.claude/settings.json`).
**System status:** Layers 1–3 are working; Layer 4 (the judge) is built as an engine, and a config kill switch has been added (DEFAULT OFF). It is not registered anywhere and has no key, so it is effectively off. On 30.05 the owner chose the "turn it on" course, but activation (key + flag + hooks) is a separate step for the owner.

> Do this in a **clean session**: no parallel Claude sessions and NOT in an isolated copy (worktree).
> Much of this hinges on `.claude/settings.json`: Claude's own protection blocks Read/Edit on it, so the owner has to edit it by hand.

---

## Priority 1: wrapper written (TDD), wiring deferred

### [x] 1a. Wrapper `enforce-safe-baseline-metering.mjs`: DONE (30.05, worktree h-close)

- **What was done:** a wrapper with a pure function `decide()` (increments the per-task counter and evaluates thresholds via `incrementCounter`/`evaluateThresholds`), a task-boundary function `processEvent()` (see 1b), and 14 tests. TDD: test first, RED confirmed in the same turn, GREEN 14/14.
- **Pattern:** same as the sibling Stream H wrappers (`enforce-decomposition-detector.mjs`): `main()` is intentionally a no-op (exit 0), with no live wiring and no self-lockout.
- **Environment note:** the TDD guard checks edits against the main folder and does not see edits made in the worktree, so it blocks falsely. Override phrases are disabled in v4 (universal vocab removal, `findOverride`→null), so the "Override: …" text in the hook message is stale. The RED→GREEN cycle must happen in ONE turn (test edit + red run + writing the implementation); then the guard counts it.
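A minimal sketch of the kind of pure `decide()` described in 1a, in plain JavaScript to match the `.mjs` tooling. It is an illustration under assumptions, not the code in `tools/enforce-safe-baseline-metering.mjs`: the tool sets, the `soft`/`hard` limits (5 and 10), the `"warn"` verdict, the state shape, and the signatures given to `incrementCounter`/`evaluateThresholds` are all hypothetical. It shows per-task metering of safe-baseline tools, the always-allowed Skill/EnterPlanMode escape, and the hard-block of a mutating tool past the hard threshold without a skill-match (the behavior item 1b wires live).

```javascript
// Hypothetical sketch: per-task metering for safe-baseline tools (not the real module).
// Tool sets, thresholds, verdict names, and helper signatures are illustrative assumptions.
const SAFE_BASELINE_TOOLS = new Set(["Read", "Grep", "Glob", "LS", "TodoWrite", "AskUserQuestion"]);
const ESCAPE_TOOLS = new Set(["Skill", "EnterPlanMode"]);

// Returns a new counters object; never mutates its input, so decide() stays pure.
function incrementCounter(counters, taskId) {
  return { ...counters, [taskId]: (counters[taskId] ?? 0) + 1 };
}

// Maps a per-task count to a level using the soft/hard limits.
function evaluateThresholds(count, { soft, hard }) {
  if (count >= hard) return "hard";
  if (count >= soft) return "soft";
  return "ok";
}

// Pure decision: (state, event, limits) -> { state, verdict }. No I/O, no globals mutated.
function decide(state, event, limits = { soft: 5, hard: 10 }) {
  const { taskId, toolName } = event;

  // Escape hatch: Skill/EnterPlanMode is always allowed and records a skill-match for the task.
  if (ESCAPE_TOOLS.has(toolName)) {
    const skillMatched = { ...state.skillMatched, [taskId]: true };
    return { state: { ...state, skillMatched }, verdict: "allow" };
  }

  // Safe-baseline tools are counted but never blocked.
  if (SAFE_BASELINE_TOOLS.has(toolName)) {
    return { state: { ...state, counters: incrementCounter(state.counters, taskId) }, verdict: "allow" };
  }

  // Any other tool is treated as mutating: block only past the hard limit with no skill-match.
  const level = evaluateThresholds(state.counters[taskId] ?? 0, limits);
  if (level === "hard" && state.skillMatched[taskId] !== true) {
    return { state, verdict: "block" };
  }
  return { state, verdict: level === "soft" ? "warn" : "allow" };
}
```

Because state goes in and comes out instead of being stored, the same function can be unit-tested exhaustively and wrapped by either a no-op or a live `main()` without changes.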

### [x] 1b. Live wiring of `safe-baseline`: DONE (31.05, commits `f740f612` + `80e514f5` + `84dcf4aa`, pushed)

- **Designed** via brainstorming (3 adversarial reviews + ghost-pass): spec `docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md` v4. Closed: C1 (the Skill/EnterPlanMode escape is never blocked), C2 (skill-match only from a real tool_use, no self-writable text path), C3 (write-deny on runtime, decoupled), H1 (deterministic tokenization), V2-1 (stickiness contract, no loss or leakage between tasks), V2-2 (`.`-segment-proof via `pathNormalize`). The G3 override subsystem was cut as ghost protection (the escape is always available).
- **Implemented (TDD):** `extractKeywords` + `detectSkillMatch` + `runLiveDecision` + a live `runMain`/`main` in `tools/enforce-safe-baseline-metering.mjs` (+14 tests); a new `tools/enforce-runtime-write-deny.mjs` (+7 tests). Regression: **1880 GREEN**.
- **Mode:** hard-block (owner's decision: "remove g3, nothing else"). No observe flag was added.
- **Remaining (owner):** register both hooks in `.claude/settings.json` (the exact block is in the handoff note `2026-05-30-safe-baseline-overnight-handoff.md`); settings.json is blocked for Claude. Until they are registered, the hooks are inert.
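The two helpers named in 1b might look roughly like this. Assumptions: the stop-word list, the 3-character minimum, and the message shape (assistant messages whose `content` holds `{ type: "tool_use", name }` blocks, loosely following Anthropic Messages API content blocks) are illustrative and not taken from the real module. `extractKeywords` is deterministic (same text, same sorted array), which is the H1 property; `detectSkillMatch` reads only structured `tool_use` blocks, so a model writing "I used a skill" in plain text does not count, which is the C2 property.

```javascript
// Hypothetical sketch of deterministic keyword extraction (H1) and skill-match detection (C2).
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "to", "of", "in", "for", "on", "with", "is", "it"]);

// Lowercase, split on runs of non-letter/non-digit characters (Unicode-aware, so Cyrillic works),
// drop stop words and short tokens, dedupe, then sort so output order never depends on input order.
function extractKeywords(text, minLength = 3) {
  const tokens = String(text)
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((t) => t.length >= minLength && !STOP_WORDS.has(t));
  return [...new Set(tokens)].sort();
}

// Only structured assistant tool_use blocks count; free text that mentions a skill does not.
function detectSkillMatch(messages) {
  for (const msg of messages) {
    if (msg.role !== "assistant" || !Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type === "tool_use" && (block.name === "Skill" || block.name === "EnterPlanMode")) {
        return true;
      }
    }
  }
  return false;
}
```

For example, `extractKeywords("Fix the router-gate bug in the router")` yields `["bug", "fix", "gate", "router"]`.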

---

## Priority 2: Layer 4 (the judge): kill switch ready, activation is up to the owner

### [~] 2. The judge's "brain" (Layer 4 plumbing): config kill switch DONE (30.05)

- **Finding:** the engine `tools/llm-judge.mjs` is ALREADY complete (consensus + anti-injection + cache/budget); without a key, `llmJudgeCall` returns `null`/degraded, so it fails safe.
- **2a config kill switch: DONE.** `resolveJudgeConfig()` in `tools/llm-judge-config.mjs` is DEFAULT OFF: `enabled=true` only if BOTH the `ROUTER_LLM_JUDGE_ENABLED` flag is truthy AND a key resolves (keychain→env); keychain errors degrade to "no key, disabled" instead of throwing. +10 tests GREEN; the judge+safe-baseline suite passes 93/93 with no regressions. The file is written, but the judge STAYS DISABLED (no flag, no key, hooks not registered).
- **2b activation (NOT done; requires the owner; this is the step that starts costing money):** (1) a key in the keychain (service `router-gate-llm-judge`/`default`) OR `ROUTER_LLM_KEY`; (2) `ROUTER_LLM_JUDGE_ENABLED=1`; (3) the `enforce-llm-judge-*` hooks in settings.json. Until all three are in place, the cost is $0.
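A sketch of the 2a gating rule, assuming the keychain read is injected as a function so its failures can be simulated in tests. The name mirrors `resolveJudgeConfig()`, but the signature, the accepted truthy strings, the returned fields, and the `readKeychain` parameter are assumptions; the real module's keychain access (service `router-gate-llm-judge`) is not shown in this note. The rule shown: enabled only when the flag is truthy AND a key resolves (keychain first, then `ROUTER_LLM_KEY`), and a throwing keychain degrades to "no key" instead of propagating.

```javascript
// Hypothetical sketch of a DEFAULT-OFF judge config resolver (not the real module).
const TRUTHY = new Set(["1", "true", "yes", "on"]); // assumed set of accepted flag values

function isTruthy(value) {
  return TRUTHY.has(String(value ?? "").trim().toLowerCase());
}

// env: a plain object such as process.env.
// readKeychain: () => string | null; may throw (locked keychain, missing helper, ...).
function resolveJudgeConfig(env, readKeychain) {
  const flagOn = isTruthy(env.ROUTER_LLM_JUDGE_ENABLED);

  let apiKey = null;
  let keySource = null;
  try {
    apiKey = readKeychain() || null;
    if (apiKey) keySource = "keychain";
  } catch {
    apiKey = null; // keychain errors degrade to "no key" instead of throwing
  }
  if (!apiKey && env.ROUTER_LLM_KEY) {
    apiKey = env.ROUTER_LLM_KEY;
    keySource = "env";
  }

  // Both conditions are required; every other combination stays disabled.
  return { enabled: flagOn && apiKey !== null, keySource, apiKey };
}
```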

### [x] 3. Judge hook wrappers: DONE (31.05, commit `ca52d354`, pushed)

- **What:** `tools/enforce-llm-judge-per-tool.mjs` + `tools/enforce-llm-judge-response-scan.mjs`, written with TDD like the sibling wrappers: a pure `decide()` (respects the config gate; disabled→allow, $0) plus a namespaced **no-op `main()`** (NOT registered in settings.json). 14 tests GREEN, full run with no regressions.
- **Why:** the missing link between the judge engine and settings.json; ready for step 2b.3.
- **Remaining (owner, 2b):** key + the `ROUTER_LLM_JUDGE_ENABLED=1` flag + hook registration in settings.json. Until all three are in place, the cost is $0.
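How a judge wrapper's `decide()` could respect the config gate, as a hedged sketch: the injected `judge` callback, the verdict strings, and mapping a `null` or failing judge to "allow" are assumptions chosen to mirror the fail-safe `null`/degraded behavior described for `llmJudgeCall` in item 2. The property worth noticing is that a disabled gate returns before the judge is ever invoked, which is what keeps the cost at $0 until activation.

```javascript
// Hypothetical sketch: a decide() for an LLM-judge hook wrapper (not the real wrapper).
// config: { enabled: boolean }
// judge:  async (event) => ({ verdict: "allow" | "block", reason: string }) | null
async function decide(event, config, judge) {
  if (!config.enabled) {
    // Gate off: never call the judge, so no tokens are spent.
    return { verdict: "allow", reason: "judge disabled", judgeCalled: false };
  }
  let result = null;
  try {
    result = await judge(event);
  } catch {
    result = null; // an erroring judge is treated like a degraded (null) answer
  }
  if (result === null) {
    return { verdict: "allow", reason: "judge degraded (fail-safe)", judgeCalled: true };
  }
  return { verdict: result.verdict, reason: result.reason, judgeCalled: true };
}
```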

---

## Priority 3: housekeeping and documentation

### [~] 4. Syncing the "brain" (normative docs): CONTENT READY, APPLYING IT IS BLOCKED (31.05)

- **Ready:** a ready-to-paste §6 paragraph + §9 entry + header version bump for 1b, in `docs/observer/notes/2026-05-31-claude-md-1b-insertion-draft.md`. The §0 cross-ref counters do NOT change (this is `tools/` infrastructure, not tooling canon #1-#86, not an ADR, not off-phase).
- **⚠️ NEW BLOCKER (31.05):** `enforce-read-path-deny` (Smoke 5, 30.05) added `CLAUDE.md` to the Read-protected paths. The harness Edit tool requires a prior Read, so **Claude cannot Edit CLAUDE.md**, and a Write-overwrite of the canonical file is too risky. This is an **over-block** of the legitimate `claude-md-management` workflow (Smoke 5 targeted transcript/runtime exfiltration; a Read-deny on CLAUDE.md, which is public in the repo, has no security value). Owner options: either narrow `DEFAULT_PROTECTED_PATTERNS` (remove `CLAUDE.md` from the Read-deny and keep the Bash/PowerShell/Write protections), or paste manually from the draft. The lesson is already recorded in this note and the handoff, so nothing is lost.

### [ ] 5. Leave the isolated copy (worktree): PREPARED FOR IMPLEMENTATION (31.05)

- **Verification done (31.05):** worktree `.claude/worktrees/router-gate-v4-stream-h-close` was checked: all 4 working files (`enforce-safe-baseline-metering.mjs`+`.test.mjs`, `llm-judge-config.mjs`+`.test.mjs`) are **byte-for-byte identical to main** (4× empty `git diff --no-index`), and `git log main..worktree-router-gate-v4-stream-h-close` is **empty** (no unique commits). There is NO unsaved work worth keeping, so nothing can be lost.
- **Ready command (run by the OWNER: for Claude, `git worktree` falls under the gate's default-deny with no approval path, and routing it through PowerShell would be a forbidden bypass):**

```bash
git worktree remove --force ".claude/worktrees/router-gate-v4-stream-h-close"
git branch -D worktree-router-gate-v4-stream-h-close  # optional: the base branch has no unique commits
```

`--force` is needed because the worktree's working folder contains the same 4 files that are already in main (relative to its own old branch they are "uncommitted"), plus auto-regenerated STATUS.md drift.
- **Decision status:** on 30.05 the owner chose to "keep the worktree". The steps above are for whenever the owner decides to remove it; they block nothing (the worktree is harmless and only takes up disk space).

---

## Priority 4: large items that need hardware and manual owner steps

### [ ] 6. Layer 5 (v4.2): VM / biometrics / YubiKey

- **What:** Phase 1 VirtualBox ($0); Phases 2+3: YubiKey ($50–150 one-off; one key covers biometrics + HSM).
- **Catch:** Claude can only write configs and instructions; installation and hardware are on the owner.
- **When:** as a separate pass, once there is time and a YubiKey is at hand.

---

## Moving to git: DONE (31.05)

Everything is committed and pushed to `origin/main` (`c8059880..84dcf4aa`, fast-forward, gitleaks-full-history GREEN / lychee 0 errors). Session commits:

- `ca52d354`: judge wrappers (item 3).
- `6d512f5c`/`9f84d9ef`/`c86fdfc9`/`84dcf4aa`: safe-baseline spec v1→v4 + plan + handoff (item 1b docs).
- `f740f612`: live safe-baseline `main()` (item 1b code).
- `80e514f5`: `enforce-runtime-write-deny` (C3).

Items 1a/2a (the `enforce-safe-baseline-metering` wrapper + `llm-judge-config`) were moved over from the worktree earlier (commits `6ac4b1c1`+`c8059880`).

## No action needed (already done by parallel sessions)

- recovery-procedures.md: exists.
- brain-retro tables 16–17: exist (in the analyzer).
- Fixes to `extractPathArgs` / `pathDenyOverlay`: in place.
- Protection against reading transcripts (Smoke 5): working.
- Smoke tests 1–9: run.
@@ -0,0 +1,75 @@
# Safe-baseline live wiring (1b): overnight handoff

**Date:** 2026-05-30 (night)
**Status:** Implemented + tested on disk. **NOT committed** (git commits need your AskUserQuestion approval at the gate; you were asleep). Morning = review → approve commits → register in settings.json.

---

## What was done autonomously

1. **Spec → v4** (`docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md`): removed the G3 override subsystem ("remove g3, nothing else"); the escape is now solely Skill/EnterPlanMode (always available). Runtime write-deny was kept but **decoupled** into a standalone git-approval-anchor hardening. *(Spec edits are on disk, uncommitted; the last committed spec is v3 `c86fdfc9`.)*
2. **Plan** (`docs/superpowers/plans/2026-05-30-safe-baseline-live-wiring.md`): 6 TDD tasks.
3. **Implementation (TDD, RED→GREEN):**
   - `tools/enforce-safe-baseline-metering.mjs`: added `extractKeywords` (H1), `detectSkillMatch` (C2/V2-5), `runLiveDecision` (V2-1 stickiness contract), and a live `runMain`/`main` (replaces the no-op).
   - `tools/enforce-runtime-write-deny.mjs` (new): standalone write-deny on `~/.claude/runtime/**`, using the resolving `pathNormalize` (V2-2 `.`-segment-proof).
   - Tests: `enforce-safe-baseline-metering.test.mjs` (+14), `enforce-runtime-write-deny.test.mjs` (+7).
4. **Regression:** `npm run test:tools` → **1880 passed | 2 skipped** (was 1859). Narrow runs all GREEN.

## Decisions I made on my own (correct in the morning if wrong)

- **G3 override removed**, per your explicit instruction.
- **Hard-block kept (not observe-mode).** My honest recommendation was observe-first behind a mode flag, but you said "remove g3, nothing else", so I did NOT add an observe mode. If you want observe-first, say so and I'll add a `mode` flag (default observe) cheaply.
- **`enforce-runtime-write-deny` fails OPEN on a normalizer exception** (it blocks only on a *confirmed* runtime match). Rationale: a fail-closed Write hook that errors would lock the controller out of ALL edits during an unattended run. Residual risk: a malformed path that throws is not blocked. Flip to fail-closed if you prefer strict security.
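The normalization and fail-open rule above can be sketched with Node's built-in `fs`, `os`, and `path` modules. This is a re-implementation for illustration: `pathNormalize` reuses the note's name, but its body, the helper `isInside`, and the function `shouldBlock` are hypothetical, and the protected root is assumed to be `~/.claude/runtime` under `os.homedir()`. `path.resolve` removes `.`/`..` segments lexically, and `fs.realpathSync` on the longest existing prefix resolves symlinks, so a new file that does not exist yet is still classified correctly.

```javascript
// Hypothetical sketch of a runtime write-deny check; not the real hook's code.
const fs = require("fs");
const os = require("os");
const path = require("path");

// Resolve to an absolute, symlink-free path. path.resolve() collapses "." and ".."
// segments; fs.realpathSync() then resolves symlinks on the longest existing prefix,
// so a target that does not exist yet (a new file) can still be normalized.
function pathNormalize(p, cwd = process.cwd()) {
  let current = path.resolve(cwd, p);
  const missing = [];
  while (!fs.existsSync(current)) {
    const parent = path.dirname(current);
    if (parent === current) break; // filesystem root reached
    missing.unshift(path.basename(current));
    current = parent;
  }
  return path.join(fs.realpathSync(current), ...missing);
}

// True when child equals parent or lies underneath it.
function isInside(child, parent) {
  const rel = path.relative(parent, child);
  if (rel === "") return true;
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}

// Fail-OPEN: block only on a confirmed match; if normalization throws, allow the write.
function shouldBlock(filePath, runtimeRoot = path.join(os.homedir(), ".claude", "runtime")) {
  try {
    return isInside(pathNormalize(filePath), pathNormalize(runtimeRoot));
  } catch {
    return false; // accepted residual risk: a path that makes normalization throw is not blocked
  }
}
```

Normalizing the root with the same function matters on systems where temp or home directories are themselves symlinks; otherwise a correctly resolved target could fail the prefix comparison.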

## Queued commits (morning — approve each exact git command at the gate)

```bash
git add docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md
git commit docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md -m "docs(router-gate-v4): safe-baseline spec v4 — cut G3 override, decouple write-deny (item 1b)"

git add docs/superpowers/plans/2026-05-30-safe-baseline-live-wiring.md
git commit docs/superpowers/plans/2026-05-30-safe-baseline-live-wiring.md -m "docs(router-gate-v4): safe-baseline live-wiring implementation plan (item 1b)"

git add tools/enforce-safe-baseline-metering.mjs tools/enforce-safe-baseline-metering.test.mjs
git commit tools/enforce-safe-baseline-metering.mjs tools/enforce-safe-baseline-metering.test.mjs -m "feat(safe-baseline): live main() — metering + hard-block + Skill/EnterPlanMode escape (item 1b)"

git add tools/enforce-runtime-write-deny.mjs tools/enforce-runtime-write-deny.test.mjs
git commit tools/enforce-runtime-write-deny.mjs tools/enforce-runtime-write-deny.test.mjs -m "feat(router-gate-v4): enforce-runtime-write-deny — protect ~/.claude/runtime side-channels (C3)"

git add docs/observer/notes/2026-05-30-safe-baseline-overnight-handoff.md
git commit docs/observer/notes/2026-05-30-safe-baseline-overnight-handoff.md -m "docs(observer): safe-baseline overnight handoff note"
```

(A fresh `npm run test:tools` GREEN gives the verify-before-push sentinel for the code commits; docs-only commits short-circuit.)

## Registration (you apply — Claude cannot edit settings.json)

Add to `.claude/settings.json` `hooks.PreToolUse`:

```json
{ "matcher": "Read|Grep|Glob|LS|TodoWrite|AskUserQuestion|Edit|Write|MultiEdit|NotebookEdit|Bash|Skill|Task|EnterPlanMode",
"hooks": [{ "type": "command", "command": "node tools/enforce-safe-baseline-metering.mjs", "timeout": 10 }] }
```

```json
{ "matcher": "Edit|Write|MultiEdit|NotebookEdit",
"hooks": [{ "type": "command", "command": "node tools/enforce-runtime-write-deny.mjs", "timeout": 5 }] }
```

Until registered, both hooks are inert.

**Before registering — owner check:** does `.claude/settings.json` already have a `permissions.deny` covering Write to `~/.claude/**`? If yes, `enforce-runtime-write-deny` is redundant (still harmless). I couldn't read settings.json (gate-blocked).

## Open questions for the morning

1. **"prepare section 5 of the main plan for implementation"**: which document and which section 5? Candidates: the remaining-holes checklist (`docs/observer/notes/2026-05-30-router-gate-v4-remaining-holes.md`; its item 5 = close the worktree, already decided "keep") OR the master coordination plan OR the v4 design §5. I did NOT guess, to avoid wasted or wrong work. Tell me which and I'll prepare it.
2. **Normative sync ("update all the documentation"):** CLAUDE.md / Pravila / PSR / Tooling are gate-protected AND were being edited by a parallel session (§15.2). The safe-baseline live wiring is infrastructure (`tools/enforce-*.mjs`), not a new tooling-canon node / ADR / off-phase subcategory, so the §0 cross-ref counters likely do NOT change; CLAUDE.md §6 would get one paragraph and §9 one entry. To be done via `claude-md-management` once the parallel session is finished. Flagged, not done.
3. **observe vs enforce** (see Decisions).
4. **Judge activation (2b)** is still owner-gated ($); untouched.

## Not done (blocked, not skipped)

- Live registration / "run the agent": needs settings.json (owner-only).
- Mandatory pre-registration smoke (owner-run after registering): the integration tests already exercise block/allow/escape; the registration smoke is a final live check.
- CLAUDE.md normative sync (blocked, see Q2).
- The commits themselves (the gate needs your approval while you are awake).
@@ -0,0 +1,26 @@
# CLAUDE.md insertion draft — safe-baseline 1b (ready to paste)

**Why a draft, not a direct edit:** `enforce-read-path-deny` (Smoke 5, 2026-05-30) added `CLAUDE.md` to the Read-protected paths (`DEFAULT_PROTECTED_PATTERNS` `/(^|\/)CLAUDE\.md$/i`). The harness Edit tool requires a prior Read of the target; with Read gate-blocked, **Edit of CLAUDE.md is impossible** for Claude, and a full Write-overwrite of the canonical file is too risky. This is an over-block of the legit `claude-md-management` workflow (the Smoke 5 fix targeted transcript/runtime exfil; normative-doc Read-deny is collateral).

**Owner options:**

1. Temporarily narrow `DEFAULT_PROTECTED_PATTERNS` so `enforce-read-path-deny` does NOT block `CLAUDE.md` Read (keep the Bash/PowerShell + Write protections); then a normal `claude-md-management` session applies the inserts. **Recommended** — the Read-deny on CLAUDE.md has no security value (CLAUDE.md is public-in-repo; the real exfil targets are `~/.claude/projects` transcripts + `~/.claude/runtime`).
2. Paste the blocks below manually.
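Option 1 could be approached by splitting the deny patterns by operation, so `CLAUDE.md` stays write-protected but becomes readable. Only the regex `/(^|\/)CLAUDE\.md$/i` is quoted from the note above; the transcript pattern, the `READ_DENY`/`WRITE_DENY` split, and `isDenied` are hypothetical, because the actual structure of `DEFAULT_PROTECTED_PATTERNS` is not shown here.

```javascript
// Hypothetical sketch: per-operation deny lists instead of one shared list.
const CLAUDE_MD = /(^|\/)CLAUDE\.md$/i; // pattern quoted in the note
const TRANSCRIPTS = /(^|\/)\.claude\/projects\//; // hypothetical stand-in for the real exfil target

// Read keeps only the exfiltration targets; Write keeps everything, including CLAUDE.md.
const READ_DENY = [TRANSCRIPTS];
const WRITE_DENY = [TRANSCRIPTS, CLAUDE_MD];

function isDenied(op, filePath) {
  const normalized = filePath.replace(/\\/g, "/"); // accept Windows separators
  const patterns = op === "read" ? READ_DENY : WRITE_DENY;
  return patterns.some((re) => re.test(normalized));
}
```

The anchor `(^|\/)` is what makes the quoted pattern match `CLAUDE.md` and `docs/CLAUDE.md` while leaving a file such as `NOTCLAUDE.md` alone.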

The substantive learning is already committed in `docs/observer/notes/2026-05-30-router-gate-v4-remaining-holes.md` + the handoff note, so nothing is lost meanwhile.

---

## Header version line: bump

Change the opening of `**Версия:** 2.42 …` to v2.43, prepending:

> **Version:** 2.43 of 31.05.2026: **router-gate v4 safe-baseline live wiring (item 1b) + enforce-runtime-write-deny (C3) + LLM-judge hook wrappers implemented, tested (1880 GREEN), and pushed** (commits `ca52d354`+`6d512f5c..84dcf4aa`+`f740f612`+`80e514f5` on main). Spec v4 closed C1/C2/C3/H1/V2-1/V2-2 through 3 adversarial reviews + ghost-pass; the G3 override was cut as ghost protection. §0 cross-refs do NOT change (`tools/` infrastructure, not tooling canon #1-#86 / not an ADR / not off-phase). **v2.42 legacy:** …(keep the previous text)…

## §6: prepend this paragraph (above the 2026-05-29 entry)

**2026-05-31 router-gate v4: safe-baseline live wiring (item 1b) + enforce-runtime-write-deny (C3) + LLM-judge hook wrappers implemented and pushed.** `tools/enforce-safe-baseline-metering.mjs` got a live `main()` (per-task metering of safe-baseline tools + hard-block of a mutating tool past the hard threshold without a skill-match; the escape is calling any Skill/EnterPlanMode, which this layer never blocks). New pure functions: `extractKeywords` (deterministic tokenization with stop words, against false overlap), `detectSkillMatch` (only a real assistant tool_use of Skill/EnterPlanMode, not a self-writable text path), `runLiveDecision` (stickiness contract: the skill-match is bound to the task and explicitly persisted, with no loss and no leakage between tasks). The new standalone hook `tools/enforce-runtime-write-deny.mjs` closes a pre-existing hole: the Write/Edit tool could write to `~/.claude/runtime/**` directly (the git-approval anchor was open to the Write tool; the Bash/PowerShell gates covered it, but the Write channel did not). Normalization via the resolving `pathNormalize` (`path.resolve`+`realpath`) makes bypasses through `.`/`..` segments impossible. Designed via `superpowers:brainstorming` (3 rounds of adversarial self-review + ghost-pass); spec v4 `docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md` closed C1/C2/C3/H1/V2-1/V2-2; the G3 override subsystem was cut as ghost protection. Implemented via `superpowers:writing-plans` → TDD. Also `tools/enforce-llm-judge-per-tool.mjs` + `tools/enforce-llm-judge-response-scan.mjs` (Layer 4 hook wrappers, no-op `main()`, $0 until activation 2b). vitest tools-only regression: **1880 GREEN**. Commits `ca52d354`+`6d512f5c..84dcf4aa`+`f740f612`+`80e514f5` (push `c8059880..84dcf4aa main`, gitleaks-full-history GREEN / lychee 0 errors). Mode: **hard-block** (owner's decision). Registering both hooks in `.claude/settings.json` is an owner step (settings.json is blocked for Claude); until registration the hooks are inert.
**§0 cross-refs do NOT change**: `tools/enforce-*.mjs` infrastructure, not tooling canon #1-#86 / not an ADR / not off-phase. Via `claude-md-management:revise-claude-md`.

## §9: prepend this entry (above the v2.42 entry)

- **v2.43 of 31.05.2026: safe-baseline live wiring (item 1b) + enforce-runtime-write-deny (C3) + LLM-judge hook wrappers**: `tools/enforce-safe-baseline-metering.mjs` live `main()` (metering + hard-block + Skill/EnterPlanMode escape) with pure `extractKeywords`/`detectSkillMatch`/`runLiveDecision` (V2-1 stickiness contract); new `tools/enforce-runtime-write-deny.mjs` (C3: protects `~/.claude/runtime` from the Write tool, `.`-segment-proof via `pathNormalize`); judge wrappers `enforce-llm-judge-{per-tool,response-scan}.mjs` (no-op main, $0). Spec v4 via brainstorming (3 adversarial reviews + ghost-pass) closed C1/C2/C3/H1/V2-1/V2-2; the G3 override was cut as ghost protection. TDD, regression 1880 GREEN. Commits `ca52d354`+`6d512f5c..84dcf4aa`+`f740f612`+`80e514f5`, push `c8059880..84dcf4aa`. **§0 cross-refs do not change** (`tools/` infrastructure, not tooling canon / not an ADR / not off-phase). §6 +paragraph / §9 +this entry. Via `claude-md-management:revise-claude-md`.
@@ -0,0 +1,641 @@
# Lead Region Resolution: Master Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
>
> **This is a MASTER plan split into 6 sessions.** Each session is a self-contained, testable deliverable. Execute sessions **in order** (later sessions depend on earlier ones). Each session = one subagent-driven-development run with its own review checkpoints. Before starting a session, re-read this header + the session's "Preconditions".

**Goal:** Resolve a lead's real region from its phone number (DaData → Россвязь (Rossvyaz) → tag fallback) and switch `LeadRouter` to cascading routing by region, so that clients who share one source but have different regions receive only the leads from their own region.

**Architecture:** A new service `LeadRegionResolver` is called in `RouteSupplierLeadJob::handle()` BEFORE the transactional loop; it resolves `subject_code` + the carrier from the phone number and persists them to `supplier_leads` + `lead_region_resolution_log`. `LeadRouter::matchEligibleProjects` gets a new parameter `?int $resolvedSubjectCode` and filters candidates in 3 phases (exact region match → "all of Russia" → fallback channel with substitution). The local Rossvyaz registry (`phone_ranges`) is the fallback when DaData is unavailable or not confident.
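The resolution order can be sketched in JavaScript for illustration, even though the real `LeadRegionResolver` is PHP. Everything beyond `subject_code`, `phone_ranges`, and the DaData → Rossvyaz → tag order is an assumption: the injected synchronous `dadata` function (no real HTTP call to `cleaner.dadata.ru`), its `confident` flag, the range row fields (`prefix`, `from`, `to`, `subjectCode`, `operator`), and the 10-digit normalization. The registry lookup roughly models the Rossvyaz numbering-plan layout of a 3-digit ABC/DEF code plus an inclusive 7-digit range.

```javascript
// Hypothetical sketch of a cascading region resolver: DaData -> phone_ranges -> tag fallback.

// Normalize a Russian number to 10 digits (dropping a leading 7 or 8), or return null.
function normalizePhone(raw) {
  const digits = String(raw).replace(/\D/g, "");
  if (digits.length === 11 && (digits[0] === "7" || digits[0] === "8")) return digits.slice(1);
  if (digits.length === 10) return digits;
  return null;
}

// Local registry lookup: each row is an inclusive 7-digit range under a 3-digit prefix.
function lookupPhoneRanges(phone10, ranges) {
  const prefix = phone10.slice(0, 3);
  const number = Number(phone10.slice(3));
  const row = ranges.find((r) => r.prefix === prefix && number >= r.from && number <= r.to);
  return row ? { subjectCode: row.subjectCode, operator: row.operator, source: "phone_ranges" } : null;
}

function resolveRegion(rawPhone, { dadata, ranges, tagSubjectCode }) {
  const phone10 = normalizePhone(rawPhone);
  if (phone10) {
    // 1) DaData: accept only a confident answer; any error falls through to the local registry.
    try {
      const r = dadata(phone10);
      if (r && r.confident && Number.isInteger(r.subjectCode)) {
        return { subjectCode: r.subjectCode, operator: r.operator ?? null, source: "dadata" };
      }
    } catch {
      // DaData unavailable: fall through
    }
    // 2) Local Rossvyaz registry.
    const local = lookupPhoneRanges(phone10, ranges);
    if (local) return local;
  }
  // 3) Tag fallback (null when the source carries no region tag).
  return { subjectCode: tagSubjectCode ?? null, operator: null, source: "tag" };
}
```

Recording `source` alongside the code is one way to feed `lead_region_resolution_log`, so that later analysis can tell which step actually decided each lead.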
|
||||
|
||||
**Tech Stack:** PHP 8.3, Laravel 13, PostgreSQL 16 (партиции, RLS, `INT[]`), Pest 4, Redis (кэш + token-bucket), DaData REST API (`cleaner.dadata.ru/api/v1/clean/phone`).
|
||||
|
||||
**Source spec:** [docs/superpowers/specs/2026-05-29-lead-region-resolution-design.md](../specs/2026-05-29-lead-region-resolution-design.md) v0.5. Прочитать целиком перед стартом — этот план не дублирует §3-§12 спеки, а превращает их в исполнимые шаги.
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ CRITICAL CORRECTIONS TO THE SPEC (read BEFORE writing any code)

Direct code-walking on 30.05.2026 found the discrepancies below between the spec and the actual code. The implementer MUST follow the code as it is, not the numbers or names in the spec.

1. **Subject codes are NOT vehicle-registration codes.** Spec §3.4.1 says "77 Moscow, 50 Moscow Oblast, 78 St. Petersburg, 47 Leningrad Oblast". This is WRONG. The source of truth is [`app/app/Support/RussianRegions.php`](../../../app/app/Support/RussianRegions.php) `CODE_TO_NAME` (constitutional order of Art. 65, 1..89):
   - **Moscow = 82**, **Saint Petersburg = 83**, **Moscow Oblast = 56**, **Leningrad Oblast = 53**.
   - Sevastopol = 84, Republic of Crimea = 13.
   - Use THESE codes everywhere in code, tests and mappings.

2. **`RussianRegions` has NO `codeToName()` method.** It only has:
   - `public const CODE_TO_NAME` (an array);
   - `public static function nameToCode(): array` (built via `array_flip`).

   For code→name, read the constant directly: `RussianRegions::CODE_TO_NAME[$code]`.

3. **`LeadRouter::matchEligibleProjects` has TWO SQL paths:**
   - `DIRECT`, which matches by `signal_type` + `unique_key`;
   - `B1/B2/B3`, which goes through the `project_supplier_links` pivot.

   The spec shows the cascade (§3.9) only for the pivot path. **Implement the cascade for BOTH paths.**

4. **`project_routing_snapshots` ALREADY has `regions INT[] NOT NULL DEFAULT '{}'`** (migration `2026_05_27_120000`). Do NOT add the column. The cascade WHERE clauses use the existing column via `?::int = ANY(snap.regions)` and `snap.regions = '{}'::int[]`.

5. **`LeadDistributor::selectRecipients` currently picks cap=3 RANDOMLY.** The spec's cascade requires ordered selection inside the router: exact → all-RF → fallback, each sorted by remaining limit DESC.
   - Reconciliation: the router itself trims the list to 3 in order. `LeadDistributor` then returns the collection as-is when `count ≤ CAP` (no shuffle; lines 36-38).
   - This is a **behavior change** (random → deterministic by remaining limit). It is recorded as a deliberate decision; see "Open question D1" below.
   - Do NOT change `LeadDistributor`. The router simply returns ≤3 candidates.

6. **`subject_code` is already written to `deals`** (Job lines 405-406, via `?int $subjectCode` from `RegionTagResolver`). The integration replaces the source of this value; it does not add a column. `deals.subject_code` already exists (migration `2026_05_20_102000`).

7. **Test commands:** run them from the `app/` directory.
   - Single file: `cd app && ./vendor/bin/pest tests/Unit/Services/LeadRegionResolverTest.php`.
   - Filter by name: `cd app && ./vendor/bin/pest --filter="dadata qc 0"`.
   - Run the affected service's full suite before the session's final commit.
   - **NB: the Bash cwd persists between commands.** Always prefix with `cd app &&` or use a subshell.

---
## Open questions for the customer (resolve BEFORE Sessions 5-6)

- **D1 (distribution behavior):** Today, with >3 candidates, a lead goes to 3 RANDOM clients. The new cascade gives it, deterministically, to the 3 clients with the LARGEST remaining daily limit. As a result, a client with a large remaining limit systematically receives more leads than a client with a small one. Spec §3.9 explicitly chose "sort by remaining limit DESC". **Confirm that random distribution can be dropped.** If the customer wants to keep randomness within a region, that adds one task: a random shuffle inside each phase before applying the cap.
- **D2 (ambiguous-list staging):** The list of "merged" DaData regions (`'Санкт-Петербург и область'`, `'Москва и область'`) is extended only from real observations on staging (spec §3.4.1). At launch it contains exactly these 2 strings. A smoke run in Session 6 confirms the list.

---
## General conventions (apply in ALL sessions)

### Test setup (Pest 4)

- **Unit tests** (`app/tests/Unit/...`):
  - keep them pure, without a DB where possible;
  - use `Http::fake()` for DaData;
  - use `Cache::store('array')` for the cache. Laravel has no `Cache::fake()`.
- **Feature tests** (`app/tests/Feature/...`):
  - `uses(DatabaseTransactions::class)` + `uses(Tests\Concerns\SharesSupplierPdo::class)`;
  - tenant context: `DB::statement("SELECT set_config('app.current_tenant_id', '0', true)")` in `beforeEach`, as in [`LeadRouterTest.php`](../../../app/tests/Feature/Services/LeadRouterTest.php).
- Factories: `Tenant::factory()`, `Project::factory()`, `SupplierProject::factory()`/`::query()->create([...])`, `SupplierLead::factory()`.
- Helpers (in [`app/tests/Pest.php`](../../../app/tests/Pest.php)): `linkProjectToSupplier($project, $supplier)` and `createRoutingSnapshotFromProject($project, ...)`. **The latter is extended in Session 5** with a `string $regions = '{}'` parameter.
- Pest style: `it('...', function () { ... })`, `expect($x)->toBe(...)`. No PHPUnit class style in new tests.
### Migration pattern (raw SQL; template: `2026_05_27_120000_create_project_routing_snapshots_table.php`)

```php
<?php

declare(strict_types=1);

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration {
    public function up(): void
    {
        // SET ROLE crm_migrator in production; on dev/testing fall back to the postgres superuser.
        // NB: Laravel runs PostgreSQL migrations inside a transaction, and a failed SET ROLE aborts it.
        // The catch is therefore only a safe fallback when the migration sets `public $withinTransaction = false;`.
        try {
            DB::statement('SET ROLE crm_migrator');
            $canCreate = DB::selectOne("SELECT has_schema_privilege('crm_migrator', 'public', 'CREATE') AS ok");
            if (!$canCreate || !$canCreate->ok) { DB::statement('RESET ROLE'); }
        } catch (\Throwable) { /* environment without the role: continue as superuser */ }

        DB::unprepared(<<<'SQL'
            -- DDL goes here
            SQL);
    }

    public function down(): void
    {
        try {
            DB::statement('SET ROLE crm_migrator');
            $canCreate = DB::selectOne("SELECT has_schema_privilege('crm_migrator', 'public', 'CREATE') AS ok");
            if (!$canCreate || !$canCreate->ok) { DB::statement('RESET ROLE'); }
        } catch (\Throwable) {}

        DB::statement('DROP TABLE IF EXISTS <table> CASCADE');
    }
};
```
- GRANTs:
  - SaaS-level read tables: SELECT for `crm_readonly` + `crm_supplier_worker`; writes go through `crm_migrator`.
  - Tenant tables: RLS policy + GRANT to `crm_app_user`/`crm_supplier_worker` (see lines 49-55 of the snapshot migration).
- Partitioned tables: create explicit `CREATE TABLE ..._y2026_m05 PARTITION OF ...` partitions for the current and next month. Register retention in `system_settings` (see lines 57-78).
- **Update `db/schema.sql` + `db/CHANGELOG_schema.md` on every schema change** (rule §4.2 / §5 item 8 of CLAUDE.md). Bump the schema version in the header.
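The monthly partitions in this plan use names like `..._y2026_m05` with bounds `[2026-05-01, 2026-06-01)`. That DDL can be generated instead of hand-written. The sketch below assumes the `_yYYYY_mMM` suffix used here and adds `IF NOT EXISTS`. `monthlyPartitionDdl()` is a hypothetical helper, not part of `MonthlyPartitionManager`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: CREATE TABLE ... PARTITION OF statements for $months
 * consecutive monthly partitions, starting with the month that contains $start.
 *
 * @return list<string>
 */
function monthlyPartitionDdl(string $table, DateTimeImmutable $start, int $months = 2): array
{
    // Normalize to midnight on the 1st so that '+1 month' never overflows (e.g. from the 31st).
    $from = $start->modify('first day of this month')->setTime(0, 0);
    $ddl = [];
    for ($i = 0; $i < $months; $i++) {
        $to = $from->modify('+1 month');
        $ddl[] = sprintf(
            "CREATE TABLE IF NOT EXISTS %s_y%s_m%s PARTITION OF %s FOR VALUES FROM ('%s') TO ('%s');",
            $table,
            $from->format('Y'),
            $from->format('m'),
            $table,
            $from->format('Y-m-d'),
            $to->format('Y-m-d'), // upper bound is exclusive in PostgreSQL range partitioning
        );
        $from = $to;
    }
    return $ddl;
}

// Current + next month, as the migration pattern requires.
foreach (monthlyPartitionDdl('lead_region_resolution_log', new DateTimeImmutable('2026-05-31')) as $sql) {
    echo $sql, PHP_EOL;
}
```

Normalizing to the first of the month keeps December → January rollover and short months correct without any special-casing.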
### Git / commits

- Branch: `feat/lead-region-resolution` (created in Session 1, see Preconditions).
- Frequent atomic commits (one per task). Conventional commits: `feat(region):`, `test(region):`, `chore(region):`.
- Each session ends with a green regression run of the affected layer + push.

---
## SESSION 1: DB schema + partition registration

**Deliverable:** All tables and columns for the feature exist, the migration works up and down, and partitions are registered. No business logic.

**Preconditions:**
- A clean `main` (or an agreed base).
- Create the branch: `git switch -c feat/lead-region-resolution`.
- Commit the spec (currently untracked) as the first commit.

**Files:**

- Create: `app/database/migrations/2026_05_31_100000_create_phone_ranges_and_resolution_log.php`
- Modify: `app/app/Services/MonthlyPartitionManager.php:48-62` (PARTITIONED_TABLES map)
- Modify: `db/schema.sql` (new tables + ALTER, version bump) + `db/CHANGELOG_schema.md`
- Test: `app/tests/Feature/Migrations/PhoneRangesMigrationTest.php`
### Task 1.1: Failing test for the new tables and columns

- [ ] **Step 1: Write the failing test**

`app/tests/Feature/Migrations/PhoneRangesMigrationTest.php`:

```php
<?php

declare(strict_types=1);

use Illuminate\Support\Facades\DB;
use Tests\Concerns\SharesSupplierPdo;

uses(SharesSupplierPdo::class);

it('creates phone_ranges with lookup index', function (): void {
    expect(DB::selectOne("SELECT to_regclass('public.phone_ranges') AS t")->t)->not->toBeNull();
    expect(DB::selectOne("SELECT to_regclass('public.idx_phone_ranges_lookup') AS i")->i)->not->toBeNull();

    $cols = collect(DB::select("SELECT column_name FROM information_schema.columns WHERE table_schema='public' AND table_name='phone_ranges'"))
        ->pluck('column_name')->all();
    expect($cols)->toContain('def_code', 'from_num', 'to_num', 'operator', 'region', 'subject_code', 'import_id');
});

it('creates lead_region_resolution_log as partitioned table', function (): void {
    $p = DB::selectOne("SELECT partattrs FROM pg_partitioned_table pt JOIN pg_class c ON c.oid=pt.partrelid WHERE c.relname='lead_region_resolution_log'");
    expect($p)->not->toBeNull();
});

it('adds resolution columns to supplier_leads and deals', function (): void {
    $sl = collect(DB::select("SELECT column_name FROM information_schema.columns WHERE table_schema='public' AND table_name='supplier_leads'"))->pluck('column_name')->all();
    expect($sl)->toContain('resolved_subject_code', 'region_source', 'dadata_qc', 'phone_operator');

    $d = collect(DB::select("SELECT column_name FROM information_schema.columns WHERE table_schema='public' AND table_name='deals'"))->pluck('column_name')->all();
    expect($d)->toContain('phone_operator', 'region_substituted');
});
```
- [ ] **Step 2: Run it and confirm it fails** (`cd app && ./vendor/bin/pest tests/Feature/Migrations/PhoneRangesMigrationTest.php` → FAIL: relation does not exist)

- [ ] **Step 3: Write the migration.** The DDL follows spec §4.1-§4.6 with the corrections above. Full DDL (goes inside `DB::unprepared`):

```sql
-- 1. phone_ranges_imports (import journal; created FIRST because phone_ranges references it)
CREATE TABLE phone_ranges_imports (
    id              BIGSERIAL PRIMARY KEY,
    imported_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source_url      TEXT NOT NULL,
    rows_inserted   INTEGER NOT NULL DEFAULT 0,
    rows_updated    INTEGER NOT NULL DEFAULT 0,
    checksum_sha256 TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'in_progress'
                    CHECK (status IN ('in_progress','completed','failed','rolled_back')),
    error           TEXT,
    completed_at    TIMESTAMPTZ
);

-- 2. phone_ranges (Rossvyaz registry; SaaS-level, no RLS)
CREATE TABLE phone_ranges (
    id                BIGSERIAL PRIMARY KEY,
    def_code          SMALLINT NOT NULL,
    from_num          BIGINT NOT NULL,
    to_num            BIGINT NOT NULL,
    operator          TEXT NOT NULL,
    region            TEXT NOT NULL,
    region_normalized TEXT,
    subject_code      SMALLINT,
    imported_at       TIMESTAMPTZ NOT NULL,
    import_id         BIGINT NOT NULL REFERENCES phone_ranges_imports(id),
    CONSTRAINT chk_phone_ranges_def_code     CHECK (def_code BETWEEN 300 AND 999),
    CONSTRAINT chk_phone_ranges_subject_code CHECK (subject_code IS NULL OR subject_code BETWEEN 1 AND 89),
    CONSTRAINT chk_phone_ranges_range_valid  CHECK (from_num <= to_num)
);
CREATE INDEX idx_phone_ranges_lookup ON phone_ranges (def_code, from_num, to_num);
GRANT SELECT ON phone_ranges, phone_ranges_imports TO crm_readonly, crm_supplier_worker;

-- 3. lead_region_resolution_log (SaaS-level, partitioned by received_at)
CREATE TABLE lead_region_resolution_log (
    id                       BIGSERIAL,
    supplier_lead_id         BIGINT NOT NULL,
    received_at              TIMESTAMPTZ NOT NULL,
    phone_masked             TEXT NOT NULL,
    subject_code_resolved    SMALLINT,
    subject_code_from_tag    SMALLINT,
    region_source            TEXT NOT NULL CHECK (region_source IN ('dadata','rossvyaz','tag','unknown')),
    dadata_qc                SMALLINT,
    dadata_provider          TEXT,
    dadata_type              TEXT,
    dadata_response_masked   JSONB,
    rossvyaz_matched         BOOLEAN NOT NULL DEFAULT FALSE,
    actual_subject_code      SMALLINT CHECK (actual_subject_code IS NULL OR actual_subject_code BETWEEN 1 AND 89),
    substituted_subject_code SMALLINT CHECK (substituted_subject_code IS NULL OR substituted_subject_code BETWEEN 1 AND 89),
    routing_step             SMALLINT CHECK (routing_step IS NULL OR routing_step BETWEEN 1 AND 3),
    phone_operator           TEXT,
    cache_hit                BOOLEAN NOT NULL DEFAULT FALSE,
    duration_ms              INTEGER,
    resolved_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (id, received_at)  -- the partition key must be part of the primary key
) PARTITION BY RANGE (received_at);
CREATE INDEX idx_lrrl_lead_id ON lead_region_resolution_log (supplier_lead_id);
CREATE INDEX idx_lrrl_source ON lead_region_resolution_log (region_source, received_at);
GRANT SELECT, INSERT ON lead_region_resolution_log TO crm_supplier_worker;
-- nextval() behind the BIGSERIAL default needs USAGE on the sequence; without it INSERT fails
GRANT USAGE ON SEQUENCE lead_region_resolution_log_id_seq TO crm_supplier_worker;
GRANT SELECT ON lead_region_resolution_log TO crm_readonly;
CREATE TABLE lead_region_resolution_log_y2026_m05 PARTITION OF lead_region_resolution_log
    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
CREATE TABLE lead_region_resolution_log_y2026_m06 PARTITION OF lead_region_resolution_log
    FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

-- 4. supplier_leads: +4 columns (persistent idempotency + denormalized display)
ALTER TABLE supplier_leads
    ADD COLUMN resolved_subject_code SMALLINT CHECK (resolved_subject_code IS NULL OR resolved_subject_code BETWEEN 1 AND 89),
    ADD COLUMN region_source TEXT CHECK (region_source IN ('dadata','rossvyaz','tag','unknown')),
    ADD COLUMN dadata_qc SMALLINT,
    ADD COLUMN phone_operator TEXT;

-- 5. deals: +2 columns
ALTER TABLE deals
    ADD COLUMN phone_operator TEXT,
    ADD COLUMN region_substituted BOOLEAN NOT NULL DEFAULT FALSE;
```
Two more pieces complete the migration:
- `up()`: after `DB::unprepared`, register retention for `lead_region_resolution_log` in `system_settings`. Use the same pattern as lines 67-78 of the snapshot migration, with `value => '12'` (365 days).
- `down()`: `DROP TABLE IF EXISTS lead_region_resolution_log, phone_ranges, phone_ranges_imports CASCADE`, then `ALTER TABLE ... DROP COLUMN IF EXISTS ...` for supplier_leads/deals, then delete the `system_settings` key.

> **Partition guide:** the new partitioned `lead_region_resolution_log` is keyed by `received_at`, like `deals`. `deals` partitions are created monthly. Our partitions at launch are only m05/m06; `partitions:create-months` picks up later months AFTER the table is registered in Task 1.2.

- [ ] **Step 4: Run the test: PASS** (`cd app && ./vendor/bin/pest tests/Feature/Migrations/PhoneRangesMigrationTest.php`)

- [ ] **Step 5: Commit** `git add -A && git commit -m "feat(region): schema — phone_ranges, resolution_log, supplier_leads/deals columns"`
### Task 1.2: Register the new partitioned table in MonthlyPartitionManager

- [ ] **Step 1: Failing test** `app/tests/Unit/Services/MonthlyPartitionManagerRegionLogTest.php`:

```php
<?php

declare(strict_types=1);

use App\Services\MonthlyPartitionManager;

it('knows lead_region_resolution_log partition key', function (): void {
    expect(MonthlyPartitionManager::PARTITIONED_TABLES)->toHaveKey('lead_region_resolution_log');
    expect(MonthlyPartitionManager::PARTITIONED_TABLES['lead_region_resolution_log'])->toBe('received_at');
});
```
- [ ] **Step 2: Run: FAIL.**
- [ ] **Step 3: Add** `'lead_region_resolution_log' => 'received_at',` to `MonthlyPartitionManager::PARTITIONED_TABLES` (after line 61).
- [ ] **Step 4: Run: PASS.**
- [ ] **Step 5: Commit** `chore(region): register lead_region_resolution_log in MonthlyPartitionManager`.
### Task 1.3: Sync db/schema.sql + CHANGELOG

- [ ] **Step 1:** Add the new `CREATE TABLE`/`ALTER` statements to `db/schema.sql` (mirroring the migration) and bump the version in the header.
- [ ] **Step 2:** Add an entry to `db/CHANGELOG_schema.md` (new version, list of changes).
- [ ] **Step 3:** Commit `chore(region): sync db/schema.sql + CHANGELOG for region resolution`.

**Session 1 wrap-up:** run `cd app && ./vendor/bin/pest tests/Feature/Migrations tests/Unit/Services/MonthlyPartitionManagerRegionLogTest.php` → GREEN. Push.

---
## SESSION 2: Rossvyaz registry + lookup

**Deliverable:** `RossvyazPrefixLookup` finds the region + operator for a phone number via `phone_ranges`. The `phone-ranges:import` command imports the registry.

**Preconditions:** Session 1 is merged or on the branch. The `phone_ranges*` tables exist.

**Files:**

- Create: `app/app/Services/RossvyazPrefixLookup.php`, `app/app/Services/Dto/RossvyazRecord.php`
- Create: `app/app/Console/Commands/PhoneRangesImportCommand.php`
- Test: `app/tests/Feature/Services/RossvyazPrefixLookupTest.php`, `app/tests/Feature/Console/PhoneRangesImportCommandTest.php`

### Task 2.1: RossvyazRecord DTO + lookup (TDD)

- [ ] **Step 1: Failing tests** `RossvyazPrefixLookupTest.php`. This is a Feature test because it needs the DB: `uses(DatabaseTransactions::class, SharesSupplierPdo::class)`. Seed `phone_ranges` directly via `DB::table`:

```php
use App\Services\RossvyazPrefixLookup;
use Illuminate\Support\Facades\DB;

it('mobile prefix returns correct region and operator', function (): void {
    DB::table('phone_ranges')->insert([
        'def_code'=>921,'from_num'=>5550000,'to_num'=>5559999,'operator'=>'МегаФон',
        'region'=>'Санкт-Петербург','subject_code'=>83,'imported_at'=>now(),'import_id'=>seedImport(),
    ]);
    // Use real digits: find() casts substr() to int, so a masked '7921555XXXX' would never match.
    $rec = app(RossvyazPrefixLookup::class)->find('79215551234');
    expect($rec)->not->toBeNull()->and($rec->subjectCode)->toBe(83)->and($rec->region)->toBe('Санкт-Петербург');
});

it('prefers narrower range when two ranges overlap', function (): void {
    $row = ['def_code'=>921,'imported_at'=>now(),'import_id'=>seedImport()];
    DB::table('phone_ranges')->insert([
        $row + ['from_num'=>0,'to_num'=>9999999,'operator'=>'МТС','region'=>'Ленинградская область','subject_code'=>53],
        $row + ['from_num'=>5550000,'to_num'=>5559999,'operator'=>'МегаФон','region'=>'Санкт-Петербург','subject_code'=>83],
    ]);
    // ORDER BY (to_num - from_num) ASC → the narrow range wins.
    expect(app(RossvyazPrefixLookup::class)->find('79215551234')->subjectCode)->toBe(83);
});

it('returns null for unknown prefix', function (): void {
    expect(app(RossvyazPrefixLookup::class)->find('79990000000'))->toBeNull();
});
```
(`seedImport()` is a local helper in the test file. It inserts a `phone_ranges_imports` row and returns its id.)

- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation.**
  - `RossvyazRecord` is a readonly DTO: `subjectCode: ?int`, `region: string`, `operator: string`.
  - `RossvyazPrefixLookup::find(string $phone): ?RossvyazRecord` follows the spec §3.7 algorithm: `def_code = (int) substr($phone, 1, 3)`, `subscriber = (int) substr($phone, 4)`.
  - SQL: `SELECT region, operator, subject_code FROM phone_ranges WHERE def_code=? AND from_num<=? AND to_num>=? ORDER BY (to_num-from_num) ASC LIMIT 1`.
  - Query through `DB::connection('pgsql_supplier')` (BYPASSRLS, like LeadRouter).
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): RossvyazPrefixLookup + RossvyazRecord DTO`.
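For `79215551234`, the DEF code is `921` and the subscriber part is `5551234`; among overlapping ranges the narrowest one wins. The in-memory sketch below makes that arithmetic concrete. It performs the same selection as the SQL in Step 3. `findNarrowestRange()` and the array row shape are illustrative assumptions, not the real `RossvyazPrefixLookup`. The up-front `7` + 10 digits check is also an assumption, added here so that a masked number such as `7921555XXXX` is rejected instead of being silently truncated by the `(int)` cast.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative in-memory equivalent of:
 *   SELECT ... FROM phone_ranges WHERE def_code=? AND from_num<=? AND to_num>=?
 *   ORDER BY (to_num - from_num) ASC LIMIT 1
 *
 * @param list<array{def_code:int, from_num:int, to_num:int, subject_code:?int}> $ranges
 * @return array{def_code:int, from_num:int, to_num:int, subject_code:?int}|null
 */
function findNarrowestRange(string $phone, array $ranges): ?array
{
    // Assumption: same format check as the Job/Controller (7 followed by 10 digits).
    if (preg_match('/^7\d{10}$/', $phone) !== 1) {
        return null;
    }
    $defCode = (int) substr($phone, 1, 3);  // '921'
    $subscriber = (int) substr($phone, 4);  // '5551234'

    $best = null;
    foreach ($ranges as $r) {
        $hit = $r['def_code'] === $defCode && $r['from_num'] <= $subscriber && $r['to_num'] >= $subscriber;
        // Keep the range with the smallest width (to_num - from_num).
        if ($hit && ($best === null || ($r['to_num'] - $r['from_num']) < ($best['to_num'] - $best['from_num']))) {
            $best = $r;
        }
    }
    return $best;
}

$ranges = [
    ['def_code' => 921, 'from_num' => 0, 'to_num' => 9999999, 'subject_code' => 53],
    ['def_code' => 921, 'from_num' => 5550000, 'to_num' => 5559999, 'subject_code' => 83],
];
var_dump(findNarrowestRange('79215551234', $ranges)['subject_code']); // int(83)
```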
### Task 2.2: PhoneRangesImportCommand (TDD)

- [ ] **Step 1: Failing Feature test.** `phone-ranges:import --dry-run` must:
  - parse a fixture XLSX/CSV into `phone_ranges_staging`;
  - map region→subject_code via `RussianRegions::nameToCode()`;
  - not swap the tables under `--dry-run`.

  Fixture: a small CSV in `app/tests/Fixtures/rossvyaz/sample.csv`.
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** per spec §6.2:
  - pipeline: staging table → COPY → checksum idempotency → atomic `RENAME` swap → `phone_ranges_imports.status`;
  - log unmatched regions to `phone_ranges_imports.error`;
  - under `--dry-run`, stop before the swap.

  **NB:** the real source is a bundle of ~500-600 XLSX files (§6.1), while the test parses a single CSV fixture. Put the XLSX parser in a separate private method. In the test, a format flag routes parsing to the CSV branch instead.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): phone-ranges:import command with atomic swap + idempotency`.
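The checksum-idempotency step can be illustrated without the database. The sketch below makes two assumptions. First, the batch checksum is a SHA-256 over each file's name and SHA-256 digest, taken in sorted filename order, so the listing order does not matter. Second, an import is skipped when the last `completed` import has the same checksum. `batchChecksum()` and `shouldSkipImport()` are hypothetical names, not part of the command's specified interface.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical: one checksum for a bundle of registry files.
 *
 * @param list<string> $paths
 */
function batchChecksum(array $paths): string
{
    // Sort by basename so that the order in which files are listed does not change the result.
    usort($paths, fn (string $a, string $b): int => strcmp(basename($a), basename($b)));
    $ctx = hash_init('sha256');
    foreach ($paths as $path) {
        $digest = hash_file('sha256', $path);
        if ($digest === false) {
            throw new RuntimeException("Cannot read {$path}");
        }
        // Include the file name so that renaming a file also changes the checksum.
        hash_update($ctx, basename($path) . ':' . $digest . "\n");
    }
    return hash_final($ctx);
}

/** Skip the import if the last completed import already had this checksum. */
function shouldSkipImport(string $checksum, ?string $lastCompletedChecksum): bool
{
    return $lastCompletedChecksum !== null && hash_equals($lastCompletedChecksum, $checksum);
}
```

Hashing per-file digests, rather than concatenating raw file contents, keeps memory flat even for a bundle of several hundred XLSX files.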
**Session 2 wrap-up:** the Rossvyaz service layer is GREEN. Push. An operator performs the first real import of the registry during the Session 6 rollout; it is not part of the tests.

---
## SESSION 3: DaData client + budget + rate limit + region map

**Deliverable:**
- `DaDataPhoneClient` calls the REST API.
- `DaDataRegionMap` maps name→code.
- `DaDataBudgetGuard` cuts off at the daily limit.
- A token bucket protects against 429s.

No orchestration; that is Session 4.

**Preconditions:** Sessions 1-2 are done.

**Files:**

- Create: `app/app/Services/DaData/DaDataPhoneClient.php`, `DaDataPhoneResponse.php`, `DaDataQualityCode.php`, `DaDataException.php`, `DaDataTimeoutException.php`
- Create: `app/app/Services/DaData/DaDataBudgetGuard.php`
- Create: `app/app/Support/DaDataRegionMap.php`
- Modify: `app/config/services.php` (+`dadata` block)
- Test: `app/tests/Unit/Services/DaData/DaDataPhoneClientTest.php`, `DaDataBudgetGuardTest.php`, `app/tests/Unit/Support/DaDataRegionMapTest.php`

### Task 3.1: config/services.php + DaDataQualityCode enum

- [ ] **Step 1:** Add to `config/services.php`:

```php
'dadata' => [
    'api_key'        => env('DADATA_API_KEY'),
    'secret'         => env('DADATA_SECRET'),
    'timeout_ms'     => (int) env('DADATA_TIMEOUT_MS', 2000),      // milliseconds; the HTTP client takes seconds
    'retries'        => (int) env('DADATA_RETRIES', 1),
    'daily_cap_rub'  => (int) env('DADATA_DAILY_CAP_RUB', 10000),  // rubles; spend is counted in kopecks
    'enabled'        => filter_var(env('LEAD_REGION_RESOLVER_ENABLED', false), FILTER_VALIDATE_BOOL),
    'cache_ttl_days' => (int) env('PHONE_REGION_CACHE_TTL_DAYS', 30),
],
```
- [ ] **Step 2:** Make `DaDataQualityCode` an int-backed enum: CASE_RECOGNIZED=0, ASSUMPTIONS=1, EMPTY=2, MULTIPLE=3, FOREIGN=7. It needs no dedicated test (trivial enum); the client tests cover it.
- [ ] **Step 3: Commit** `chore(region): config/services dadata + DaDataQualityCode enum`.
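The enum from Step 2 can be sketched as follows. The case names and values are the ones listed in this plan. The per-case comments paraphrase DaData's phone quality codes and should be re-checked against the current DaData documentation. Using `tryFrom()` is a suggestion, not something the spec mandates.

```php
<?php

declare(strict_types=1);

/** DaData `qc` for /clean/phone; case names as in this plan. */
enum DaDataQualityCode: int
{
    case CASE_RECOGNIZED = 0; // recognized confidently
    case ASSUMPTIONS = 1;     // recognized with assumptions, or not recognized
    case EMPTY = 2;           // empty or obviously garbage
    case MULTIPLE = 3;        // several phones found, the first one was recognized
    case FOREIGN = 7;         // foreign phone, recognized confidently
}

// tryFrom() returns null for an undocumented code instead of throwing a ValueError like from().
var_dump(DaDataQualityCode::tryFrom(7) === DaDataQualityCode::FOREIGN); // bool(true)
var_dump(DaDataQualityCode::tryFrom(5));                                // NULL
```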
### Task 3.2: DaDataRegionMap (TDD)

- [ ] **Step 1: Failing unit test** `DaDataRegionMapTest.php`:

```php
use App\Support\DaDataRegionMap;
use App\Support\RussianRegions;

it('maps exact official names via RussianRegions', function (): void {
    expect(DaDataRegionMap::toSubjectCode('Москва'))->toBe(82);
    expect(DaDataRegionMap::toSubjectCode('Московская область'))->toBe(56);
    expect(DaDataRegionMap::toSubjectCode('Санкт-Петербург'))->toBe(83);
    expect(DaDataRegionMap::toSubjectCode('Ленинградская область'))->toBe(53);
});

it('flags ambiguous agglomeration strings', function (): void {
    expect(DaDataRegionMap::isAmbiguous('Санкт-Петербург и область'))->toBeTrue();
    expect(DaDataRegionMap::isAmbiguous('Москва и область'))->toBeTrue();
    expect(DaDataRegionMap::isAmbiguous('Москва'))->toBeFalse();
});

it('returns null for unmappable region', function (): void {
    expect(DaDataRegionMap::toSubjectCode('Атлантида'))->toBeNull();
});

it('resolves all 89 RussianRegions names', function (): void {
    expect(RussianRegions::CODE_TO_NAME)->toHaveCount(89);
    foreach (RussianRegions::CODE_TO_NAME as $code => $name) {
        expect(DaDataRegionMap::toSubjectCode($name))->toBe($code);
    }
});
```
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation.** `DaDataRegionMap` has:
  - `AMBIGUOUS_REGIONS = ['Санкт-Петербург и область','Москва и область']` (const);
  - `OVERRIDES`, an array for names that do not match `RussianRegions` (empty at launch; filled from findings);
  - `toSubjectCode(string $name): ?int` → trim → `self::OVERRIDES[$name] ?? RussianRegions::nameToCode()[$name] ?? null`;
  - `isAmbiguous(string $name): bool` → `in_array(trim($name), self::AMBIGUOUS_REGIONS, true)`.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): DaDataRegionMap with ambiguous-list + 89-region coverage`.
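A self-contained sketch of the map described in Step 3 follows. `NAME_TO_CODE` is a hypothetical stand-in for `RussianRegions::nameToCode()` and lists only 4 of the 89 subjects, using the codes from the critical corrections. The ambiguous strings are deliberately absent from the name map, so `toSubjectCode()` returns `null` for them. The resolver therefore has to call `isAmbiguous()` first to tell "ambiguous" apart from "unmappable".

```php
<?php

declare(strict_types=1);

/** Illustrative sketch of DaDataRegionMap; NAME_TO_CODE is a 4-entry stand-in for RussianRegions. */
final class DaDataRegionMapSketch
{
    public const AMBIGUOUS_REGIONS = ['Санкт-Петербург и область', 'Москва и область'];

    /** Names DaData spells differently from RussianRegions (empty at launch). */
    public const OVERRIDES = [];

    private const NAME_TO_CODE = [
        'Ленинградская область' => 53,
        'Московская область' => 56,
        'Москва' => 82,
        'Санкт-Петербург' => 83,
    ];

    public static function toSubjectCode(string $name): ?int
    {
        $name = trim($name);
        // Overrides take precedence over the official names.
        return self::OVERRIDES[$name] ?? self::NAME_TO_CODE[$name] ?? null;
    }

    public static function isAmbiguous(string $name): bool
    {
        return in_array(trim($name), self::AMBIGUOUS_REGIONS, true);
    }
}
```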
### Task 3.3: DaDataPhoneClient (TDD, Http::fake)

> **HTTP client convention:** mirror [`app/app/Services/Supplier/SupplierPortalClient.php`](../../../app/app/Services/Supplier/SupplierPortalClient.php). Inject `Illuminate\Http\Client\Factory $http`, use custom exceptions and a private `request()`.

- [ ] **Step 1: Failing unit tests** `DaDataPhoneClientTest.php`: one per qc 0/1/2/3/7, plus timeout, 5xx retry and 4xx no-retry. Example:

```php
use App\Services\DaData\DaDataPhoneClient;
use App\Services\DaData\DaDataTimeoutException;
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Support\Facades\Http;

it('parses qc=0 mobile response', function (): void {
    Http::fake(['cleaner.dadata.ru/*' => Http::response([[
        'qc'=>0,'qc_conflict'=>0,'type'=>'Мобильный','phone'=>'+7 921 555-12-34',
        'provider'=>'МегаФон','region'=>'Санкт-Петербург и область','timezone'=>'UTC+3',
    ]], 200)]);

    $resp = app(DaDataPhoneClient::class)->cleanPhone('79215551234');

    expect($resp->qc)->toBe(0)->and($resp->provider)->toBe('МегаФон')
        ->and($resp->region)->toBe('Санкт-Петербург и область');
});

it('throws DaDataTimeoutException on connection timeout', function (): void {
    Http::fake(fn () => throw new ConnectionException('timeout'));

    expect(fn () => app(DaDataPhoneClient::class)->cleanPhone('79215551234'))
        ->toThrow(DaDataTimeoutException::class);
});
```
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** per §3.6:
  - request: POST `https://cleaner.dadata.ru/api/v1/clean/phone` with headers `Authorization: Token <key>` and `X-Secret: <secret>` and body `["<phone>"]`;
  - timeout from config (`timeout_ms / 1000`, since the Laravel HTTP client takes seconds);
  - retry on network errors and 5xx;
  - parse element [0] of the response array into `DaDataPhoneResponse`, a readonly DTO with the fields from §3.6;
  - `ConnectionException`/timeout → `DaDataTimeoutException`; non-2xx after the retries → `DaDataException`.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): DaDataPhoneClient + DTO + exceptions`.
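The request/response contract from Step 3 can be pinned down without an HTTP client. This framework-free sketch assumes the headers and body described above. `buildCleanPhoneRequest()`, `parseCleanPhoneResponse()` and the four `DaDataPhoneResponseSketch` fields are illustrative; the real DTO follows spec §3.6. In the actual client, the returned array would feed the injected `Http` factory.

```php
<?php

declare(strict_types=1);

final class DaDataPhoneResponseSketch
{
    public function __construct(
        public readonly ?int $qc,
        public readonly ?string $type,
        public readonly ?string $provider,
        public readonly ?string $region,
    ) {}
}

/** @return array{url:string, headers:array<string,string>, body:string, timeout_s:int|float} */
function buildCleanPhoneRequest(string $phone, string $apiKey, string $secret, int $timeoutMs): array
{
    return [
        'url' => 'https://cleaner.dadata.ru/api/v1/clean/phone',
        'headers' => [
            'Content-Type' => 'application/json',
            'Authorization' => 'Token ' . $apiKey,
            'X-Secret' => $secret,
        ],
        // The endpoint takes a JSON array of strings, even for a single phone.
        'body' => json_encode([$phone], JSON_THROW_ON_ERROR),
        'timeout_s' => $timeoutMs / 1000, // config is in ms; the HTTP client takes seconds
    ];
}

/** Parses element [0] of the response array; throws if the shape is unexpected. */
function parseCleanPhoneResponse(string $json): DaDataPhoneResponseSketch
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data) || !isset($data[0]) || !is_array($data[0])) {
        throw new UnexpectedValueException('DaData: expected a non-empty JSON array');
    }
    $row = $data[0];
    return new DaDataPhoneResponseSketch(
        qc: isset($row['qc']) ? (int) $row['qc'] : null,
        type: $row['type'] ?? null,
        provider: $row['provider'] ?? null,
        region: $row['region'] ?? null,
    );
}
```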
### Task 3.4: DaDataBudgetGuard + token bucket (TDD)

- [ ] **Step 1: Failing test:**
  - `canSpend()` returns true while `phone_resolution.dadata.spent_today_kopecks < daily_cap_rub × 100` (the cap is configured in rubles, the counter is in kopecks);
  - `canSpend()` returns false once the cap is reached;
  - `recordSpend()` does a Redis INCRBY.

  Use `Cache::store('array')` or a Redis fake.
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** per §5.3 + §3.13:
  - `DaDataBudgetGuard` implements canSpend/recordSpend via a Redis key with a daily TTL;
  - token bucket at 18 RPS: register `RateLimiter::for('dadata-cleaner', ...)` in a service provider;
  - wrap the call in the client, or use a separate guard; decide during assembly in Session 4.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): DaDataBudgetGuard + rate-limit`.
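The two guards can be sketched framework-free, assuming an injectable day key and clock so the logic is testable. The budget compares kopecks with `daily_cap_rub × 100`, and the bucket refills at 18 tokens per second. Both class names are hypothetical. Laravel's `RateLimiter` counts hits per fixed decay window rather than implementing a token bucket. If a true bucket is wanted across workers, the same refill arithmetic would have to live in Redis, for example in a Lua script.

```php
<?php

declare(strict_types=1);

/** Daily DaData budget: spend is tracked in kopecks, the cap is configured in rubles. */
final class BudgetSketch
{
    /** @var array<string,int> day (Y-m-d) => kopecks spent; stands in for a Redis key with a daily TTL */
    private array $spent = [];

    public function __construct(private readonly int $dailyCapRub) {}

    public function canSpend(string $day): bool
    {
        return ($this->spent[$day] ?? 0) < $this->dailyCapRub * 100;
    }

    public function recordSpend(string $day, int $kopecks): void
    {
        $this->spent[$day] = ($this->spent[$day] ?? 0) + $kopecks; // Redis: INCRBY
    }
}

/** Token bucket: holds at most $burst tokens, refilled at $ratePerSec tokens per second. */
final class TokenBucketSketch
{
    private float $tokens;
    private float $last;

    public function __construct(
        private readonly float $ratePerSec,
        private readonly float $burst,
        float $now,
    ) {
        $this->tokens = $burst; // start full
        $this->last = $now;
    }

    /** Takes one token if available; $now is in seconds (e.g. microtime(true)). */
    public function tryAcquire(float $now): bool
    {
        $elapsed = max(0.0, $now - $this->last);
        $this->tokens = min($this->burst, $this->tokens + $elapsed * $this->ratePerSec);
        $this->last = $now;
        if ($this->tokens >= 1.0) {
            $this->tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```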
**Session 3 wrap-up:** `tests/Unit/Services/DaData tests/Unit/Support/DaDataRegionMapTest.php` are GREEN. Push.

---
## SESSION 4: LeadRegionResolver (orchestrator)

**Deliverable:** `LeadRegionResolver::resolve(SupplierLead): RegionResolution`. It covers the full cascade of qc decisions, caching, the ambiguous-region logic, persistent idempotency and cache-hit logging. This is the heart of the feature.

**Preconditions:** Sessions 1-3. All sub-components exist and are green.

**Files:**

- Create: `app/app/Services/LeadRegionResolver.php`, `app/app/Services/Dto/RegionResolution.php`
- Test: `app/tests/Unit/Services/LeadRegionResolverTest.php` (14 cases based on spec §9.1, listed in Task 4.2)

### Task 4.1: RegionResolution DTO + source rank

- [ ] **Step 1: Failing test** for the DTO:
  - fields: `subjectCode: ?int`, `actualSubjectCode: ?int`, `source: string` ('dadata'|'rossvyaz'|'tag'|'unknown'), `phoneOperator: ?string`, `qc: ?int`, `cacheHit: bool`, `dadataResponseMasked: ?array`, `durationMs: ?int`, `rossvyazMatched: bool`;
  - a class constant `SOURCE_RANK = ['dadata'=>4,'rossvyaz'=>3,'tag'=>2,'unknown'=>1]`;
  - factories `fromTag()` and `fromSupplierLead()` (the latter for persistent idempotency).
- [ ] **Steps 2-4:** implement the readonly DTO; PASS.
- [ ] **Step 5: Commit** `feat(region): RegionResolution DTO + SOURCE_RANK`.
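A minimal sketch of the Task 4.1 DTO as a PHP 8.2 readonly class. Several parts are assumptions rather than spec requirements:
- `fromTag()` falls back to `'unknown'` when there is no tag code.
- `fromTag()` copies the tag code into `actualSubjectCode`.
- `outranks()` is a hypothetical helper showing one possible use of `SOURCE_RANK`.

`fromSupplierLead()` is omitted because it depends on the `SupplierLead` model.

```php
<?php

declare(strict_types=1);

/** Sketch of the RegionResolution DTO from Task 4.1. */
final readonly class RegionResolutionSketch
{
    public const SOURCE_RANK = ['dadata' => 4, 'rossvyaz' => 3, 'tag' => 2, 'unknown' => 1];

    public function __construct(
        public ?int $subjectCode,
        public ?int $actualSubjectCode,
        public string $source,
        public ?string $phoneOperator = null,
        public ?int $qc = null,
        public bool $cacheHit = false,
        public ?array $dadataResponseMasked = null,
        public ?int $durationMs = null,
        public bool $rossvyazMatched = false,
    ) {
        // Mirrors the CHECK (region_source IN (...)) constraint in the schema.
        if (!isset(self::SOURCE_RANK[$source])) {
            throw new InvalidArgumentException("Unknown region source: {$source}");
        }
    }

    /** Assumption: a missing tag code degrades the source to 'unknown'. */
    public static function fromTag(?int $tagSubjectCode): self
    {
        return new self($tagSubjectCode, $tagSubjectCode, $tagSubjectCode === null ? 'unknown' : 'tag');
    }

    /** Hypothetical helper: true if this result comes from a more trusted layer than $other. */
    public function outranks(self $other): bool
    {
        return self::SOURCE_RANK[$this->source] > self::SOURCE_RANK[$other->source];
    }
}
```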
### Task 4.2: LeadRegionResolver: 14 cases (TDD, one test at a time)

The implementation follows the spec algorithm in §3.3 + §3.4 (decision table):
- Cache key: `hash('sha256', 'phone-region:'.$phone)`; TTL = `config('services.dadata.cache_ttl_days')` days.
- Persistent idempotency: at the start of `resolve()`, if `$lead->resolved_subject_code !== null || $lead->region_source !== null`, return `RegionResolution::fromSupplierLead($lead)` without calling DaData.
- Phone validation: `/^7\d{10}$/` (same as in the Job/Controller).

Each test below comes from the spec §9.1 list and is a separate TDD cycle (step write→fail→implement→pass→commit). Test names (Pest `it('...')`):

- [ ] `dadata qc 0 returns dadata source`: `Http::fake` with qc=0 and a non-ambiguous region → source='dadata', subjectCode is mapped.
- [ ] `dadata qc 0 ambiguous region falls to rossvyaz but keeps dadata provider`: region='Санкт-Петербург и область' → go to Rossvyaz for subjectCode=83, while the provider still comes from DaData (И-2). **Key test for the ambiguous-region logic.**
- [ ] `dadata qc 3 returns dadata with multiple flag`.
- [ ] `dadata qc 1 falls back to rossvyaz`.
- [ ] `dadata qc 2 falls back to tag skipping rossvyaz`.
- [ ] `dadata qc 7 falls back to tag skipping rossvyaz`.
- [ ] `dadata timeout falls back to rossvyaz`.
- [ ] `dadata network error falls back to rossvyaz`.
- [ ] `budget cap exceeded skips dadata directly to rossvyaz` (`DaDataBudgetGuard::canSpend()` returns false).
- [ ] `cache hit skips dadata and rossvyaz`: a second call for the same phone makes no HTTP request (assert with `Http::assertSentCount`).
- [ ] `invalid phone skips dadata returns tag`.
- [ ] `qc 0 region null falls through to rossvyaz` (a mobile number without a region, §3.4 Q6/Q7).
- [ ] `unmappable dadata region falls through to rossvyaz` (qc=0, but the region is not in the reference list).
- [ ] `all three layers fail returns unknown with null subject_code`.
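The test list above implies a small decision function that can be unit-tested before any HTTP or cache wiring. The sketch below rests on these assumptions:
- qc 3 is treated like qc 0 (the first recognized phone is used).
- Undocumented qc values fall back to Rossvyaz.
- A DaData failure (timeout, network error, budget cap) is represented by `$qc === null`.
- `phoneRegionCacheKey()`, `isValidPhone()`, `nextLayerAfterDaData()` and `finalSource()` are hypothetical names.

The authoritative rules remain the decision table in spec §3.4.

```php
<?php

declare(strict_types=1);

function phoneRegionCacheKey(string $phone): string
{
    return hash('sha256', 'phone-region:' . $phone);
}

function isValidPhone(string $phone): bool
{
    return preg_match('/^7\d{10}$/', $phone) === 1;
}

/**
 * Which layer decides the region after the DaData step.
 * $qc === null: DaData was not called or failed (timeout, network error, budget cap).
 * $mappedCode: DaDataRegionMap::toSubjectCode($region), null if unmappable.
 */
function nextLayerAfterDaData(?int $qc, ?string $region, bool $ambiguous, ?int $mappedCode): string
{
    return match (true) {
        $qc === null => 'rossvyaz',
        in_array($qc, [2, 7], true) => 'tag',          // empty/garbage or foreign: Rossvyaz cannot help
        !in_array($qc, [0, 3], true) => 'rossvyaz',    // qc 1 and any undocumented code
        $region === null, $ambiguous, $mappedCode === null => 'rossvyaz', // provider still kept from DaData
        default => 'dadata',                           // qc 0/3 with a mappable, non-ambiguous region
    };
}

/** Full fallback chain; returns the value that ends up in region_source. */
function finalSource(string $phone, ?int $qc, ?string $region, bool $ambiguous, ?int $mappedCode, bool $rossvyazHit, ?int $tagCode): string
{
    $layer = isValidPhone($phone) ? nextLayerAfterDaData($qc, $region, $ambiguous, $mappedCode) : 'tag';
    if ($layer === 'rossvyaz' && !$rossvyazHit) {
        $layer = 'tag';
    }
    return ($layer === 'tag' && $tagCode === null) ? 'unknown' : $layer;
}
```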
Each case ends with a commit step `feat(region): LeadRegionResolver — <case>`, or a batch commit for 3-4 related cases.

**Session 4 wrap-up:** `cd app && ./vendor/bin/pest tests/Unit/Services/LeadRegionResolverTest.php` is all GREEN. Push. **This is the most important session: do not rush, and review every case.**

---
## SESSION 5: LeadRouter cascade + region substitution

**Deliverable:** `LeadRouter::matchEligibleProjects` accepts `?int $resolvedSubjectCode` and filters in 3 phases (exact → all-RF → fallback) for BOTH paths (DIRECT + pivot). It returns ≤3 candidates, each with a `routing_step` attribute.

**Preconditions:** Sessions 1-4. **Question D1 is resolved:** the customer has confirmed the switch from random to deterministic distribution.

**Files:**

- Modify: `app/app/Services/LeadRouter.php` (new parameter + 3-phase queryCandidates)
- Modify: `app/tests/Pest.php` (extend `createRoutingSnapshotFromProject` with a `string $regions = '{}'` parameter)
- Test: `app/tests/Feature/Services/LeadRouterCascadeTest.php`

### Task 5.1: Extend the test helper

- [ ] **Step 1:** In `createRoutingSnapshotFromProject` (Pest.php lines 128-150), add a `string $regions = '{}'` parameter and use it in the insert instead of the hardcoded `'{}'` (line 141). Existing calls keep working because the default is unchanged.
- [ ] **Step 2:** Run the existing `LeadRouterTest.php`: GREEN (no regression).
- [ ] **Step 3: Commit** `test(region): createRoutingSnapshotFromProject accepts regions param`.
### Task 5.2 — Каскад: сигнатура + 3 фазы (TDD)
|
||||
|
||||
> **Подход:** обернуть существующий SQL приватным `queryCandidates(string $activeDate, SupplierProject $sp, string $regionFilter, ?int $code, array $excludeTenantIds, int $limit): Collection`. Он содержит развилку DIRECT vs pivot (как сейчас) + добавляет WHERE-фрагмент по фильтру. `matchEligibleProjects(SupplierProject $sp, ?int $resolvedSubjectCode = null)` оркестрирует 3 фазы (§3.9 псевдокод), проставляет `routing_step` на каждый Project через `$project->setAttribute('routing_step', N)`.
|
||||
|
||||
WHERE-фрагменты:
|
||||
|
||||
- `exact`: `AND ?::int = ANY(snap.regions)` (bind `$code`)
|
||||
- `all_ru`: `AND snap.regions = '{}'::int[]`
|
||||
- `any`: без региона-фильтра (текущее поведение)
|
||||
|
||||
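The orchestration of the three phases can be illustrated with an in-memory JavaScript sketch (the real code is PHP and filters in SQL). Assumptions not stated in the plan: candidates are plain objects `{ id, tenantId, regions }` already in the router's deterministic order, `regions: []` stands for `'{}'` (all of Russia), and phase 3 runs only when phases 1-2 found nobody. The last point is what the step-1 test implies: the `{83}` project is not returned next to the exact match. `cascadeCandidates` is a hypothetical name.

```javascript
// In-memory sketch of the 3-phase cascade (exact -> all-RF -> fallback).
// Hypothetical shapes: candidate = { id, tenantId, regions }, regions [] = '{}' (all of Russia).
const ROUTING_CAP = 3;

function cascadeCandidates(candidates, resolvedSubjectCode, cap = ROUTING_CAP) {
  const picked = [];

  // Appends matches from one phase, skipping tenants already picked in an
  // earlier phase (the role of $excludeTenantIds) and stopping at the cap.
  const runPhase = (pool, step) => {
    const excluded = new Set(picked.map((p) => p.tenantId));
    for (const c of pool) {
      if (picked.length >= cap) break;
      if (!excluded.has(c.tenantId)) picked.push({ ...c, routing_step: step });
    }
  };

  // Phase 1 (exact): skipped when the resolver produced no subject code.
  if (resolvedSubjectCode !== null && resolvedSubjectCode !== undefined) {
    runPhase(candidates.filter((c) => c.regions.includes(resolvedSubjectCode)), 1);
  }
  // Phase 2 (all_ru): only projects subscribed to the whole country.
  runPhase(candidates.filter((c) => c.regions.length === 0), 2);
  // Phase 3 (any): assumed to run only if nobody matched so far; the Job
  // later substitutes subject_code for these recipients.
  if (picked.length === 0) runPhase(candidates, 3);

  return picked;
}
```

Passing `null` skips phase 1, so old snapshots with `regions = []` still land in phase 2, which is the backward-compatibility path the regression note below relies on.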
- [ ] **Step 1: Failing tests** in `LeadRouterCascadeTest.php` (Pest, `DatabaseTransactions` + `SharesSupplierPdo`, tenant context '0'):

```php
it('step 1: exact region match wins', function (): void {
    $sp = SupplierProject::query()->create(['platform' => 'B1', 'signal_type' => 'site', 'unique_key' => 'ex.ru', 'subject_code' => 82, 'current_limit' => 0, 'sync_status' => 'ok']);
    // tenant A: region 83 (St Petersburg); tenant B: region 82 (Moscow)
    $a = makeLinkedProject($sp, regions: '{83}'); // inline helper
    $b = makeLinkedProject($sp, regions: '{82}');
    $matched = app(LeadRouter::class)->matchEligibleProjects($sp, resolvedSubjectCode: 82);
    expect($matched->pluck('id')->all())->toBe([$b->id]) // only the Moscow project
        ->and($matched->first()->routing_step)->toBe(1);
});

it('step 2: falls to all-RF when no exact match', function (): void {
    // only a candidate with regions='{}' → routing_step=2 for resolvedSubjectCode=82
});

it('step 3: fallback channel when nobody subscribed to region', function (): void {
    // only a candidate with regions='{83}'; resolvedSubjectCode=82 → nobody subscribed, no all-RF project →
    // it is returned with routing_step=3 (the substitution happens in the Job, not here)
});

it('exact + all-RF combine up to cap=3', function (): void { /* 2 exact + 2 all-RF → 3 taken, exact ones first */ });

it('null resolvedSubjectCode skips exact, uses all-RF then fallback', function (): void { /* the resolver produced no region */ });

it('cascade works for DIRECT supplier_project path too', function (): void { /* platform=DIRECT */ });
```

(`makeLinkedProject($sp, regions)` is an inline helper in the test file: it creates a tenant with a balance and a project, calls `linkProjectToSupplier`, then `createRoutingSnapshotFromProject($p, regions: $regions)`.)

- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implement** the cascade. Keep the fail-loud `logIfNoSnapshot` (call it on the final result). The `excludeTenantIds` for step 2 are the tenant_ids picked in step 1.
- [ ] **Step 4: PASS** + `LeadRouterTest.php` regression GREEN. Old calls without the 2nd argument get the default `null`: they used to behave as «any» but now go through the cascade, so check that the region-less tests still pass. The old snapshots have `regions='{}'`, so they should land in step 2 (`all_ru`).

> **⚠️ Regression risk:** the existing `LeadRouterTest` tests create snapshots with `regions='{}'` and call `matchEligibleProjects($sp)` without the 2nd argument. With the cascade, `resolvedSubjectCode=null` → step 1 is skipped → step 2 `all_ru` matches `regions='{}'` → same results. **Verify this explicitly**; if the results differ, fix the default branch so that `null` combined with any `regions` behaves like the old «any» (backward compatibility). Record this decision in the commit message.

- [ ] **Step 5: Commit** `feat(region): LeadRouter cascade routing (exact→all-RF→fallback) with routing_step`.

**Session 5 wrap-up:** `cd app && ./vendor/bin/pest tests/Feature/Services/LeadRouterTest.php tests/Feature/Services/LeadRouterCascadeTest.php` GREEN. Push.

---

## SESSION 6 — Job integration + CSV merge + flag + rollout

**Deliverable:** `RouteSupplierLeadJob` uses `LeadRegionResolver`, persists the resolution, passes `routing_step` through, and substitutes the region at step 3; the CSV merge updates by source rank; feature flag; metrics; staging smoke.

**Preconditions:** Sessions 1-5 all green and merged.

**Files:**

- Modify: `app/app/Jobs/RouteSupplierLeadJob.php` (handle + createDealCopyForProject + CSV merge)
- Create: `app/app/Console/Commands/PhoneRegionSmokeCommand.php` (staging smoke, §9.4)
- Test: `app/tests/Feature/Jobs/RouteSupplierLeadJobRegionResolutionTest.php`

### Task 6.1 — Resolve before the transaction + persist (TDD)

> **Insertion point** ([RouteSupplierLeadJob.php:151-160](../../../app/app/Jobs/RouteSupplierLeadJob.php#L151)). Today: `$matched = $router->matchEligibleProjects($supplier); $selected = $distributor->selectRecipients($matched); $subjectCode = $tagResolver->resolve(...)`. New flow: resolve the region BEFORE `matchEligibleProjects`, persist it in one short `DB::transaction()`, then call `matchEligibleProjects($supplier, $resolution->subjectCode)`.

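Because the resolution is persisted on the lead, a retried job can reuse it instead of calling DaData again (the «persistent idempotency» case in the Step 1 tests). A minimal JavaScript sketch of that guard, assuming the lead exposes the columns written in Step 3 and that a stored `null` code means «not resolved yet» and is therefore retried; `resolveOnce` and the `resolve` callback are hypothetical names.

```javascript
// Hypothetical guard: reuse a persisted resolution, otherwise call the resolver.
// Assumes a stored resolved_subject_code of null/undefined means "not resolved yet".
function resolveOnce(lead, resolve) {
  if (lead.resolved_subject_code !== null && lead.resolved_subject_code !== undefined) {
    return {
      subjectCode: lead.resolved_subject_code,
      source: lead.region_source,
      qc: lead.dadata_qc,
      phoneOperator: lead.phone_operator,
      fromPersisted: true, // no external call was made
    };
  }
  return { ...resolve(lead), fromPersisted: false };
}
```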
- [ ] **Step 1: Failing test** `RouteSupplierLeadJobRegionResolutionTest.php`:

```php
it('lead with phone uses dadata region not tag', function (): void {
    Http::fake(['cleaner.dadata.ru/*' => Http::response([['qc' => 0, 'type' => 'Мобильный', 'provider' => 'МТС', 'region' => 'Москва']], 200)]);
    // lead whose raw_payload has tag='Санкт-Петербург', but whose phone resolves to Moscow (82)
    // → deal.subject_code = 82, supplier_leads.resolved_subject_code = 82, region_source = 'dadata'
    // → one row in lead_region_resolution_log
});

it('region resolution logged per lead with cache_hit flag', function (): void { /* 1 row in the log */ });

it('lead with invalid phone falls back to tag', function (): void { /* phone='123' → region_source='tag' */ });

it('lead with resolver disabled via flag uses tag', function (): void { /* config dadata.enabled=false → tag flow */ });

it('persistent idempotency: retry does not re-call dadata', function (): void { /* resolved_subject_code already set → Http::assertNothingSent */ });
```

- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation.** Inject `LeadRegionResolver $regionResolver` into `handle()`. After `$lead->update(['supplier_project_id'...])`:

```php
$resolution = $regionResolver->resolve($lead);

// Persist in one short transaction, BEFORE the per-project loops,
// so the HTTP call never runs inside a tenant transaction.
DB::transaction(function () use ($lead, $resolution): void {
    $lead->update([
        'resolved_subject_code' => $resolution->subjectCode,
        'region_source' => $resolution->source,
        'dadata_qc' => $resolution->qc,
        'phone_operator' => $resolution->phoneOperator,
    ]);
    $this->logRegionResolution($lead, $resolution); // INSERT into lead_region_resolution_log
});

$matched = $router->matchEligibleProjects($supplier, $resolution->subjectCode);
$selected = $distributor->selectRecipients($matched);
```

Remove the old `$subjectCode = $tagResolver->resolve(...)`. `RegionTagResolver` stays injected, because `LeadRegionResolver` uses it as its fallback (DI chain). The private `logRegionResolution()` writes to `lead_region_resolution_log` through `pgsql_supplier` and masks the phone (§7.1: `7XXX***YYYY`). Note that `DB::transaction()` opens the transaction on the default connection; if `supplier_leads` and the log live on `pgsql_supplier`, use `DB::connection('pgsql_supplier')->transaction(...)` so both writes commit atomically.

- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): wire LeadRegionResolver into RouteSupplierLeadJob + persist`.

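The masking rule `7XXX***YYYY` can be sketched as follows, reading it as: keep the leading `7` and the three-digit code, hide the next three digits, keep the last four. Normalising 8-prefixed and bare 10-digit numbers to a leading `7`, and returning `null` for anything else, are assumptions; `maskPhone` is a hypothetical name for what `logRegionResolution()` would call.

```javascript
// Hypothetical helper: masks a Russian number to the 7XXX***YYYY shape (§7.1).
function maskPhone(raw) {
  let digits = String(raw ?? '').replace(/\D/g, ''); // drop +, spaces, brackets, dashes
  if (digits.length === 10) digits = '7' + digits; // assumed: bare 10-digit national number
  if (digits.length === 11 && digits[0] === '8') digits = '7' + digits.slice(1); // 8XXX... -> 7XXX...
  if (digits.length !== 11 || digits[0] !== '7') return null; // not a RU number: log nothing
  // keep "7" + 3-digit code, hide 3 digits, keep the last 4
  return `${digits.slice(0, 4)}***${digits.slice(7)}`;
}
```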
### Task 6.2 — subject_code substitution at step 3 (TDD)

- [ ] **Step 1: Failing test**: a project with `routing_step=3` receives a deal whose `subject_code` is the first entry of `project->regions`, with `region_substituted=true`, while `lead_region_resolution_log.actual_subject_code` holds the real resolution. With `routing_step<3` the deal gets the real subjectCode and `region_substituted=false`.
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** (§3.10). `createDealCopyForProject` takes `RegionResolution $resolution` (instead of `?int $subjectCode`). Inside:

```php
$dealSubjectCode = ($project->routing_step ?? 1) < 3
    ? $resolution->subjectCode
    : $this->pickSubstituteRegion($project, $resolution->subjectCode);
$dealRegionSubstituted = ($project->routing_step ?? 1) === 3;
// Deal::create([... 'subject_code' => $dealSubjectCode, 'phone_operator' => $resolution->phoneOperator, 'region_substituted' => $dealRegionSubstituted])
```

`pickSubstituteRegion(Project $p, ?int $resolved): ?int`: if `$p->regions` is empty, return `$resolved`; otherwise return `$p->regions[0]`. Also UPDATE `lead_region_resolution_log` with `routing_step`/`actual_subject_code`/`substituted_subject_code` (or fold this into the Task 6.1 log write; decide during implementation). Since the log is most useful when written AFTER routing, once routing_step is known, the log write may move from 6.1 to the end of `handle()`.

> **NB on log write order:** `routing_step` is known only AFTER `matchEligibleProjects`. So the INSERT into `lead_region_resolution_log` fits better AFTER the loop (with an aggregated routing_step), OR a base row is written in 6.1 and its routing fields are UPDATEd afterwards. Chosen option: **one row per lead**, written at the end of `handle()` with the final routing fields (a lead has a single subject_code; routing_step is taken from the first selected project, or the max). Record the decision in the commit.

- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): step-3 fallback subject_code substitution + region_substituted`.

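The deal-field decision from Step 3 and `pickSubstituteRegion` can be mirrored in a small JavaScript sketch, with plain objects standing in for `Project` and `RegionResolution`; `dealRegionFields` is a hypothetical name. The `< 3` and `=== 3` checks in the PHP describe the same split only because steps are limited to 1, 2 and 3.

```javascript
// pickSubstituteRegion: empty regions -> keep the resolved code, else the project's first region.
function pickSubstituteRegion(projectRegions, resolved) {
  return projectRegions.length === 0 ? resolved : projectRegions[0];
}

// Region-related fields of the deal copy for one recipient (routing_step defaults to 1).
function dealRegionFields(project, resolution) {
  const step = project.routing_step ?? 1;
  const substituted = step === 3;
  return {
    subject_code: substituted
      ? pickSubstituteRegion(project.regions, resolution.subjectCode)
      : resolution.subjectCode,
    phone_operator: resolution.phoneOperator,
    region_substituted: substituted,
  };
}
```

A step-3 project should never have empty `regions` (it would already have matched in step 2), so the «empty → resolved» branch acts as a safety net rather than a normal path.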
### Task 6.3 — CSV merge updates by source rank (TDD)

- [ ] **Step 1: Failing test**: a CSV-recovered deal has `region_source='tag'` and subject_code=99; the webhook brings `dadata` with subject 82 → the merge updates subject_code/phone_operator/region_source (rank 4 > 2). An equal or lower rank → NO update.
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** (§3.12) in the merge block (lines 340-369). When `$existingMergeable` exists and there is a new `$resolution`, compare `RegionResolution::SOURCE_RANK`; if the new rank is higher, add `subject_code`/`phone_operator`/`region_source` to `DB::table('deals')->where('id')->where('received_at')->update([...])`. **Keep `received_at` in the WHERE** (partition pruning + FK, as in the existing code, lines 357-360).
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): CSV-merge updates subject_code/operator by source rank`.

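The rank comparison can be sketched in JavaScript as a function that returns the extra columns to add to the existing `update([...])`; an empty object means «leave the region fields alone». Only the two ranks the plan states (`dadata` = 4, `tag` = 2) are filled in. Treating any other source as rank 0 is an assumption, and the real values belong in `RegionResolution::SOURCE_RANK`; `regionMergePatch` is a hypothetical name.

```javascript
// Ranks stated in the plan; other sources are deliberately not guessed here.
const SOURCE_RANK = { dadata: 4, tag: 2 };
// Assumed: an unknown source ranks lowest.
const rankOf = (source) => (Object.hasOwn(SOURCE_RANK, source) ? SOURCE_RANK[source] : 0);

// Extra columns for the CSV-merge UPDATE; {} when the new source is not strictly better.
function regionMergePatch(existingDeal, resolution) {
  if (rankOf(resolution.source) <= rankOf(existingDeal.region_source)) return {};
  return {
    subject_code: resolution.subjectCode,
    phone_operator: resolution.phoneOperator,
    region_source: resolution.source,
  };
}
```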
### Task 6.4 — Staging smoke command + metrics

- [ ] **Step 1:** `PhoneRegionSmokeCommand` (`phone-region:smoke --phone=...`, §9.4) calls live DaData + the Rossvyaz registry, prints the decision, and does NOT write to the DB. Test: with `Http::fake`, the command prints the expected structure.
- [ ] **Step 2:** Metrics (§8.1): increment `phone_resolution.source.*` / `dadata.qc.*` / `cache.{hit,miss}` through the project's existing metrics mechanism (check how the project reports to Sentry/Prometheus: grep for `metric`/`Sentry::` in `app/app/Services`). If there is no such mechanism, defer this to a separate task and note it in the commit.
- [ ] **Step 3: Commit** `feat(region): staging smoke command + resolution metrics`.

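The counter names in Step 2 can be derived from one resolution as sketched below. The `cacheHit` field is a hypothetical stand-in for the `cache_hit` flag logged in Task 6.1, and the names are only strings here: how they get emitted depends on whichever metrics mechanism Step 2 finds. `resolutionMetricNames` is a hypothetical name.

```javascript
// Hypothetical mapping from one resolution to the §8.1 counter names.
function resolutionMetricNames(resolution) {
  const names = [`phone_resolution.source.${resolution.source}`];
  // qc is only meaningful when DaData actually answered
  if (resolution.qc !== null && resolution.qc !== undefined) names.push(`dadata.qc.${resolution.qc}`);
  if (typeof resolution.cacheHit === 'boolean') names.push(resolution.cacheHit ? 'cache.hit' : 'cache.miss');
  return names;
}
```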
### Task 6.5 — Regression + rollout handoff

- [ ] **Step 1:** Full regression of the affected layers: `cd app && ./vendor/bin/pest tests/Unit/Services tests/Feature/Services tests/Feature/Jobs tests/Feature/Migrations`. GREEN.
- [ ] **Step 2:** `superpowers:requesting-code-review` over the whole feature range.
- [ ] **Step 3:** Rollout handoff document (§10): the order of production steps (migration → registry import → deploy with `LEAD_REGION_RESOLVER_ENABLED=false` → 1% → 100%), including `DADATA_API_KEY`/`DADATA_SECRET` in YC Lockbox. File: `docs/superpowers/runbooks/2026-05-31-lead-region-resolution-rollout.md`.
- [ ] **Step 4: Final commit + PR.** `superpowers:finishing-a-development-branch`.

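The plan does not say how the 1% stage selects leads. One option, sketched here purely as an assumption, is a stable bucket derived from the numeric lead id, so a retried lead always gets the same decision; `percent` is a hypothetical setting alongside `LEAD_REGION_RESOLVER_ENABLED`, and `resolverEnabledFor` is a hypothetical name.

```javascript
// Hypothetical rollout gate: master flag first, then a stable 0..99 bucket per lead.
function resolverEnabledFor(leadId, { enabled, percent }) {
  if (!enabled) return false; // LEAD_REGION_RESOLVER_ENABLED=false always wins
  const bucket = ((leadId % 100) + 100) % 100; // same lead id -> same bucket on every retry
  return bucket < percent;
}
```

With sequential ids, 1% means roughly every hundredth lead; a hash of the id would spread the sample differently but keep the same stability property.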
**Session 6 wrap-up:** the whole feature is green, code review has passed, the runbook is ready. The actual first import of the Rossvyaz registry and the rollout itself are done by an operator following the runbook, OUTSIDE this plan.

---

## Self-Review (done by the plan author)

**Spec coverage:** §3.3 resolver → Session 4; §3.4/§3.4.1 qc + ambiguous → Session 4; §3.7 Rossvyaz → Session 2; §3.6 DaData → Session 3; §3.9 cascade → Session 5; §3.10 substitution → Session 6.2; §3.11 persist/idempotency → Session 6.1; §3.12 CSV merge → Session 6.3; §3.13 rate limit → Session 3.4; §4 schema → Session 1; §5 config → Session 3.1; §6 import → Session 2.2; §8 metrics → Session 6.4; §9 tests → spread across sessions; §11 budget → config + guard, Session 3. **Gap:** §7 (152-FZ masking) is only partly covered (phone_masked in the log, Session 6.1); the pg_anonymizer masks (§7.2) have NO dedicated task → **add them to Session 1 Task 1.3 as a schema comment OR as a separate rollout task** (low-risk; flag it for the customer).

**Type consistency:** the `RegionResolution` fields (`subjectCode`/`source`/`phoneOperator`/`qc`/`actualSubjectCode`) are consistent across Session 4 (definition), Session 5 (the router does not depend on the DTO), and Session 6 (consumer). `routing_step` is an attribute on `Project` (written in Session 5, read in Session 6). `SOURCE_RANK` has a single source of truth in `RegionResolution` (Session 4) and is consumed in Session 6.3.

**Placeholders:** DDL, signatures, test names and the integration point are concrete. Full TDD steps for the routine tests inside Sessions 4/6 are given as case names + expected behavior; under subagent-driven-development the implementer expands each case into write → fail → implement → pass (names and expected behavior are specified exactly).

---

## Execution order and branches

1. All 6 sessions run sequentially on a single branch, `feat/lead-region-resolution`.
2. Each session is a separate subagent-driven-development run with review between tasks (Pravila §15.1: git subagents use only Sonnet/Opus; verify the commit base after each one).
3. Between sessions: a pause/checkpoint for the customer (sessions can be spread over separate days).
4. Isolation from parallel sessions: if the router-gate v4 streams are still active, work in a worktree (`superpowers:using-git-worktrees`) and merge into main as a separate checkpoint.

# Safe-baseline live wiring Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make `enforce-safe-baseline-metering.mjs` a live PreToolUse hook that hard-blocks a mutating tool once a task passes its per-task safe-baseline threshold without a real skill match, while keeping an always-available Skill/EnterPlanMode escape; plus a standalone `enforce-runtime-write-deny` hook that closes the self-write hole on the `~/.claude/runtime` side-channels.

**Architecture:** All logic lives in pure functions; `main()` only composes I/O. The pure metering core (`safe-baseline-metering.mjs`) is reused unchanged; the new pure helpers (`extractKeywords`, `detectSkillMatch`, `runLiveDecision`) live in the wrapper. `runLiveDecision` owns the stickiness contract (V2-1). The write-deny hook normalizes paths with the resolving `pathNormalize` (V2-2). The override subsystem is cut (G3).

**Tech Stack:** Node.js ESM (`.mjs`), vitest, existing helpers (`enforce-hook-helpers.mjs`, `safe-baseline-metering.mjs`, `path-normalization.mjs`).

**Spec:** `docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md` (v4).

**NB (overnight autonomous run):** git commits require the owner's AskUserQuestion approval (gate), which is unavailable while the owner sleeps. Implement on disk, keep `npm run test:tools` GREEN, and leave the commits + settings.json registration for the morning handoff.

---

## File Structure

| Path | Responsibility |
|---|---|
| `tools/enforce-safe-baseline-metering.mjs` (modify) | + `extractKeywords`, `detectSkillMatch`, `runLiveDecision`, testable `runMain` + live `main()` |
| `tools/enforce-safe-baseline-metering.test.mjs` (modify) | + tests for the three new pure functions and `runMain` |
| `tools/enforce-runtime-write-deny.mjs` (create) | standalone PreToolUse write-deny on `~/.claude/runtime/**` |
| `tools/enforce-runtime-write-deny.test.mjs` (create) | unit tests incl. V2-2 `.`-segment evasion |

---

### Task 1: `extractKeywords(promptText)` (pure)

**Files:** Modify `tools/enforce-safe-baseline-metering.mjs`; Test `tools/enforce-safe-baseline-metering.test.mjs`

- [ ] **Step 1: Write the failing test**

```js
import { extractKeywords } from './enforce-safe-baseline-metering.mjs';

describe('extractKeywords', () => {
  it('lowercases, drops <4-char tokens and stopwords, returns unique sorted', () => {
    expect(extractKeywords('Почини safe-baseline router gate')).toEqual(['baseline', 'gate', 'router', 'safe']);
  });
  it('drops common RU imperatives so unrelated tasks do not falsely overlap', () => {
    const a = extractKeywords('сделай проверь биллинг тариф');
    const b = extractKeywords('сделай проверь регион маршрут');
    const overlap = a.filter((k) => b.includes(k));
    expect(overlap).toEqual([]); // only the topic words survive, no shared imperatives
  });
  it('returns [] for empty/non-string', () => {
    expect(extractKeywords('')).toEqual([]);
    expect(extractKeywords(null)).toEqual([]);
  });
});
```

- [ ] **Step 2: Run test to verify it fails**: `npx vitest run tools/enforce-safe-baseline-metering.test.mjs` → FAIL (extractKeywords not exported).

- [ ] **Step 3: Write minimal implementation**

```js
const STOPWORDS = new Set([
  // RU common + imperatives ('почини' must be listed, or the first test keeps it as a keyword)
  'почини', 'починить',
  'сделай', 'сделать', 'проверь', 'проверить', 'посмотри', 'добавь', 'добавить',
  'напиши', 'написать', 'нужно', 'надо', 'давай', 'можешь', 'потом', 'после',
  'перед', 'через', 'очень', 'если', 'чтобы', 'этот', 'эта', 'это', 'эти',
  'или', 'тоже', 'также', 'когда', 'пока', 'весь', 'всё', 'все', 'теперь',
  'здесь', 'там', 'нет', 'есть', 'будет', 'было', 'твой', 'мой', 'самый',
  // EN common + imperatives
  'then', 'this', 'that', 'with', 'from', 'your', 'please', 'just', 'make',
  'check', 'look', 'need', 'want', 'also', 'into', 'more', 'very', 'should',
  'will', 'have', 'does', 'done', 'them', 'they', 'here', 'there',
]);

export function extractKeywords(promptText) {
  if (typeof promptText !== 'string') return [];
  const tokens = promptText
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a Unicode letter or digit
    .filter((t) => t.length >= 4 && !STOPWORDS.has(t));
  return [...new Set(tokens)].sort();
}
```

- [ ] **Step 4: Run test to verify it passes**: expected PASS.

- [ ] **Step 5: Commit**: `git add tools/enforce-safe-baseline-metering.mjs tools/enforce-safe-baseline-metering.test.mjs` / `git commit -m "feat(safe-baseline): extractKeywords pure tokenizer (H1)"` *(defer overnight)*

---

### Task 2: `detectSkillMatch(turnEntries)` (pure)

**Files:** Modify both as above.

- [ ] **Step 1: Write the failing test**

```js
import { detectSkillMatch } from './enforce-safe-baseline-metering.mjs';

function asstToolUse(name, input = {}) {
  return { message: { role: 'assistant', content: [{ type: 'tool_use', name, input }] } };
}

describe('detectSkillMatch', () => {
  it('true when the turn has a Skill tool_use', () => {
    expect(detectSkillMatch([asstToolUse('Skill', { skill: 'superpowers:brainstorming' })])).toBe(true);
  });
  it('true when the turn has an EnterPlanMode tool_use', () => {
    expect(detectSkillMatch([asstToolUse('EnterPlanMode')])).toBe(true);
  });
  it('false for Read/Grep/text-only turns (no self-grant via text)', () => {
    expect(detectSkillMatch([asstToolUse('Read', { file_path: 'docs/superpowers/plans/x.md' })])).toBe(false);
    expect(detectSkillMatch([{ message: { role: 'assistant', content: [{ type: 'text', text: 'docs/superpowers/plans/x.md' }] } }])).toBe(false);
  });
  it('false for empty/non-array', () => {
    expect(detectSkillMatch([])).toBe(false);
    expect(detectSkillMatch(null)).toBe(false);
  });
});
```

- [ ] **Step 2: Run to verify FAIL** (detectSkillMatch not exported).

- [ ] **Step 3: Write minimal implementation**

```js
const SKILL_MATCH_TOOLS = new Set(['Skill', 'EnterPlanMode']);

export function detectSkillMatch(turnEntries) {
  if (!Array.isArray(turnEntries)) return false;
  for (const e of turnEntries) {
    const c = e && e.message && e.message.content;
    if (!Array.isArray(c)) continue;
    // Only real tool_use blocks count; text that merely mentions a skill does not.
    for (const b of c) {
      if (b && b.type === 'tool_use' && SKILL_MATCH_TOOLS.has(b.name)) return true;
    }
  }
  return false;
}
```

- [ ] **Step 4: Run to verify PASS.**

- [ ] **Step 5: Commit** *(defer overnight)*.

---

### Task 3: `runLiveDecision(...)` (pure, V2-1 stickiness contract)

**Files:** Modify both as above.

- [ ] **Step 1: Write the failing test**, covering BOTH V2-1 failure modes.

```js
import { runLiveDecision } from './enforce-safe-baseline-metering.mjs';
import { newCounterState } from './safe-baseline-metering.mjs';

function ledgerWith(counts, skill, keywords) {
  return {
    state: {
      ...newCounterState({ taskId: 't', startedAtIso: '2026-05-30T00:00:00Z', firstPromptExcerpt: 'p' }),
      counts: { Read: 0, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0, ...counts },
      skill_match_within_task: skill,
    },
    lastKeywords: keywords,
  };
}

describe('runLiveDecision — stickiness contract (V2-1)', () => {
  it('persists skillMatchedThisTurn into the ledger (stickiness not lost)', () => {
    const r = runLiveDecision({
      event: { tool_name: 'Read' }, priorLedger: null,
      promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
      skillMatchedThisTurn: true,
    });
    expect(r.ledger.state.skill_match_within_task).toBe(true);
  });

  it('a skill earlier in a task keeps later mutating ops allowed past the hard limit (no false block)', () => {
    const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
    const r = runLiveDecision({
      event: { tool_name: 'Edit' }, priorLedger: prior,
      promptText: 'продолжаем router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
      skillMatchedThisTurn: false,
    });
    expect(r.action).toBe('allow');
  });

  it('skill match in task A does NOT exempt an unrelated task B (no cross-task leak)', () => {
    const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
    const r = runLiveDecision({
      event: { tool_name: 'Edit' }, priorLedger: prior,
      promptText: 'другая тема регион маршрут лиды', currentKeywords: ['регион', 'маршрут', 'лиды'],
      skillMatchedThisTurn: false,
    });
    // fresh task (overlap < 2) → counters reset to 0 → Edit allowed BUT skill_match must be false now
    expect(r.ledger.state.skill_match_within_task).toBe(false);
    expect(r.ledger.state.counts.Read).toBe(0);
  });

  it('hard-blocks a mutating tool past the limit in a no-skill task', () => {
    const prior = ledgerWith({ Read: 60 }, false, ['router', 'gate', 'safe', 'baseline']);
    const r = runLiveDecision({
      event: { tool_name: 'Edit' }, priorLedger: prior,
      promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
      skillMatchedThisTurn: false,
    });
    expect(r.action).toBe('hard_block');
  });
});
```

- [ ] **Step 2: Run to verify FAIL.**

- [ ] **Step 3: Write minimal implementation**

```js
// Both helpers come from the pure core; merge into the wrapper's existing import from this module, if it has one.
import { processEvent, shouldInheritTaskId } from './safe-baseline-metering.mjs';

export function runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn, thresholds }) {
  // Same task as the prior ledger? Only then may the sticky skill flag carry over.
  const inherit = !!(priorLedger && priorLedger.state &&
    shouldInheritTaskId(priorLedger.lastKeywords || [], currentKeywords, promptText));
  const priorSticky = inherit ? !!priorLedger.state.skill_match_within_task : false;
  const effectiveSkillMatched = priorSticky || !!skillMatchedThisTurn;

  const res = processEvent({
    event, priorLedger, currentKeywords, promptText,
    skillMatched: effectiveSkillMatched, thresholds,
  });
  // V2-1: persist stickiness, because processEvent does not.
  res.ledger.state.skill_match_within_task = effectiveSkillMatched;
  return res;
}
```

- [ ] **Step 4: Run to verify PASS.**

- [ ] **Step 5: Commit** *(defer overnight)*.

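Task 3 relies on `shouldInheritTaskId` from the metering core, whose exact rule this plan does not show. Reading the test comment «overlap < 2» as «at least two shared keywords means the same task», the stickiness rule can be sketched standalone. `sharesTask`, `stickySkillFlag` and `MIN_SHARED_KEYWORDS` are hypothetical names, and the real helper also receives `promptText`, which this sketch ignores.

```javascript
// Hypothetical stand-in for shouldInheritTaskId: same task when >= 2 keywords are shared.
const MIN_SHARED_KEYWORDS = 2;

function sharesTask(prevKeywords, currentKeywords, min = MIN_SHARED_KEYWORDS) {
  const prev = new Set(prevKeywords || []);
  let shared = 0;
  for (const k of new Set(currentKeywords || [])) if (prev.has(k)) shared += 1;
  return shared >= min;
}

// The V2-1 rule: the prior flag survives only within the same task; a match this turn always counts.
function stickySkillFlag(priorLedger, currentKeywords, skillMatchedThisTurn) {
  const inherit = !!(priorLedger && priorLedger.state && sharesTask(priorLedger.lastKeywords, currentKeywords));
  return (inherit && !!priorLedger.state.skill_match_within_task) || !!skillMatchedThisTurn;
}
```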
---

### Task 4: Live `main()` wiring + integration test

**Files:** Modify both as above.

- [ ] **Step 1: Write the failing integration test** (injected runtimeDir + transcript fixture)

```js
import { runMain } from './enforce-safe-baseline-metering.mjs';
import { mkdtempSync, writeFileSync, existsSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Writes a JSONL transcript fixture (one entry per line).
function fixtureTranscript(path, entries) { writeFileSync(path, entries.map((e) => JSON.stringify(e)).join('\n')); }

describe('safe-baseline live main (runMain)', () => {
  it('blocks an Edit when Read past hard with no skill, and the message names the escape', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
    const tpath = join(dir, 't.jsonl');
    // prior ledger: Read=60, no skill, same task keywords
    writeFileSync(join(dir, 'safe-baseline-ledger-S.json'), JSON.stringify({
      state: { schema_version: 1, task_id: 't', counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 }, skill_match_within_task: false },
      lastKeywords: ['router', 'gate', 'safe', 'baseline'],
    }));
    fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'router gate safe baseline' } }]);
    const res = await runMain({
      event: { tool_name: 'Edit', session_id: 'S', transcript_path: tpath },
      runtimeDir: dir,
    });
    expect(res.block).toBe(true);
    expect(res.message).toMatch(/EnterPlanMode|Skill/);
  });

  it('allows a fresh task and persists the ledger', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
    const tpath = join(dir, 't.jsonl');
    fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'новая тема регион' } }]);
    const res = await runMain({
      event: { tool_name: 'Read', session_id: 'S2', transcript_path: tpath },
      runtimeDir: dir,
    });
    expect(res.block).toBe(false);
    expect(existsSync(join(dir, 'safe-baseline-ledger-S2.json'))).toBe(true);
  });
});
```

- [ ] **Step 2: Run to verify FAIL** (runMain not exported).

- [ ] **Step 3: Write minimal implementation**: replace the no-op `main()` with a testable `runMain` + thin `main()`.

```js
import { readFileSync as _rf, writeFileSync as _wf, appendFileSync as _af, mkdirSync as _mk } from 'node:fs';
import { join as _join } from 'node:path';
import { homedir as _home } from 'node:os';
import { readStdin, parseEventJson, readTranscript, lastUserPromptText, lastTurnEntries, exitDecision } from './enforce-hook-helpers.mjs';

const ESCAPE_MSG = 'invoke the recommended Skill, or EnterPlanMode, to proceed (skill/plan invocations are never blocked by this layer).';

// Runtime dir: injectable for tests, defaults to ~/.claude/runtime.
function rtDir(o) { return o || _join(_home(), '.claude', 'runtime'); }

// Per-session ledger; a missing or corrupt file means "no prior ledger".
function loadLedger(dir, sess) {
  try { return JSON.parse(_rf(_join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), 'utf8')); }
  catch { return null; }
}

function saveLedger(dir, sess, ledger) {
  try {
    _mk(dir, { recursive: true });
    _wf(_join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), JSON.stringify(ledger));
  } catch { /* fail-quiet */ }
}

// Soft flags are only appended to a JSONL log, never blocking.
function logFlag(dir, sess, entry) {
  try {
    _mk(dir, { recursive: true });
    _af(_join(dir, `safe-baseline-flags-${sess || 'unknown'}.jsonl`), JSON.stringify({ ts: new Date().toISOString(), ...entry }) + '\n');
  } catch { /* ignore */ }
}

export async function runMain({ event, runtimeDir, transcript: injectedTranscript } = {}) {
  try {
    const sess = event.session_id;
    const dir = rtDir(runtimeDir);
    const transcript = injectedTranscript || readTranscript(event.transcript_path);
    const promptText = lastUserPromptText(transcript) || '';
    const currentKeywords = extractKeywords(promptText);
    // Escape hatch: a Skill/EnterPlanMode call in this turn, or as the current tool, counts as a match
    // (SKILL_MATCH_TOOLS is the set defined in Task 2).
    const skillMatchedThisTurn = detectSkillMatch(lastTurnEntries(transcript)) ||
      SKILL_MATCH_TOOLS.has(event.tool_name);
    const priorLedger = loadLedger(dir, sess);

    const res = runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn });
    saveLedger(dir, sess, res.ledger);

    if (res.action === 'soft_flag') logFlag(dir, sess, { tool: event.tool_name, reason: res.reason });
    if (res.action === 'hard_block') return { block: true, message: `[safe-baseline] ${res.reason}\n${ESCAPE_MSG}` };
    return { block: false };
  } catch {
    return { block: false }; // fail-quiet
  }
}

async function main() {
  const event = parseEventJson(await readStdin());
  const res = await runMain({ event });
  exitDecision(res);
}

if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-safe-baseline-metering.mjs')) {
  main().catch(() => process.exit(0));
}
```

(Remove the old no-op `main()` and its CLI guard.)

- [ ] **Step 4: Run to verify PASS** + `npm run test:tools` GREEN.

- [ ] **Step 5: Commit** *(defer overnight)*.

---

### Task 5: `enforce-runtime-write-deny.mjs` (standalone, V2-2)
|
||||
|
||||
**Files:** Create `tools/enforce-runtime-write-deny.mjs` + `tools/enforce-runtime-write-deny.test.mjs`.
|
||||
|
||||
- [ ] **Step 1: Write the failing test**
|
||||
|
||||
```js
|
||||
import { decide } from './enforce-runtime-write-deny.mjs';
|
||||
import { homedir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
|
||||
const HOME = homedir();
|
||||
|
||||
describe('enforce-runtime-write-deny decide()', () => {
|
||||
it('blocks a Write into ~/.claude/runtime', () => {
|
||||
const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', 'runtime', 'askuser-decisions-S.jsonl') });
|
||||
expect(r.block).toBe(true);
|
||||
});
|
||||
it('blocks the .-segment evasion (V2-2)', () => {
|
||||
const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', '.', 'runtime', 'x.jsonl') });
|
||||
expect(r.block).toBe(true);
|
||||
});
|
||||
it('allows a Write to a normal project path', () => {
|
||||
const r = decide({ toolName: 'Write', filePath: join(HOME, 'project', 'src', 'x.mjs') });
|
||||
expect(r.block).toBe(false);
|
||||
});
|
||||
it('ignores non-write tools', () => {
|
||||
expect(decide({ toolName: 'Read', filePath: join(HOME, '.claude', 'runtime', 'x') }).block).toBe(false);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run to verify FAIL.**
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**
|
||||
|
||||
```js
#!/usr/bin/env node
/**
 * enforce-runtime-write-deny — PreToolUse(Edit|Write|MultiEdit|NotebookEdit).
 * Blocks the Write/Edit TOOL from writing under ~/.claude/runtime/** (closes a
 * pre-existing self-write hole on the v4 git-approval anchor). Standalone —
 * independent of safe-baseline. Uses the resolving pathNormalize (V2-2) so
 * `.`/`..` segments cannot evade the match. Fail-OPEN on inability to determine
 * the path (never bricks the session); blocks only on a confirmed runtime match.
 */
import { pathNormalize } from './path-normalization.mjs';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';

const WRITE_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit']);
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;

export function decide({ toolName, filePath, normalizeImpl = pathNormalize }) {
  if (!WRITE_TOOLS.has(toolName)) return { block: false };
  const fp = String(filePath || '');
  if (!fp) return { block: false };
  let norm;
  try { norm = normalizeImpl(fp); } catch { return { block: false }; } // can't determine → fail-open (no brick)
  if (RUNTIME_RE.test(norm)) {
    return { block: true, reason: `Write to «${norm}» denied — ~/.claude/runtime is a protected side-channel (git-approval anchor).` };
  }
  return { block: false };
}

async function main() {
  try {
    const event = parseEventJson(await readStdin());
    const r = decide({
      toolName: event.tool_name,
      filePath: (event.tool_input && (event.tool_input.file_path || event.tool_input.notebook_path)) || '',
    });
    exitDecision({ block: r.block, message: r.reason });
  } catch {
    exitDecision({ block: false }); // fail-quiet
  }
}

const isCli = process.argv[1] && process.argv[1].replace(/\\/g, '/').endsWith('/enforce-runtime-write-deny.mjs');
if (isCli) main();
```

- [ ] **Step 4: Run to verify PASS** + `npm run test:tools` GREEN.

- [ ] **Step 5: Commit** *(defer overnight)*.

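The `.`-segment test only passes because `decide()` matches against the *resolved* path, not the raw string. The sketch below makes that concrete. `resolveSegments` is a hypothetical, purely lexical stand-in for `pathNormalize`; the real helper in `tools/path-normalization.mjs` is not shown in this plan and may do more, such as symlink or `~` handling. The sketch runs the same `RUNTIME_RE` over each raw string and over its resolved form.

```javascript
// Hypothetical stand-in for pathNormalize: lexical `.`/`..` resolution only.
// It does not follow symlinks or expand `~`.
function resolveSegments(p) {
  const s = String(p).replace(/\\/g, '/'); // unify Windows separators first
  const out = [];
  for (const seg of s.split('/')) {
    if (seg === '' || seg === '.') continue;   // drop empty and current-dir segments
    if (seg === '..') { out.pop(); continue; } // step up one level
    out.push(seg);
  }
  return (s.startsWith('/') ? '/' : '') + out.join('/');
}

// Same pattern as RUNTIME_RE in Task 5.
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;

const evasions = [
  '/home/u/.claude/./runtime/x.jsonl',       // `.` segment
  '/home/u/.claude/tmp/../runtime/x.jsonl',  // `..` segment
  'C:\\Users\\u\\.claude\\runtime\\x.jsonl', // Windows separators
];

for (const p of evasions) {
  // raw: regex on the unnormalized string; resolved: regex after resolveSegments()
  console.log(p, { raw: RUNTIME_RE.test(p), resolved: RUNTIME_RE.test(resolveSegments(p)) });
}
```
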
---

### Task 6: Full regression + handoff

- [ ] **Step 1:** `npm run test:tools` — confirm full GREEN count (baseline 1859 + new tests).
- [ ] **Step 2:** Write the morning handoff note (`docs/observer/notes/2026-05-30-safe-baseline-overnight.md`): queued commits, exact `.claude/settings.json` registration block, the fail-OPEN deviation note for owner review, and the "flip to enforce" status (already in enforce mode per the owner; observe mode was not requested).
- [ ] **Step 3:** Commit everything in one batch with owner approval *(morning)*.

---

## Registration block (owner-applied, morning)

Add to `.claude/settings.json` `hooks.PreToolUse` (Claude cannot edit settings.json — gate-blocked):

```json
{ "matcher": "Read|Grep|Glob|LS|TodoWrite|AskUserQuestion|Edit|Write|MultiEdit|NotebookEdit|Bash|Skill|Task|EnterPlanMode",
  "hooks": [{ "type": "command", "command": "node tools/enforce-safe-baseline-metering.mjs", "timeout": 10 }] }
```

```json
{ "matcher": "Edit|Write|MultiEdit|NotebookEdit",
  "hooks": [{ "type": "command", "command": "node tools/enforce-runtime-write-deny.mjs", "timeout": 5 }] }
```

Until registered, both hooks are inert (no behavior change).
@@ -0,0 +1,98 @@
# Calibration 5 — cosmetic-detector excludes git-approval AskUser Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:test-driven-development. Steps use checkbox (`- [ ]`) syntax.

**Goal:** Stop `askuser-cosmetic-detector` from counting/blocking git-operation **approval** AskUsers as "cosmetic A/B" — a scope fix that does NOT lower discipline.

**Architecture:** The detector's target is *simple A/B questions used instead of structured ideation* (brainstorming/writing-plans). A git-approval AskUser (one whose option label is a literal git command) is the *sanctioned git-approval channel* (`enforce-askuser-answer-parser` writes `approve_git_operation` from it) — it is never a substitute for ideation. We add `isGitApprovalQuestion()` and short-circuit `decide()` to `allow` with `isSimpleAB:false` for such questions, so they are neither counted toward the session limit nor hard-blocked. Everything else is unchanged.

**Tech Stack:** Node ESM `.mjs`, vitest.

**Why NO discipline hole (adversarial check):**

- The real target (design-clarification A/B like "Вариант A"/"Вариант B", i.e. "Option A"/"Option B") has NON-git labels → still classified simple → still counted → still hard-blocked at >2. Unchanged.
- A git-approval question is identified ONLY by an option label matching a git-command verb. To "disguise" a cosmetic clarification as exempt, the controller would have to put a literal `git …` command as an option label — but then the chosen answer IS a git command, which `enforce-askuser-answer-parser` turns into a real `approve_git_operation` record; it cannot function as a cosmetic ideation-dodge. So there is no usable bypass.
- The exemption is narrow and structural (the label is a git command), mirroring calibrations 1 (Skill) / 3 (test-runner) / 4 (user-prompt fallback): a scope fix, not a discipline drop.

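To make the structural check concrete, the sketch below runs the `GIT_CMD_RE` pattern from Task 1 over a few option labels. The regex is copied verbatim, and the labels are illustrative. Two properties are worth seeing. First, the pattern lists only mutating or state-changing verbs, so `git status` does not qualify. Second, the pattern is unanchored, so a git verb embedded in a longer label also qualifies (last case).

```javascript
// Copied verbatim from Task 1 (tools/askuser-cosmetic-detector.mjs).
const GIT_CMD_RE = /\bgit\s+(?:commit|push|add|pull|merge|rebase|reset|checkout|switch|branch|stash|cherry-pick|revert|clean|restore|fetch|tag)\b/i;

const labels = [
  'git push origin main',      // git command → exempt
  'git commit -F x -- a b',    // git command → exempt
  'git status',                // verb not in the list → not exempt
  'digit push',                // no word boundary before "git" → not exempt
  'Вариант А',                 // "Option A", ordinary A/B → not exempt
  'Вариант А, потом git push', // "Option A, then git push": verb mid-label → exempt
];

for (const label of labels) {
  console.log(JSON.stringify(label), GIT_CMD_RE.test(label));
}
```
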
---

## Task 1: isGitApprovalQuestion + decide() exemption

**Files:**

- Modify: `tools/askuser-cosmetic-detector.mjs`
- Test: `tools/askuser-cosmetic-detector.test.mjs`

- [ ] **Step 1: Write failing tests**

```javascript
import { isGitApprovalQuestion } from './askuser-cosmetic-detector.mjs';
// `decide` is assumed to be imported already at the top of this test file.

// Labels: 'Не пушить' = "Don't push", 'Отмена' = "Cancel", 'Вариант А/Б' = "Option A/B".
describe('isGitApprovalQuestion (calibration 5)', () => {
  it('true when an option label is a git command', () => {
    expect(isGitApprovalQuestion([{ options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] }])).toBe(true);
    expect(isGitApprovalQuestion([{ options: [{ label: 'git commit -F x -- a b' }, { label: 'Отмена' }] }])).toBe(true);
  });
  it('false for a non-git A/B', () => {
    expect(isGitApprovalQuestion([{ options: [{ label: 'Вариант А' }, { label: 'Вариант Б' }] }])).toBe(false);
  });
});

// decide(): git-approval question is exempt — allow, not simple, not counted, never blocked even past the session limit.
describe('decide — git-approval exemption (calibration 5)', () => {
  it('allows a git-approval question and does NOT count it even when session is already over the limit', () => {
    const r = decide({
      questions: [{ options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] }],
      simpleCountSession: 5, brainstormingInvoked: false,
    });
    expect(r.block).toBe(false);
    expect(r.action).toBe('allow');
    expect(r.isSimpleAB).toBe(false);
    expect(r.newSessionCount).toBe(5); // unchanged — not counted
  });

  it('REGRESSION: a non-git simple A/B past the limit STILL hard-blocks (discipline intact)', () => {
    const r = decide({
      questions: [{ options: [{ label: 'A' }, { label: 'B' }] }],
      simpleCountSession: 5, brainstormingInvoked: false,
    });
    expect(r.block).toBe(true);
    expect(r.action).toBe('hard_block');
  });
});
```

- [ ] **Step 2: Run RED** — `npx vitest run --root app --config vitest.config.tools.mjs askuser-cosmetic-detector` → fail (isGitApprovalQuestion missing; git-approval not exempt).

- [ ] **Step 3: Implement**

Add near `isSimpleAB`:

```javascript
const GIT_CMD_RE = /\bgit\s+(?:commit|push|add|pull|merge|rebase|reset|checkout|switch|branch|stash|cherry-pick|revert|clean|restore|fetch|tag)\b/i;

/** True if this AskUser is a git-operation approval prompt (an option label is a git command). */
export function isGitApprovalQuestion(questions) {
  if (!Array.isArray(questions)) return false;
  return questions.some((q) =>
    q && Array.isArray(q.options) &&
    q.options.some((o) => o && typeof o.label === 'string' && GIT_CMD_RE.test(o.label)));
}
```

In `decide()`, replace `const simple = isSimpleAB(questions);` with:

```javascript
  // Calibration 5: git-operation approval prompts are the sanctioned approval
  // channel, never cosmetic ideation — exempt from the simple-AB count/block.
  if (isGitApprovalQuestion(questions)) {
    return { action: 'allow', block: false, reason: null, isSimpleAB: false, newSessionCount: simpleCountSession, newTurnCount: simpleCountTurn };
  }
  const simple = isSimpleAB(questions);
```

- [ ] **Step 4: Run GREEN** — same command → pass.

- [ ] **Step 5: Full regression** — `npx vitest run --root app --config vitest.config.tools.mjs` → all green.

- [ ] **Step 6: Commit** (with git-approval).
@@ -0,0 +1,144 @@
# Discipline-guard backlog — router-gate `tools/enforce-*.mjs`

**Worktree:** `.claude/worktrees/discipline-guard` (branch `worktree-discipline-guard`).
**Date:** 2026-05-31. Owner-authorized backlog after quirk-2 + 1A closure (commit `b0cd18d7`).

## Context (already done — do NOT redo)

- **Quirk 2** — redirect detector is quote-aware (`stripQuotedSpans` in `tools/enforce-router-gate.mjs`): `>`/`2>` inside quotes no longer false-blocks. Commit `b0cd18d7`.
- **1A** — stopped advertising dead override phrases (`findOverride` is a v4 stub) in `enforce-prompt-injection` + verify-before-push / coverage-verify / memory-coverage / tdd-gate. Locked by negative tests. Same commit.
- Marketing MCP servers cut from `.mcp.json` (commit `63100dec`).

## Deliberately NOT doing (these are defense lines, not bugs)

- Calibration 6 of the judge (reading chat context) — would weaken in-session defense.
- Quirk 3 (loosening the exact match of git approval) — that exact match is an anti-injection property.

## Backlog (by priority)

### A. `npm ci` in router-gate whitelist (`SAFE_EXACT` in `tools/enforce-router-gate.mjs`) ← current

Restoring locked dependencies is safe and removes worktree-setup friction. `npm ci` installs
exactly the committed lockfile (deterministic, no version drift) — unlike `npm install`/`npm i`,
which stay hard-blacklisted because they can pull new/updated versions.

**TDD:**
1. RED — new describe block in `tools/enforce-router-gate.test.mjs`: allow `npm ci`,
   `npm ci --no-audit`, `npm ci --prefer-offline`; still block `npm install`/`npm i`/
   `npm install foo`/`npm i foo` (hard-blacklist), `npm cider` (word boundary → default-deny),
   `npm ci && rm x` (chain mutating).
2. GREEN — add `/^npm\s+ci\b/` to `SAFE_EXACT` with a rationale comment. `\b` prevents
   `npm cider`-style prefix matches. The blacklist runs before the whitelist, so `npm install`/`npm i`
   stay blocked (the `i`-alternative needs `i` right after the space; `npm ci` has `c` there).
3. Full tools-vitest run (this run also serves as the push sentinel).
4. Commit via AskUserQuestion (label = exact command).

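The ordering argument in step 2 can be checked in isolation. In the sketch below, only `/^npm\s+ci\b/` comes from the plan. `HARD_BLACKLIST` and the naive chain splitter are hypothetical stand-ins, because the real patterns in `tools/enforce-router-gate.mjs` (which is quote-aware) are not reproduced here.

```javascript
// Hypothetical stand-in for the router-gate hard blacklist (npm install / npm i).
const HARD_BLACKLIST = [/^npm\s+(?:install|i)\b/];
// From the plan: whitelist entry for `npm ci`; `\b` rejects `npm cider`.
const SAFE_EXACT = [/^npm\s+ci\b/];

// Simplified classifier: blacklist first, then whitelist, otherwise default-deny.
// Chains are split naively on &&, ||, ; and | (the real gate is quote-aware).
function classify(command) {
  const segments = command.split(/&&|\|\||;|\|/).map((s) => s.trim()).filter(Boolean);
  if (segments.length > 1) {
    // a chain passes only if every segment is independently allowed
    return segments.every((s) => classify(s) === 'allow') ? 'allow' : 'block';
  }
  const cmd = segments[0] || '';
  if (HARD_BLACKLIST.some((re) => re.test(cmd))) return 'block';
  if (SAFE_EXACT.some((re) => re.test(cmd))) return 'allow';
  return 'block'; // default-deny
}

for (const cmd of ['npm ci --no-audit', 'npm i foo', 'npm cider', 'npm ci && rm x']) {
  console.log(cmd, '→', classify(cmd));
}
```
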
### B. Cosmetic path strings in gate messages

`c:/` vs `/c/`, unexpanded `$env:` in gate messages. Polish only.

### F. Parallel-session-lock false cross-worktree collision (2026-05-31, owner-raised)

Symptom: a session in worktree `discipline-guard` was blocked by
`enforce-parallel-session-lock` (held by another session `7f6efd48`, pid changed
12552→19044 across attempts → holder still active; pid is the transient hook-node pid,
session_id is the stable identity).

**Investigation (read-only):**
- Lock keyed by `computeWorkspaceHash(process.cwd())` = md5(cwd).slice(0,12); file
  `~/.claude/runtime/session-lock-<hash>.json`; release only on Stop; TTL 5 min.
- 9 lock files accumulated → stale files leak when a session closes without a clean Stop.
- `enforce-branch-switch` read branch "worktree-discipline-guard" via
  `git branch --show-current` from `process.cwd()` → the hook's cwd IS the worktree →
  **keying is already per-worktree** (NOT coarse main-dir). So the holder shared this
  worktree's hash → genuine same-worktree concurrency, the lock working as designed —
  NOT a false positive. Do NOT re-key (would weaken same-tree serialization).

**Genuinely-fixable part (no weakening):** leaked lock on close-without-Stop blocks the next
same-worktree session for up to TTL. Fix: release on SessionEnd (not only Stop) + prune
stale lock files on acquire. Ground-truth the lock JSON before coding.

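A minimal sketch of prune-on-acquire follows. It assumes a hypothetical lock record `{ session_id, updated_at }`, since the plan says to ground-truth the real lock JSON first. It also replaces `~/.claude/runtime` with an injected in-memory `Map`. The property being illustrated is that pruning reuses the same `isStale()` predicate as TTL takeover, so a lock still within its TTL is never deleted. This sketch treats a record it cannot read as *not* stale, which is the conservative choice.

```javascript
const TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as in the investigation notes

// Hypothetical lock record: { session_id, updated_at } with updated_at in epoch ms.
// Unknown shape → not stale (conservative: never delete what we cannot read).
function isStale(lock, now, ttlMs = TTL_MS) {
  if (!lock || typeof lock.updated_at !== 'number') return false;
  return now - lock.updated_at > ttlMs;
}

// `store` is a Map<fileName, lockRecord> standing in for ~/.claude/runtime.
// Removes only session-lock files that isStale() already treats as expired.
function pruneStaleLocks(store, now) {
  const pruned = [];
  for (const [name, lock] of store) {
    if (!/^session-lock-[0-9a-f]+\.json$/.test(name)) continue; // ignore other runtime files
    if (isStale(lock, now)) {
      store.delete(name);
      pruned.push(name);
    }
  }
  return pruned;
}

const now = Date.now();
const store = new Map([
  ['session-lock-aaaaaaaaaaaa.json', { session_id: 'closed-without-stop', updated_at: now - 10 * 60 * 1000 }],
  ['session-lock-bbbbbbbbbbbb.json', { session_id: 'still-active', updated_at: now - 60 * 1000 }],
  ['askuser-decisions-S.jsonl', { updated_at: 0 }],
]);
console.log(pruneStaleLocks(store, now)); // only the 10-minute-old lock is pruned
console.log([...store.keys()]);
```
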
**Closure (2026-05-31).** All keying/hygiene/UX parts done, no discipline weakened:
- **A — keying by worktree root** (`resolveWorkspacePath`, commit `7a469dc9`): keys the
  lock on the session's stable `event.cwd` → git toplevel, not the volatile hook
  `process.cwd()` (which collapses to main on resume → cross-worktree false-blocks).
  Same-worktree serialization unchanged; fallback to `process.cwd()` if `event.cwd` absent.
- **D — clearer block message**: identifies the holder by its STABLE `session_id`; marks
  the recorded pid as transient ("may change between attempts"). Chasing the pid was what
  led to closing the wrong session. Logic untouched (text only).
- **B — `pruneStaleLocks`**: best-effort delete of leaked lock files that are ALREADY
  stale by the shared `isStale()` (now exported — single source of truth). Active
  within-TTL locks are never touched → serialization not weakened. Wired into the
  PreToolUse branch of `main()`, wrapped so hygiene can never break the gate.
- **C — release on SessionEnd**: NO new code. The existing `!event.tool_name` branch
  already releases. To make release fire on session end (not only on Stop turns),
  **OWNER ACTION in `.claude/settings.json`**: add `enforce-parallel-session-lock.mjs`
  to the `SessionEnd` hook array (it already runs on `Stop`). Pure config; Claude cannot
  edit settings.json. Until added, leaked locks are still self-healing via B (prune) +
  the 5-min TTL takeover — so this is a reliability nicety, not a correctness gap.
- **E/F — live**: fix is on branch `worktree-discipline-guard`; the live hook executes
  from `tools/` on **main**, so it is active only after merge to main. Runtime
  effectiveness of A depends on the PreToolUse payload carrying `cwd`; if absent, the
  safe fallback = prior behavior (no regression). Verify on main.

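Keying fix A can be sketched as a pure function, which also shows why a resumed hook whose `process.cwd()` has collapsed to the main checkout no longer collides across worktrees. The name `resolveWorkspacePath` comes from the closure note, but this body is a hypothetical reconstruction. `gitToplevelImpl` is an injected helper; in a real hook it would presumably shell out to `git rev-parse --show-toplevel`. Hashing of the resolved path is omitted.

```javascript
// Sketch of fix A: key the lock on the session's stable event.cwd (resolved to the
// git toplevel), falling back to the hook's process.cwd() only when cwd is absent.
function resolveWorkspacePath(event, { gitToplevelImpl, processCwd }) {
  const cwd = event && typeof event.cwd === 'string' && event.cwd ? event.cwd : null;
  if (!cwd) return processCwd; // no cwd in payload → prior behavior, no regression
  try {
    return gitToplevelImpl(cwd) || cwd; // worktree root when resolvable
  } catch {
    return cwd; // git failure → still key on the session's own cwd
  }
}

// Fake toplevel lookup: the nested worktree is listed first so it wins over main.
const ROOTS = ['/repo/.claude/worktrees/discipline-guard', '/repo'];
const fakeToplevel = (p) => ROOTS.find((r) => p === r || p.startsWith(r + '/')) || null;

// A resumed hook whose process.cwd() collapsed to the main checkout:
const hookEnv = { processCwd: '/repo', gitToplevelImpl: fakeToplevel };
console.log(resolveWorkspacePath({ cwd: '/repo/.claude/worktrees/discipline-guard/tools' }, hookEnv));
console.log(resolveWorkspacePath({ cwd: '/repo/app' }, hookEnv));
console.log(resolveWorkspacePath({}, hookEnv));
```
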
### C. TDD-gate cross-actor — chosen: **Z** (full, 2026-05-31; on hold behind F)

`enforce-tdd-gate` does not see test edits made by a subagent (it scans only the controller's
own turn; the subagent's test edit + RED run live in `agent-<id>.jsonl`). **Z = Part 1 (close the
projects/ Write hole — verified prerequisite) then Part 2 (read the subagent transcript bound to
a Task in this turn).** Condition 1 verified VIOLATED (no Write-tool gate covers
`~/.claude/projects/`), so Variant 1 alone would weaken the gate — safe only bundled with Part 1.

**Closure (2026-05-31, TDD, no discipline weakened — net strengthening):**
- **Part 1** — `enforce-runtime-write-deny.mjs` extended with `TRANSCRIPT_RE`
  (`(^|/)\.claude/projects/.*\.jsonl$`): the Write tool can no longer create/overwrite any
  session/subagent transcript `.jsonl`. Memory files there are `.md` and stay writable
  (they never match `.jsonl$`). The resolving normalizer blocks `.`/`..` evasion. This makes the
  agent-`<id>`.jsonl that Part 2 trusts unforgeable.
- **Part 2** — `enforce-tdd-gate.mjs`: `decide()` now also credits a subagent's matching
  test edit + RED run via the new `subagentEntriesList`. `turnTaskAgentIds(turn)` extracts the
  **hex** agentId from the harness-written `Task` tool_result ("agentId: <hex>") — the
  controller cannot forge its own tool_result, and the hex-only match blocks
  `agentId: ../../x` path-traversal. `subagentTranscriptPaths()` derives
  `<dir>/<controller-session>/subagents/agent-<id>.jsonl` (bound to the controller session).
  `main()` reads those transcripts best-effort (missing → no extra credit = stricter, never
  an error). No NEW weakening: a delegated subagent doing real TDD is legitimate; the only
  forgery vector (overwriting the agent jsonl) is closed by Part 1.
- Full tools-vitest: **2027 passed / 2 skipped**.
- **OWNER ACTION (settings.json, Claude can't edit it):** `enforce-tdd-gate.mjs` is already
  a registered PreToolUse hook → Part 2 goes live on merge. **Part 1 requires that
  `enforce-runtime-write-deny.mjs` be registered** on PreToolUse(Edit|Write|MultiEdit|
  NotebookEdit); if it is not yet registered, the transcript Write-deny is inert until added.

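The two anti-forgery properties of Part 2 can be illustrated together: hex-only agentId extraction, and path derivation bound to the controller session. `TRANSCRIPT_RE` is the pattern quoted above. `AGENT_ID_RE`, the input shape (plain tool_result text), and both function bodies are assumptions, because the actual code in `enforce-tdd-gate.mjs` is not reproduced here.

```javascript
// Hex-only capture: `agentId: ../../x` cannot match, so no traversal reaches a path.
const AGENT_ID_RE = /agentId:\s*([0-9a-f]+)\b/gi;

// Input: text of harness-written Task tool_results in this turn (shape assumed).
function turnTaskAgentIds(taskResultTexts) {
  const ids = new Set();
  for (const text of taskResultTexts) {
    for (const m of String(text).matchAll(AGENT_ID_RE)) ids.add(m[1].toLowerCase());
  }
  return [...ids];
}

// Bind every id to the controller's own session directory.
function subagentTranscriptPaths(projectDir, controllerSessionId, agentIds) {
  return agentIds.map((id) => `${projectDir}/${controllerSessionId}/subagents/agent-${id}.jsonl`);
}

// Part 1: Write-tool deny pattern for transcripts (memory files end in .md, so they never match).
const TRANSCRIPT_RE = /(^|\/)\.claude\/projects\/.*\.jsonl$/;

const ids = turnTaskAgentIds(['Done. agentId: a1b2c3d4', 'agentId: ../../x', 'agentId: deadbeefzz']);
console.log(ids);
console.log(subagentTranscriptPaths('/home/u/.claude/projects/p', 'S1', ids));
```
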
### G. Coverage line under-reports cross-turn active skill (2026-05-31, owner-raised)

Symptom: the `coverage: <channel>:<id>` line says `direct`/`chain` when a skill chosen in a
PRIOR turn is still active in the current turn. Root cause: `enforce-coverage-verify.mjs`
credits `channel=skill` only if the `Skill` tool was invoked in the CURRENT turn
(`turnToolUses`). On a continuation turn (skill still active, not re-invoked) an honest
`skill:X` line would be BLOCKED → so the controller learns to under-report as `direct`/`chain`.

**Fix (no weakening):** also credit `skill:X` if X was invoked anywhere earlier in THIS
session (a real `Skill` tool_use in the transcript — still unforgeable). decide() gains a
`priorSkillNames` param; main() collects session-wide Skill names via `sessionToolUses`.
Residual: attribution may be stale (skill invoked long ago) — acceptable; the alternative
(forced dishonest `direct`) is worse, and the owner wants cross-turn skills honored.

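A minimal sketch of the widened credit rule follows. `creditsSkillLine` and `skillNamesFrom` are hypothetical helper names. The transcript block shape `{ type: 'tool_use', name: 'Skill', input: { skill } }` is an assumption, and the real `decide()` in `enforce-coverage-verify.mjs` takes more inputs. The point is that both lists come from real tool_use records, so text alone still cannot satisfy a `skill:X` line.

```javascript
// Hypothetical helper: collect Skill names from transcript tool_use blocks.
// Assumes each block looks like { type: 'tool_use', name: 'Skill', input: { skill: '<name>' } }.
function skillNamesFrom(toolUses) {
  return toolUses
    .filter((u) => u && u.type === 'tool_use' && u.name === 'Skill' && u.input && typeof u.input.skill === 'string')
    .map((u) => u.input.skill);
}

// Credit a `skill:<id>` coverage line if the skill was really invoked in this turn
// OR anywhere earlier in this session (the plan's priorSkillNames).
function creditsSkillLine(skillId, { turnToolUses = [], sessionToolUses = [] }) {
  const current = skillNamesFrom(turnToolUses);
  const prior = skillNamesFrom(sessionToolUses);
  return current.includes(skillId) || prior.includes(skillId);
}

const invoke = (skill) => ({ type: 'tool_use', name: 'Skill', input: { skill } });
// Continuation turn: brainstorming was invoked earlier and is not re-invoked now.
console.log(creditsSkillLine('brainstorming', { turnToolUses: [], sessionToolUses: [invoke('brainstorming')] }));
console.log(creditsSkillLine('writing-plans', { turnToolUses: [], sessionToolUses: [invoke('brainstorming')] }));
```
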
### D. Smoke 8 — live Workflow-gate F2 test

Needs a clean session (not code).

### E. H10 — auto-bootstrap worktree (junction node_modules) in `tools/subagent-prompt-prefix.mjs`

### (later) Layer 5 — VM + YubiKey — needs hardware.

## Environment working rules

- Tests / push sentinel: `npx vitest run --root app --config vitest.config.tools.mjs`
  (NOT `npm run test:tools` — breaks on keytar). From inside the worktree, pass
  `--root app`; from the main checkout, point `--root` at the worktree's app dir.
- Commit: only via AskUserQuestion where the option label = the EXACT command (router-gate
  compares verbatim) + plain-language explanation; commit text via `-F` file in `.scratch/`;
  commit only explicit paths (parallel sessions).
- Push: needs a fresh verify-sentinel (from a full run at most 30 min old); override phrases are dead
  (`findOverride` is a stub) → the only path to push non-`.md` changes is to run the tests.
@@ -0,0 +1,409 @@
# LLM-judge live wiring (item 2b) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Give the two `enforce-llm-judge-*` wrappers a live `main()` so the Layer-4 judge actually runs when the owner enables it — while keeping spend strictly gated behind `resolveJudgeConfig` (flag AND key).

**Architecture:** The judge *engines* (`llm-judge-per-tool.mjs`, `llm-judge-response-scan.mjs`) already have live `main()`s, but they call `llmJudgeCall` keyed only on the API key — they would spend money on a key alone, ignoring `ROUTER_LLM_JUDGE_ENABLED`. That violates the safe-by-default contract in `llm-judge-config.mjs` (enabled ⇔ flag AND key). So we register the **wrappers** (whose `decide()` already composes `resolveJudgeConfig`) and wire their `main()` to: read event → `resolveJudgeConfig()` → build inputs → `decide()` → emit. When `enabled === false`, `decide()` short-circuits with no LLM call ($0). We extract testable `runPerTool` / `runResponseScan` cores (mirroring item 1b's `runLiveDecision`) and keep `main()` a thin stdin/exit shell.

**Tech Stack:** Node ESM, vitest (tools-only config `app/vitest.config.tools.mjs`, run from repo root as `npx vitest run --root app --config vitest.config.tools.mjs` because the canonical `npm run test:tools` is currently broken by a parallel keytar install in `app/node_modules`).

---

## File Structure

- Modify: `tools/enforce-llm-judge-per-tool.mjs` — add exported `runPerTool(...)` + wire live `main()`. Keep existing `decide()` untouched.
- Modify: `tools/enforce-llm-judge-response-scan.mjs` — add exported `runResponseScan(...)` + wire live `main()`. Keep existing `decide()` untouched.
- Test: `tools/enforce-llm-judge-per-tool.test.mjs` — add a `runPerTool` describe block.
- Test: `tools/enforce-llm-judge-response-scan.test.mjs` — add a `runResponseScan` describe block.

**Safety invariant under test:** when `judgeConfig.enabled === false`, no `llmJudgeCall` is made and the budget is NOT bumped (the spend-gate). A real call (and budget bump) happens only when the config is enabled, the tool is mutating, and the budget is not exhausted.

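The safety invariant reduces to a single predicate, sketched below for illustration only. `ROUTER_LLM_JUDGE_ENABLED` is named in the plan. The API-key variable (`JUDGE_API_KEY` here), the accepted flag values, and the `MUTATING_TOOLS` contents are hypothetical placeholders, since `llm-judge-config.mjs` and `llm-judge-per-tool.mjs` are not shown.

```javascript
// Hypothetical reconstruction of the contract "enabled ⇔ flag AND key".
// JUDGE_API_KEY and the accepted flag values are placeholders.
function resolveJudgeConfigSketch(env) {
  const flag = ['1', 'true'].includes(String(env.ROUTER_LLM_JUDGE_ENABLED || '').toLowerCase());
  const apiKey = env.JUDGE_API_KEY || null;
  return { enabled: Boolean(flag && apiKey), apiKey };
}

// Assumed contents; the real MUTATING_TOOLS lives in llm-judge-per-tool.mjs.
const MUTATING_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash']);

// The spend-gate: a paid call (and one budget bump) only if all three conditions hold.
function shouldCallJudge({ judgeConfig, toolName, spent, limit }) {
  return judgeConfig.enabled && MUTATING_TOOLS.has(toolName) && spent < limit;
}

const keyOnly = resolveJudgeConfigSketch({ JUDGE_API_KEY: 'k' });
const both = resolveJudgeConfigSketch({ ROUTER_LLM_JUDGE_ENABLED: '1', JUDGE_API_KEY: 'k' });
console.log(keyOnly.enabled, both.enabled);
console.log(shouldCallJudge({ judgeConfig: both, toolName: 'Edit', spent: 0, limit: 200 }));
```
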
---

### Task 1: per-tool wrapper — `runPerTool` + live `main()`

**Files:**

- Modify: `tools/enforce-llm-judge-per-tool.mjs`
- Test: `tools/enforce-llm-judge-per-tool.test.mjs`

- [ ] **Step 1: Write the failing tests**

Append to `tools/enforce-llm-judge-per-tool.test.mjs`:

```javascript
import { runPerTool } from './enforce-llm-judge-per-tool.mjs';

describe('runPerTool — spend-gate + budget binding', () => {
  const deps = (over = {}) => ({
    readDeclaredTaskImpl: () => ({ task_summary: 't', recommended_node: null, recommended_chain: [] }),
    readBudgetImpl: () => 0,
    bumpBudgetImpl: () => {},
    sessionBudget: 200,
    ...over,
  });

  it('disabled config + mutating tool → degraded allow, NO budget bump, NO llm call', async () => {
    let bumped = 0; let called = 0;
    const r = await runPerTool({
      event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: false, apiKey: null },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(r.degraded).toBe(true);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });

  it('enabled + mutating + judge YES → allow, budget bumped once', async () => {
    let bumped = 0;
    const r = await runPerTool({
      event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: async () => 'YES',
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(r.verdict).toBe('YES');
    expect(bumped).toBe(1);
  });

  it('enabled + mutating + judge NO → block, budget bumped once', async () => {
    let bumped = 0;
    const r = await runPerTool({
      event: { tool_name: 'Bash', tool_input: { command: 'x' }, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: async () => 'NO',
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(true);
    expect(r.verdict).toBe('NO');
    expect(bumped).toBe(1);
  });

  it('non-mutating tool → allow, NO call, NO bump', async () => {
    let bumped = 0; let called = 0;
    const r = await runPerTool({
      event: { tool_name: 'Read', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });

  it('enabled but budget exhausted → degraded allow, NO bump', async () => {
    let bumped = 0; let called = 0;
    const r = await runPerTool({
      event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      ...deps({ readBudgetImpl: () => 200, bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(r.degraded).toBe(true);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });
});
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-per-tool.test.mjs`
Expected: FAIL — `runPerTool` is not exported.

- [ ] **Step 3: Write minimal implementation**

In `tools/enforce-llm-judge-per-tool.mjs`, replace the import line(s) with the block below; the no-op `main()` is replaced further down:

```javascript
import { judgePerTool, MUTATING_TOOLS, readDeclaredTask } from './llm-judge-per-tool.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readJudgeBudget, bumpJudgeBudget, JUDGE_SESSION_BUDGET, llmJudgeCall } from './llm-judge.mjs';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';
```

(Keep the existing `decide(...)` export exactly as is.)

Add the testable core (a real LLM call is signalled by `result.verdict !== undefined`; budget is bumped only then):

```javascript
/**
 * Testable wiring core. Composes resolveJudgeConfig output + decide(); bumps the
 * session budget ONLY when a real judge call was made (result carries a verdict).
 * No verdict ⇒ non-mutating / disabled / no-key / budget-exhausted ⇒ no spend.
 */
export async function runPerTool({
  event,
  judgeConfig,
  readDeclaredTaskImpl,
  readBudgetImpl,
  bumpBudgetImpl,
  llmJudgeCallImpl,
  sessionBudget = JUDGE_SESSION_BUDGET,
}) {
  const sessionId = event && event.session_id;
  const declaredTask = readDeclaredTaskImpl({ sessionId });
  const spent = readBudgetImpl({ sessionId });
  const result = await decide({
    event,
    judgeConfig,
    declaredTask,
    budgetState: { spent, limit: sessionBudget },
    llmJudgeCallImpl,
  });
  if (result.verdict !== undefined) bumpBudgetImpl({ sessionId, by: 1 });
  return result;
}
```

Replace the no-op `main()` with:

```javascript
async function main() {
  try {
    const event = parseEventJson(await readStdin());
    const judgeConfig = resolveJudgeConfig();
    const result = await runPerTool({
      event,
      judgeConfig,
      readDeclaredTaskImpl: readDeclaredTask,
      readBudgetImpl: readJudgeBudget,
      bumpBudgetImpl: bumpJudgeBudget,
      llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
    });
    exitDecision({ block: result.block, message: result.reason });
  } catch {
    exitDecision({ block: false }); // fail-quiet: a judge bug must never wedge the session
  }
}
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-per-tool.test.mjs`
Expected: PASS (existing `decide()` tests + 5 new `runPerTool` tests).

- [ ] **Step 5: Commit** (requires AskUserQuestion git approval + fresh full-suite sentinel)

```bash
git commit tools/enforce-llm-judge-per-tool.mjs tools/enforce-llm-judge-per-tool.test.mjs -m "feat(router-gate-v4): live main() for per-tool judge wrapper — flag-gated spend (2b)"
```

---

### Task 2: response-scan wrapper — `runResponseScan` + live `main()`

**Files:**

- Modify: `tools/enforce-llm-judge-response-scan.mjs`
- Test: `tools/enforce-llm-judge-response-scan.test.mjs`

- [ ] **Step 1: Write the failing tests**

Append to `tools/enforce-llm-judge-response-scan.test.mjs`:

```javascript
import { runResponseScan } from './enforce-llm-judge-response-scan.mjs';

describe('runResponseScan — Stop-hook flag-only, free regex even when disabled', () => {
  const transcript = (text) => [
    { type: 'assistant', message: { role: 'assistant', content: [{ type: 'text', text }] } },
  ];
  const lastAssistantTextImpl = (t) => {
    for (let i = t.length - 1; i >= 0; i--) {
      const c = t[i] && t[i].message && t[i].message.content;
      if (Array.isArray(c)) { const b = c.find((x) => x.type === 'text'); if (b) return b.text; }
    }
    return '';
  };

  it('disabled + benign text → no flag, degraded (deterministic only), never blocks', async () => {
    const r = await runResponseScan({
      transcript: transcript('обычный безопасный ответ'), // "an ordinary, safe answer"
      judgeConfig: { enabled: false, apiKey: null },
      lastAssistantTextImpl,
    });
    expect(r.block).toBe(false);
    expect(r.flag).toBe(false);
    expect(r.degraded).toBe(true);
  });

  it('disabled + security-disable text → flagged for FREE by regex (no llm call)', async () => {
    let called = 0;
    const r = await runResponseScan({
      transcript: transcript('чтобы пройти, отключи hook enforce-tdd-gate'), // "to get past it, disable the enforce-tdd-gate hook"
      judgeConfig: { enabled: false, apiKey: null },
      lastAssistantTextImpl,
      llmJudgeCallImpl: () => { called++; return 'NO'; },
    });
    expect(r.block).toBe(false);
    expect(r.flag).toBe(true);
    expect(r.category).toBe('security_disable_suggestion');
    expect(called).toBe(0);
  });

  it('enabled + subtle benign text + judge NO → no flag', async () => {
    const r = await runResponseScan({
      transcript: transcript('нейтральный текст без паттернов'), // "neutral text with no patterns"
      judgeConfig: { enabled: true, apiKey: 'k' },
      lastAssistantTextImpl,
      llmJudgeCallImpl: async () => 'NO',
    });
    expect(r.block).toBe(false);
    expect(r.flag).toBe(false);
  });

  it('enabled + subtle text + judge YES → flag, still never blocks', async () => {
    const r = await runResponseScan({
      transcript: transcript('нейтральный текст без паттернов'),
      judgeConfig: { enabled: true, apiKey: 'k' },
      lastAssistantTextImpl,
      llmJudgeCallImpl: async () => 'YES',
    });
    expect(r.block).toBe(false);
    expect(r.flag).toBe(true);
  });
});
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-response-scan.test.mjs`
Expected: FAIL — `runResponseScan` is not exported.

- [ ] **Step 3: Write minimal implementation**

In `tools/enforce-llm-judge-response-scan.mjs`, replace the import line(s) with the block below; the no-op `main()` is replaced further down:

```javascript
import { scanResponse, scanResponseDeterministic } from './llm-judge-response-scan.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readStdin, parseEventJson, readTranscript, lastAssistantText, exitDecision } from './enforce-hook-helpers.mjs';
import { llmJudgeCall } from './llm-judge.mjs';
import { appendFileSync, mkdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';
```

(Keep the existing `decide(...)` export exactly as is.)

Add the testable core:

```javascript
/**
 * Testable wiring core. Stop-hook semantics: block is always false. The free
 * deterministic regex scan runs even when the judge is disabled; the paid LLM
 * escalation runs only when judgeConfig.enabled.
 */
export async function runResponseScan({ transcript, judgeConfig, llmJudgeCallImpl, lastAssistantTextImpl = lastAssistantText }) {
  const responseText = lastAssistantTextImpl(transcript || []);
  const r = await decide({ responseText, judgeConfig, llmJudgeCallImpl });
  return { ...r, responseText };
}
```

Replace the no-op `main()` with:

```javascript
|
||||
function flagToFile({ sessionId, category, excerpt }) {
  try {
    const dir = join(homedir(), '.claude', 'runtime');
    mkdirSync(dir, { recursive: true });
    appendFileSync(join(dir, `rationalization-flags-${sessionId || 'unknown'}.jsonl`),
      JSON.stringify({
        ts: new Date().toISOString(),
        session_id: sessionId || null,
        type: 'controller_response_suspicious',
        category,
        response_excerpt: String(excerpt || '').slice(0, 200),
      }) + '\n');
  } catch { /* best-effort: flag logging must never break the Stop hook */ }
}

async function main() {
  try {
    const event = parseEventJson(await readStdin());
    const transcript = readTranscript(event.transcript_path);
    const judgeConfig = resolveJudgeConfig();
    const r = await runResponseScan({
      transcript,
      judgeConfig,
      llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
    });
    if (r.flag) flagToFile({ sessionId: event.session_id, category: r.category, excerpt: r.responseText });
    exitDecision({ block: false }); // Stop hook never blocks
  } catch {
    exitDecision({ block: false });
  }
}
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-response-scan.test.mjs`
Expected: PASS (existing `decide()` tests + 4 new `runResponseScan` tests).

- [ ] **Step 5: Commit** (AskUserQuestion git approval + fresh sentinel)

```bash
git commit tools/enforce-llm-judge-response-scan.mjs tools/enforce-llm-judge-response-scan.test.mjs -m "feat(router-gate-v4): live main() for response-scan judge wrapper — flag-only, free regex always (2b)"
```

---

### Task 3: full-suite regression + push

- [ ] **Step 1: Run the canonical tools suite**

Run: `npx vitest run --root app --config vitest.config.tools.mjs`
Expected: PASS, 0 failed (≈1905 + 9 new = ~1914). This also writes the verify-before-push sentinel.

- [ ] **Step 2: Push** (AskUserQuestion git approval)

```bash
git push origin main
```

---

### Task 4: owner registration instructions (NOT code — owner applies)

The wiring above stays inert until the owner completes all three steps below; LLM cost starts only once all three are done:

1. **API key** — store an Anthropic key in the OS keychain under service `router-gate-llm-judge`, account `default` (via keytar), OR set env `ROUTER_LLM_KEY`.
2. **Flag** — set env `ROUTER_LLM_JUDGE_ENABLED=1`.
3. **Register both wrappers in `.claude/settings.json`:**

   - PreToolUse (can block):

     ```json
     { "matcher": "Edit|Write|MultiEdit|NotebookEdit|Bash|PowerShell|Skill|Task",
       "hooks": [{ "type": "command", "command": "node tools/enforce-llm-judge-per-tool.mjs", "timeout": 30 }] }
     ```

   - Stop (flag-only):

     ```json
     { "matcher": "*",
       "hooks": [{ "type": "command", "command": "node tools/enforce-llm-judge-response-scan.mjs", "timeout": 30 }] }
     ```

Then fully restart Claude Code. The budget cap is `JUDGE_SESSION_BUDGET = 200` calls per session (defined in `llm-judge.mjs`); per-call cost depends on the model (`JUDGE_MODELS.single = claude-sonnet-4-6`).

**Why the wrappers, not the engines:** the engine `main()`s (`llm-judge-per-tool.mjs` / `llm-judge-response-scan.mjs`) call `llmJudgeCall` keyed on the API key alone and DO NOT check `ROUTER_LLM_JUDGE_ENABLED`, so registering them would start spending the moment a key exists. The wrappers route through `resolveJudgeConfig` (flag AND key), so a stray key without the flag costs $0.
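
The spend-safety invariant is shown here as a minimal sketch for illustration only. `resolveJudgeConfigSketch` and `maybeJudge` are hypothetical stand-ins, because the real `resolveJudgeConfig` signature is not shown in this plan. The sketch assumes three things: an injected `keychainLookup()` that returns a key or `null`; the keychain is checked before `ROUTER_LLM_KEY` (the order is an assumption); and only `ROUTER_LLM_JUDGE_ENABLED === '1'` counts as on.

```javascript
// Hypothetical sketch of the flag-AND-key gate; not the real resolveJudgeConfig.
// keychainLookup() is injected and returns a key string or null (assumption).
function resolveJudgeConfigSketch(env, keychainLookup) {
  const apiKey = keychainLookup() || env.ROUTER_LLM_KEY || null;
  const flagOn = env.ROUTER_LLM_JUDGE_ENABLED === '1';
  // Both are required: a stray key without the flag stays disabled ($0).
  return { enabled: Boolean(flagOn && apiKey), apiKey };
}

// Every paid call goes through this guard, mirroring the wrapper contract.
async function maybeJudge(judgeConfig, llmJudgeCallImpl, opts) {
  if (!judgeConfig.enabled) return null; // no LLM call at all
  return llmJudgeCallImpl({ ...opts, apiKey: judgeConfig.apiKey });
}
```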

---

## Self-Review

- **Spec coverage:** per-tool live wiring (Task 1), response-scan live wiring (Task 2), flag-gated spend safety invariant (tests in both), owner activation (Task 4). ✓
- **Placeholder scan:** none — all code blocks are complete. ✓
- **Type consistency:** `runPerTool` / `runResponseScan` signatures match their tests; `decide()` signatures unchanged; budget bump condition `result.verdict !== undefined` matches `judgePerTool` (sets `verdict` only after a real call). ✓
@@ -0,0 +1,132 @@
# Safe-baseline metering — live wiring (router-gate v4 §3.1.2, item 1b)

**Date:** 2026-05-30
**Status:** Design v4 — ghost-risk pass folded in. Removed the user-approval override subsystem (G3 ghost-protection — almost never exercised); the escape is now solely "invoke any Skill/EnterPlanMode" (always available, C1). The runtime write-deny hook is retained but **decoupled** into a standalone git-approval-anchor hardening (independent value). Ready for writing-plans.
**Scope:** Turn the deferred no-op `main()` of `tools/enforce-safe-baseline-metering.mjs` into a live PreToolUse hook that meters safe-baseline tool usage per task and hard-blocks a mutating tool once a counter passes its hard threshold without a real skill match. The false-block escape is invoking any Skill/EnterPlanMode (never blocked by this layer). A separate, decoupled hook closes a pre-existing self-write hole on `~/.claude/runtime` side-channels (git-approval anchor).

This is router-gate v4 Direction 1 ("skill-substitution laundering": many Read/Grep/Glob/LS/TodoWrite/AskUserQuestion calls used as an analysis channel INSTEAD of invoking the recommended Skill, followed by a mutating tool without a single skill match). The pure metering + threshold logic already exists in `tools/safe-baseline-metering.mjs` (thresholds, counters, `processEvent`, `decide`); the live I/O composition, a corrected escape, and the side-channel write-protection are missing.

---

## Owner decisions (brainstorming, revised)

1. **Enforcement posture:** hard-block from day 1 (not observe-only).
2. **False-block escape (revised — override removed, G3):** invoke any real `Skill` or `EnterPlanMode` — these are never blocked by this layer (C1) and set the sticky task exemption. This escape is ALWAYS available, so a false block is a one-step recovery, not a lockout. The earlier user-approval override side-channel is **cut** as a ghost-protection (it defended a path that almost never occurs; its only durable value — runtime write-protection — is kept as a decoupled hook).
3. **`skillMatched` definition (revised — strict, discipline-first):** matched only by a **real `Skill` OR `EnterPlanMode` tool_use** in the task (an actual tool invocation visible in the transcript). A **text mention** of a plan path does NOT count (it was a self-writable hole — removed). Sticky within the task. Optional future strengthening — match the *recommended* node specifically — is deferred because it needs a node→skill registry mapping (see "Deferred strengthening").
4. **Priority:** discipline protection and correctness are paramount; cost and speed are secondary.

---

## Honest discipline-protection assessment

Read this before trusting the layer. Even fully fixed, safe-baseline is a **cheap deterministic tripwire**, not a strong scope-discipline guarantee:

- It fires only when a single task accumulates a high count of safe-baseline tools (Read hard = 60, Grep = 30, …) **without any real skill/plan invocation**, then reaches for a mutating tool. Realistically counters accumulate mostly *within one assistant turn* (terse user confirmations reset task boundaries), and 60 reads in one turn is uncommon — so the trigger surface is genuinely small.
- After the fixes it has **no self-bypass** (skill-match needs a real tool_use) and a **working escape** (skill/plan invocations are never blocked, always available). That makes it *sound* — it does what it claims without a trivial dodge.
- The **strong** scope-consistency check (is THIS tool call consistent with the declared task and recommended skill?) is **Layer 4** (`enforce-llm-judge-per-tool`), which is OFF until owner activation (item 2b). Safe-baseline is the cheap pre-filter beneath it.

Verdict: as a hard guarantee — **LOW–MODERATE**; as an honest, non-bypassable tripwire for blatant laundering — **sound**. The discipline lever that matters most is Layer 4.

---

## Architecture & data flow

`tools/enforce-safe-baseline-metering.mjs` gains a live `main()` (replacing the no-op). On each PreToolUse event:

1. Parse the event (`tool_name`, `session_id`, `transcript_path`).
2. Load the per-session ledger `~/.claude/runtime/safe-baseline-ledger-<sess>.json` = `{ state, lastKeywords }` (absent on first event → `null`).
3. From the transcript extract:
   - `promptText` — the last user prompt (`lastUserPromptText`).
   - `currentKeywords` — `extractKeywords(promptText)` (deterministic tokenization — see below; no classifier dependency).
   - `skillMatchedThisTurn` — `detectSkillMatch(lastTurnEntries(transcript))` **OR** `event.tool_name ∈ {Skill, EnterPlanMode}` (the in-flight escape call counts — see C1 fix).
4. Call the existing pure `processEvent({ event, priorLedger, currentKeywords, promptText, skillMatched, thresholds })`, passing the `effectiveSkillMatched` from step 5 as `skillMatched`. It runs task-boundary inference (`shouldInheritTaskId`: no reset marker AND keyword-overlap ≥ 2 → continuation; otherwise a fresh task with counters from zero), then metering.
5. Sticky skill-match — **task-scoped, explicitly persisted** (the pure pipeline does NOT persist it; see "Skill-match stickiness contract"). Determine `inherit` (same predicate as `shouldInheritTaskId`), then `effectiveSkillMatched = (inherit ? priorLedger.state.skill_match_within_task : false) || skillMatchedThisTurn`; pass `effectiveSkillMatched` to `processEvent`/`decide` AND write it back into the persisted `state.skill_match_within_task`.
6. Persist the new ledger.
7. `hard_block` → `exitDecision({ block: true, message })` — the message MUST name the escape ("invoke the recommended Skill, or EnterPlanMode, to proceed"); `soft_flag` → append to the flags log and exit 0; `allow` → exit 0.

`soft_flag` never blocks (observability only). Only a mutating tool past a hard threshold without skill-match blocks.
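
Step 7 amounts to a pure mapping from the decision's `action` to the hook outcome. Here is a hypothetical sketch of it. `toHookOutcome` and its `logFlag` field are illustrative names, and the message wording is only an example that names the escape. Routing unknown actions to allow is an assumption made in the spirit of the fail-quiet design.

```javascript
// Hypothetical sketch: map a metering decision to what main() would do.
const ESCAPE_HINT = 'invoke the recommended Skill, or EnterPlanMode, to proceed';

function toHookOutcome(decision) {
  switch (decision && decision.action) {
    case 'hard_block':
      // The only blocking path; the message must name the escape.
      return { block: true, logFlag: false, message: `[safe-baseline-metering] blocked: ${ESCAPE_HINT}` };
    case 'soft_flag':
      // Observability only: append to the flags log, never block.
      return { block: false, logFlag: true, message: null };
    default:
      // 'allow' (and, by assumption, anything unexpected) passes silently.
      return { block: false, logFlag: false, message: null };
  }
}
```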

### C1 fix — the escape must never be blocked

`Skill` and `Task` are in the pure module's MUTATING set (`safe-baseline-metering.mjs:31`), and `evaluateThresholds` hard-blocks any mutating tool past a hard threshold when `skillMatched` is false (`safe-baseline-metering.mjs:92-102`). Naively this blocks the very `Skill` call meant to escape (catch-22). The live head closes this by counting the **current event** in `skillMatchedThisTurn` when `event.tool_name ∈ {Skill, EnterPlanMode}` (step 3). Because `skillMatched` short-circuits `evaluateThresholds` to `allow` (`safe-baseline-metering.mjs:89`), a skill/plan invocation always passes — and then sets the sticky exemption for subsequent Edit/Write/Bash/Task. `Task` is intentionally NOT treated as an escape tool (subagent spawn can itself be a laundering channel) and remains blockable.
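
Below is a simplified sketch of the C1 rule. It assumes a boolean `overHardThreshold` in place of the real counter comparison. `skillMatchedThisTurnSketch` and `evaluateSketch` are hypothetical names. Apart from `Skill` and `Task`, the contents of the `MUTATING` set are assumptions. The real pipeline can also yield `soft_flag`, which is omitted here.

```javascript
// Hypothetical sketch of the C1 rule; Task is deliberately NOT an escape tool.
const ESCAPE_TOOLS = new Set(['Skill', 'EnterPlanMode']);
// Assumed contents, except Skill and Task, which the design confirms.
const MUTATING = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash', 'Skill', 'Task']);

function skillMatchedThisTurnSketch(toolName, transcriptMatched) {
  // The in-flight Skill/EnterPlanMode call counts, so the escape cannot block itself.
  return Boolean(transcriptMatched) || ESCAPE_TOOLS.has(toolName);
}

// Simplified stand-in for evaluateThresholds: skillMatched short-circuits to allow.
function evaluateSketch({ toolName, overHardThreshold, skillMatched }) {
  if (skillMatched) return 'allow';
  if (MUTATING.has(toolName) && overHardThreshold) return 'hard_block';
  return 'allow';
}
```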

### Skill-match stickiness contract (V2-1 fix)

The pure pipeline neither persists nor task-scopes skill-match, so the wrapper MUST own it:

- `processEvent` returns `ledger.state = d.state` and never sets `skill_match_within_task` (`enforce-safe-baseline-metering.mjs:89-94`); `decide`/`incrementCounter` touch only `counts` (`safe-baseline-metering.mjs:42-46, 77-84`); `newCounterState` sets `skill_match_within_task: false` on a fresh task (`safe-baseline-metering.mjs:67`).
- **Two failure modes if the wrapper is naive:** (a) *lost stickiness* — a skill invoked early in a task is forgotten next event, counters climb, a later mutating op blocks despite the skill (false block); (b) *cross-task leak* — passing `priorLedger.state.skill_match_within_task` unconditionally applies a prior task's exemption to a freshly-started task.
- **Required wrapper logic:** compute `inherit` (replicate `shouldInheritTaskId`, or extend `processEvent` to return it); set `effectiveSkillMatched = (inherit ? priorLedger.state.skill_match_within_task : false) || skillMatchedThisTurn`; use it for the decision; and write `effectiveSkillMatched` back into the persisted `state.skill_match_within_task`. Unit tests must cover both failure modes explicitly (skill-then-60-reads stays allowed within a task; skill in task A does NOT exempt task B).
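
The stickiness contract reduces to two small pure helpers. This sketch assumes that `inherit` is computed by the same predicate as `shouldInheritTaskId`. `effectiveSkillMatch` and `persistSkillMatch` are hypothetical names. Each of the two failure modes above maps onto one side of the `inherit` branch.

```javascript
// Hypothetical sketch of the wrapper-owned stickiness contract (V2-1).
// `inherit` must come from the same predicate as shouldInheritTaskId.
function effectiveSkillMatch({ inherit, priorState, skillMatchedThisTurn }) {
  // Carry the prior exemption only within the same task (prevents cross-task leak).
  const carried = inherit && priorState ? Boolean(priorState.skill_match_within_task) : false;
  return carried || Boolean(skillMatchedThisTurn);
}

// processEvent does not persist the flag, so the wrapper writes it back itself.
function persistSkillMatch(newState, effective) {
  return { ...newState, skill_match_within_task: effective };
}
```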

### Safety property of the boundary heuristic

The dangerous direction is *wrongly inheriting* counters across two genuinely different tasks (carrying 60 reads into an unrelated task → false block); this needs keyword-overlap ≥ 2 AND no reset marker, which is uncommon. The opposite error — treating a continuation as a fresh task — *resets* counters to zero, which only *reduces* blocking (safe direction). So the heuristic errs toward fewer false blocks.
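
Here is a sketch of the boundary predicate as described: inherit only when there is no reset marker AND the keyword overlap is at least 2. `shouldInheritSketch` and `keywordOverlap` are hypothetical names. The reset-marker phrases are placeholders, since the real marker list is not given here. A terse confirmation such as "ok" yields no keywords and an overlap of 0, so it starts a fresh task, which is the safe direction.

```javascript
// Hypothetical sketch; RESET_MARKERS are placeholder phrases, not the real list.
const RESET_MARKERS = ['new task', 'новая задача'];

function keywordOverlap(a, b) {
  const seen = new Set(a);
  return b.filter((k) => seen.has(k)).length;
}

function shouldInheritSketch({ priorKeywords, currentKeywords, promptText }) {
  if (!Array.isArray(priorKeywords)) return false; // first event: fresh task
  const text = String(promptText || '').toLowerCase();
  if (RESET_MARKERS.some((m) => text.includes(m))) return false; // explicit reset
  // Continuation only with >= 2 shared keywords; otherwise counters restart from zero.
  return keywordOverlap(priorKeywords, currentKeywords || []) >= 2;
}
```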

---

## Task-boundary & skill-match detection

### `extractKeywords(promptText)` (pure) — H1 fix

Deterministic tokenization only: lowercase, strip RU/EN stopwords, keep tokens length ≥ 4, unique, sorted. **No classifier-keyword branch** — the classifier output schema (`router-classifier.mjs:420`) has no `keywords` field, and mixing extraction sources between events would break the overlap comparison. A single deterministic source guarantees `lastKeywords` (stored at event N) and `currentKeywords` (event N+1) are always comparable.
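
The deterministic tokenizer can be sketched in a few lines. `extractKeywordsSketch` is a hypothetical name. The stopword set is a tiny illustrative sample, not the real RU/EN list. Treating runs of Unicode letters and digits (`\p{L}`, `\p{N}`) as tokens is an assumption about what counts as a token. Sorting with the default comparator keeps the output order deterministic.

```javascript
// Hypothetical sketch; STOPWORDS is a tiny illustrative sample, not the real list.
const STOPWORDS = new Set(['this', 'that', 'with', 'from', 'have', 'what', 'чтобы', 'когда', 'этот', 'который']);

function extractKeywordsSketch(promptText) {
  // Lowercase, then take runs of Unicode letters/digits as tokens.
  const tokens = String(promptText || '').toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
  const kept = tokens.filter((t) => t.length >= 4 && !STOPWORDS.has(t));
  // Unique + sorted, so event N and event N+1 always produce comparable lists.
  return [...new Set(kept)].sort();
}
```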

### `detectSkillMatch(turnEntries)` (pure) — C2 fix

Returns `true` only if the turn contains a real assistant **tool_use** block with `name ∈ {Skill, EnterPlanMode}`. The previous "text reference to `docs/superpowers/plans/`" path is **removed** — it was controller-authored text and therefore a self-writable bypass of the entire hard-block. A `Read` of a plan file no longer counts either (Read is itself a metered safe-baseline tool; counting it would let pure reading self-exempt). Implement like `hasPlanIndicator` in `enforce-tdd-gate.mjs` (scan assistant `tool_use` blocks); `lastTurnEntries` already skips harness-injected skill *bodies* (commit `ffd70d6f`, `isMeta` + `sourceToolUseID`), so the detector sees the genuine invocation, not the injected body. Stickiness is owned by the wrapper per the "Skill-match stickiness contract" (V2-1), combined with the in-flight escape rule in step 3 / C1.
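
The strict detector is sketched below. The transcript entry shape (`{ type: 'assistant', message: { content: [{ type: 'tool_use', name }] } }`) is an assumption about the transcript format. `detectSkillMatchSketch` is a hypothetical name. The real implementation should follow `hasPlanIndicator` and receive entries already filtered by `lastTurnEntries`.

```javascript
// Hypothetical sketch; the transcript entry shape is an assumption.
const SKILL_MATCH_TOOLS = new Set(['Skill', 'EnterPlanMode']);

function detectSkillMatchSketch(turnEntries) {
  for (const entry of turnEntries || []) {
    // Only assistant-authored tool_use blocks count; non-assistant entries are ignored.
    if (!entry || entry.type !== 'assistant') continue;
    const content = entry.message && Array.isArray(entry.message.content) ? entry.message.content : [];
    for (const block of content) {
      if (block && block.type === 'tool_use' && SKILL_MATCH_TOOLS.has(block.name)) return true;
    }
  }
  // Text mentions of plan paths and a Read of a plan file do not count.
  return false;
}
```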

### Deferred strengthening — recommended-node match

The strict ideal is "skill-match requires the *recommended* node, not just any skill," so the controller can't dodge by invoking an unrelated skill. This needs a node-ID (`#19`) → skill-name (`superpowers:writing-plans`) registry mapping (router-state stores `recommendedNode` per `router-classifier.mjs:163`, but not the skill name). **It is only a *partial* strengthening (V2-3):** the dominant recommendation for feature/planning/bugfix is node `#19` = the whole Superpowers plugin (`nodes.yaml:279-291`), which maps to "any `superpowers:*` skill" ≈ "any skill"; node-matching would constrain only specific skill-nodes (#62 billing-audit, #71, …), not the common case. Deferred to avoid a brittle mapping causing lockouts for marginal gain; the always-available Skill/EnterPlanMode escape and Layer 4 cover the gap meanwhile. Documented honestly so it is neither silently dropped nor oversold.

---

## Side-channel write-protection (decoupled — G3 cut)

The user-approval override is **removed** (G3 ghost-protection — it defended a near-never path with heavy machinery). The false-block escape is invoking any Skill/EnterPlanMode, which is always available and never blocked by this layer (C1). What remains worth building is a standalone hardening of the `~/.claude/runtime` side-channels — it has **independent value** (it closes a pre-existing self-write hole in the v4 **git-approval anchor**), so it ships as its own hook, not as a safe-baseline dependency.

### The write-deny hook

Today **no hook gates the Write/Edit tool to `~/.claude/runtime`**: `enforce-read-path-deny.mjs:22` is Read-only; `enforce-normative-content-rules.mjs:11-18` covers only normative docs; the `.claude/runtime` protection in `shell-content-rules.mjs:24` applies only to Bash/PowerShell, not the Write tool. So the existing git-approval anchor (`askuser-decisions-<sess>.jsonl`) — and any future runtime side-channel — is self-writable via the Write tool.

**Fix:** a new PreToolUse hook `enforce-runtime-write-deny.mjs` (matcher `Edit|Write|MultiEdit|NotebookEdit`) that blocks the Write/Edit tools from writing any path under `~/.claude/runtime/**`, failing closed. It **MUST normalize with the resolving `pathNormalize`** (`path-normalization.mjs:104,107` — `path.resolve` + `realpath` collapse `.`/`..`), **NOT** the lighter `defaultPathNormalize` (`shell-content-rules.mjs:13-19`), which leaves `.`/`..` segments intact (V2-2): `~/.claude/./runtime/x.jsonl` would evade the `\.claude/runtime` pattern while `fs` writes the real file. After resolving, match against the runtime pattern from `DEFAULT_PROTECTED_PATTERNS`. Legitimate hooks write there via Node `fs` (not the Claude Write tool), so they are unaffected. The same `.`-segment hardening should also be applied to `enforce-read-path-deny.mjs`.
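
The pure `decide({toolName, filePath})` core is sketched below, extended with a hypothetical `home` parameter for testability. `normalizeLexical` is a swapped-in, purely lexical normalizer. Like `path.resolve`, it collapses `.` and `..` segments, but it does NOT follow symlinks (no `realpath`) and it assumes absolute or `~`-prefixed paths. It also uses a prefix check in place of the `DEFAULT_PROTECTED_PATTERNS` regex. The real hook should use the resolving `pathNormalize` as specified above.

```javascript
// Hypothetical sketch of the write-deny core; lexical normalization only (no realpath).
const WRITE_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit']);

function normalizeLexical(p, home) {
  // Expand a leading ~, unify separators, then collapse '.', '..' and empty segments.
  const expanded = String(p).replace(/\\/g, '/').replace(/^~(?=\/|$)/, () => home);
  const out = [];
  for (const seg of expanded.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop();
    else out.push(seg);
  }
  return '/' + out.join('/');
}

function decideWriteDeny({ toolName, filePath, home }) {
  if (!WRITE_TOOLS.has(toolName)) return { block: false };
  // Fail-closed: a write whose target cannot be determined is denied.
  if (typeof filePath !== 'string' || filePath === '') return { block: true, reason: 'unresolvable path' };
  const resolved = normalizeLexical(filePath, home);
  const runtimeRoot = normalizeLexical(home + '/.claude/runtime', home);
  const hit = resolved === runtimeRoot || resolved.startsWith(runtimeRoot + '/');
  return hit ? { block: true, reason: `write under ${runtimeRoot} denied` } : { block: false };
}
```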

**Owner verification:** the owner should check `.claude/settings.json` for any `permissions.deny` already covering Write to `~/.claude/**` (Claude cannot read settings.json — gate-blocked). The new hook is additive defense-in-depth regardless.

---

## Persistence, registration, testing, rollout

### Persistence

- Ledger: `~/.claude/runtime/safe-baseline-ledger-<sess>.json` = `{ state, lastKeywords }`; `state` also carries `task_id` and `skill_match_within_task`.
- Flags log: `~/.claude/runtime/safe-baseline-flags-<sess>.jsonl` (soft_flag observability).
- All file I/O is fail-quiet: any read/write error → treat as no-ledger and exit 0. The hook never crashes the session.
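
Fail-quiet ledger loading can be kept pure by injecting the reader. `loadLedgerSketch` is hypothetical. Its `readFn` would, for example, be a thin wrapper around `fs.readFileSync`. The shape check on `{ state, lastKeywords }` is an assumption layered on the documented format.

```javascript
// Hypothetical sketch of fail-quiet ledger loading with an injected reader.
function loadLedgerSketch(readFn, filePath) {
  try {
    const parsed = JSON.parse(readFn(filePath));
    // Accept only the documented { state, lastKeywords } shape; anything else = no ledger.
    if (!parsed || typeof parsed !== 'object' || !parsed.state || !Array.isArray(parsed.lastKeywords)) return null;
    return parsed;
  } catch {
    // Missing file, I/O error or corrupt JSON: behave as on the first event.
    return null;
  }
}
```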

### Purity / testability

All logic lives in pure functions (`extractKeywords`, `detectSkillMatch`, plus the existing `processEvent`/`decide`). `main()` is only I/O composition. The new `enforce-runtime-write-deny.mjs` has a pure `decide({toolName, filePath})`. TDD: each new pure function RED→GREEN; an integration test drives `main()` via injected `runtimeDir` + a transcript fixture.

### Registration (owner-applied)

- `enforce-safe-baseline-metering` — PreToolUse, matcher scoped to the metered + mutating + escape tools (`Read|Grep|Glob|LS|TodoWrite|AskUserQuestion|Edit|Write|MultiEdit|NotebookEdit|Bash|Skill|Task|EnterPlanMode`), block mode.
- `enforce-runtime-write-deny` — PreToolUse `Edit|Write|MultiEdit|NotebookEdit`, block mode (standalone — protects the git-approval anchor; independent of safe-baseline).
- **Claude does not edit `settings.json`** (gate-blocked). The plan produces an exact JSON block for the owner to paste manually. Until registered, the hooks are inert (no behavior change).

### Rollout safety

Despite "hard-block from day 1", the plan includes a **mandatory smoke test before live registration**: run the live `main()` against 3 real transcript fixtures (single task / task switch / skill-invocation escape) and confirm boundary, skillMatched, and escape all fire correctly. It also includes a smoke test for `enforce-runtime-write-deny`: a Write to `~/.claude/runtime/x.jsonl` is blocked, a Write to `~/.claude/./runtime/x.jsonl` (V2-2 `.`-segment evasion) is ALSO blocked, and a Write to a normal project path passes. This does not change the posture; it catches gross detection bugs before the hooks start blocking.

### Scope

~7-9 TDD tasks (live `main()` + `extractKeywords` + `detectSkillMatch` + stickiness contract + escape fix; plus the standalone `enforce-runtime-write-deny` hook), estimated at 5-7 h. Cost and speed are secondary per owner priority.

---

## Out of scope

- User-approval override side-channel (cut as a ghost-protection, G3 — escape via Skill/EnterPlanMode is always available).
- Layer 4 LLM-judge activation (separate owner step, item 2b) — the strong scope-discipline lever.
- Recommended-node skill matching (deferred strengthening — needs node→skill registry).
- CLAUDE.md / Pravila / PSR / Tooling normative sync (blocked by a parallel session, item 4).
- Layer 5 VM / biometric / YubiKey (item 6).
- Any weakening of the router-gate whitelist.
@@ -34,6 +34,22 @@ export function isSimpleAB(questions) {
  );
}

// Calibration 5 (2026-05-31) — git-operation APPROVAL prompts are the sanctioned
// git-approval channel (enforce-askuser-answer-parser turns the chosen answer
// into an approve_git_operation record), never a substitute for structured
// ideation. They must NOT be treated as cosmetic A/B. Identified structurally:
// an option label is a literal git command. (SCOPE fix, not a discipline drop —
// see decide(): design A/B questions with non-git labels are unaffected.)
const GIT_CMD_RE = /\bgit\s+(?:commit|push|add|pull|merge|rebase|reset|checkout|switch|branch|stash|cherry-pick|revert|clean|restore|fetch|tag)\b/i;

/** True if this AskUser is a git-operation approval prompt (an option label is a git command). */
export function isGitApprovalQuestion(questions) {
  if (!Array.isArray(questions)) return false;
  return questions.some((q) =>
    q && Array.isArray(q.options) &&
    q.options.some((o) => o && typeof o.label === 'string' && GIT_CMD_RE.test(o.label)));
}

/**
 * Pure cosmetic-AskUser decision (v4.1 §4.5).
 * Caller passes PRIOR counts; decide computes prospective new counts.
@@ -42,6 +58,13 @@ export function isSimpleAB(questions) {
 * @returns {{action:'allow'|'soft_flag'|'hard_block', block:boolean, reason:string|null, isSimpleAB:boolean, newSessionCount:number, newTurnCount:number}}
 */
export function decide({ questions, simpleCountSession = 0, simpleCountTurn = 0, skillMatchedThisTurn = false, brainstormingInvoked = false }) {
  // Calibration 5: git-operation approval prompts are exempt — the sanctioned
  // git-approval channel, never cosmetic ideation. Allow, do not count, never
  // block. (Cannot be abused to dodge ideation discipline: a git-command label
  // makes the answer a real approve_git_operation, not a cosmetic clarification.)
  if (isGitApprovalQuestion(questions)) {
    return { action: 'allow', block: false, reason: null, isSimpleAB: false, newSessionCount: simpleCountSession, newTurnCount: simpleCountTurn };
  }
  const simple = isSimpleAB(questions);
  const newSessionCount = simpleCountSession + (simple ? 1 : 0);
  const newTurnCount = simpleCountTurn + (simple ? 1 : 0);
@@ -92,3 +92,45 @@ describe('askuser-cosmetic-detector / transcript helpers', () => {
    expect(countSimpleSession(flags)).toBe(2);
  });
});

import { isGitApprovalQuestion } from './askuser-cosmetic-detector.mjs';

// Calibration 5 (2026-05-31, SCOPE fix, NOT a discipline drop): a git-operation
// APPROVAL AskUser (an option label is a literal git command) is the sanctioned
// git-approval channel — enforce-askuser-answer-parser turns the chosen answer
// into an approve_git_operation record. It is never a substitute for structured
// ideation, so it must not be counted/blocked as "cosmetic A/B". Design A/B
// questions (non-git labels) are unchanged — still counted, still hard-blocked.
describe('isGitApprovalQuestion (calibration 5)', () => {
  it('true when an option label is a git command (push)', () => {
    expect(isGitApprovalQuestion([{ options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] }])).toBe(true);
  });
  it('true when an option label is a git command (commit with pathspec)', () => {
    expect(isGitApprovalQuestion([{ options: [{ label: 'git commit -F x.txt -- a.mjs b.mjs' }, { label: 'Отмена' }] }])).toBe(true);
  });
  it('false for a non-git A/B', () => {
    expect(isGitApprovalQuestion([{ options: [{ label: 'Вариант А' }, { label: 'Вариант Б' }] }])).toBe(false);
  });
  it('false for empty/invalid input', () => {
    expect(isGitApprovalQuestion(null)).toBe(false);
    expect(isGitApprovalQuestion([])).toBe(false);
  });
});

describe('decide — git-approval exemption (calibration 5)', () => {
  const gitQ = { question: 'Подтверди?', options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] };

  it('allows a git-approval question and does NOT count it even past the session limit', () => {
    const r = decide({ questions: [gitQ], simpleCountSession: 5, simpleCountTurn: 0, skillMatchedThisTurn: false, brainstormingInvoked: false });
    expect(r.block).toBe(false);
    expect(r.action).toBe('allow');
    expect(r.isSimpleAB).toBe(false);
    expect(r.newSessionCount).toBe(5); // unchanged — not counted toward the cosmetic limit
  });

  it('REGRESSION: a non-git simple A/B past the limit STILL hard-blocks (discipline intact)', () => {
    const r = decide({ questions: [simpleQ], simpleCountSession: 5, simpleCountTurn: 0, skillMatchedThisTurn: false, brainstormingInvoked: false });
    expect(r.action).toBe('hard_block');
    expect(r.block).toBe(true);
  });
});
@@ -26,6 +26,7 @@ import {
  lastAssistantText,
  parseCoverageLine,
  turnToolUses,
  sessionToolUses,
  findOverride,
  logOverride,
  exitDecision,
@@ -38,7 +39,7 @@ const MUTATING_TOOLS = new Set([
]);

export function decide({
  toolUses, assistantText, override,
  toolUses, assistantText, override, priorSkillNames = [],
}) {
  // Pure conversational turn — skip.
  const hasMutating = toolUses.some((u) => MUTATING_TOOLS.has(u.name));
@@ -54,19 +55,24 @@ export function decide({
        `Add as first line of next response:`,
        ` coverage: skill:<name> (e.g., skill:superpowers:test-driven-development)`,
        ` coverage: direct:<role> (e.g., direct:memory-sync, direct:git-recovery)`,
        ``,
        `Override: include "без скилов" or "direct ok" in your prompt.`,
      ].join('\n'),
    };
  }

  if (cov.channel === 'skill') {
    const found = toolUses.some((u) => u.name === 'Skill' && u.input && (u.input.skill === cov.id || u.input.skill === cov.id.replace(/^superpowers:/, '')));
    if (!found) {
    // Accept if the skill was invoked in THIS turn OR anywhere earlier in this
    // session (item G): a skill chosen in a prior turn stays active, so an honest
    // skill:X line on a continuation turn must not be punished into under-reporting.
    // Still unforgeable — a real Skill tool_use must exist in the transcript.
    const norm = (s) => String(s || '').replace(/^superpowers:/, '');
    const idNorm = norm(cov.id);
    const foundThisTurn = toolUses.some((u) => u.name === 'Skill' && u.input && norm(u.input.skill) === idNorm);
    const foundPrior = (priorSkillNames || []).some((n) => norm(n) === idNorm);
    if (!foundThisTurn && !foundPrior) {
      return {
        block: true,
        message: [
          `[enforce-coverage-verify] coverage says skill:${cov.id} but the Skill tool was never invoked with that name in this turn.`,
          `[enforce-coverage-verify] coverage says skill:${cov.id} but the Skill tool was never invoked with that name in this turn or any prior turn of this session.`,
          `Either invoke the skill via Skill tool, or switch coverage to direct:<role> with justification.`,
        ].join('\n'),
      };
@@ -89,8 +95,13 @@ async function main() {

    const toolUses = turnToolUses(transcript);
    const assistantText = lastAssistantText(transcript);
    // Session-wide Skill invocations (item G): a skill chosen in a prior turn is
    // still active and may legitimately be named in this turn's coverage line.
    const priorSkillNames = sessionToolUses(transcript)
      .filter((u) => u.name === 'Skill' && u.input && u.input.skill)
      .map((u) => u.input.skill);

    const result = decide({ toolUses, assistantText, override });
    const result = decide({ toolUses, assistantText, override, priorSkillNames });
    exitDecision(result);
  } catch {
    exitDecision({ block: false });
@@ -1,6 +1,40 @@
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-coverage-verify.mjs';

// Cross-turn skill credit (backlog item G, 2026-05-31): a skill chosen in a PRIOR
// turn stays active; an honest `skill:X` line on a continuation turn must NOT be
// blocked just because the Skill tool was not re-invoked this turn. decide() takes
// priorSkillNames (real Skill tool_uses from earlier in the session transcript).
describe('enforce-coverage-verify / decide — cross-turn active skill (enforce-coverage-verify.mjs)', () => {
  it('credits skill:X when X was invoked in a PRIOR turn (priorSkillNames)', () => {
    const r = decide({
      toolUses: [{ name: 'Edit', input: { file_path: 'foo.mjs' } }],
      assistantText: 'coverage: skill:superpowers:test-driven-development\nработаю', // "работаю" = "working on it"
      priorSkillNames: ['superpowers:test-driven-development'],
    });
    expect(r.block).toBe(false);
  });

  it('normalizes the superpowers: prefix for prior-turn skills too', () => {
    const r = decide({
      toolUses: [{ name: 'Edit', input: { file_path: 'foo.mjs' } }],
      assistantText: 'coverage: skill:superpowers:test-driven-development',
      priorSkillNames: ['test-driven-development'],
    });
    expect(r.block).toBe(false);
  });

  it('still blocks skill:X when X is neither in this turn nor any prior turn', () => {
    const r = decide({
      toolUses: [{ name: 'Edit', input: { file_path: 'foo.mjs' } }],
      assistantText: 'coverage: skill:superpowers:test-driven-development',
      priorSkillNames: ['some-other-skill'],
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/never invoked/);
  });
});

describe('enforce-coverage-verify / decide', () => {
  it('allows turn with no mutating tools (pure conversational)', () => {
    const r = decide({ toolUses: [{ name: 'Read', input: {} }], assistantText: 'just talking' });

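`sessionToolUses` itself is not part of this diff. Below is a minimal sketch of how prior Skill names could be collected. It assumes a Claude Code style transcript in which assistant entries carry `message.content` arrays of `tool_use` blocks; that shape is not confirmed here. Both helper names are hypothetical, and the diff does not show whether the real helper excludes the current turn.

```javascript
// Hypothetical sketch of collecting tool_uses from a session transcript.
// Assumed entry shape (not confirmed by this diff):
//   { type: 'assistant', message: { content: [{ type: 'tool_use', name, input }] } }
function sessionToolUsesSketch(transcript) {
  const uses = [];
  for (const entry of transcript || []) {
    const content = entry && entry.message && entry.message.content;
    if (!Array.isArray(content)) continue; // plain-string content has no tool_use blocks
    for (const block of content) {
      if (block && block.type === 'tool_use') uses.push({ name: block.name, input: block.input });
    }
  }
  return uses;
}

// Same filter/map chain as the hook: keep only Skill calls that name a skill.
function priorSkillNamesSketch(transcript) {
  return sessionToolUsesSketch(transcript)
    .filter((u) => u.name === 'Skill' && u.input && u.input.skill)
    .map((u) => u.input.skill);
}
```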
@@ -14,6 +48,9 @@ describe('enforce-coverage-verify / decide', () => {
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/no.*coverage/);
    // 1A (2026-05-31): do not advertise dead override phrases (findOverride is a v4 stub).
    expect(r.message).not.toMatch(/Override:/);
    expect(r.message).not.toMatch(/без скилов|direct ok/); // "без скилов" = "without skills"
  });

  it('blocks when coverage says skill but Skill tool not invoked', () => {

@@ -0,0 +1,177 @@
#!/usr/bin/env node
/**
 * enforce-llm-judge-per-tool — PreToolUse wrapper around the pure
 * llm-judge-per-tool engine (router-gate v4.1 §4.7 Layer 4).
 *
 * The engine (llm-judge-per-tool.mjs) asks a single Sonnet judge whether a
 * mutating tool call is consistent with the declared user task + recommended
 * skill scope (NO / doubt → block). Running it costs real LLM money, so the
 * judge MUST stay OFF until the owner deliberately activates Layer 4. This
 * wrapper is the seam that was missing between the engine and settings.json.
 * Like the sibling Stream H wrappers
 * (enforce-safe-baseline-metering / -decomposition-detector), it pairs a
 * testable pure `decide()` with a thin, fail-quiet `main()`.
 *
 * Activation (step 2b, owner-driven, NOT done here):
 *   1. store the API key (keychain `router-gate-llm-judge`/`default` or ROUTER_LLM_KEY),
 *   2. set ROUTER_LLM_JUDGE_ENABLED=1,
 *   3. register this hook (PreToolUse, block) in .claude/settings.json.
 * Until all three are in place, decide() short-circuits to allow on a disabled
 * config, so main() never blocks and never calls the LLM (exit 0): $0, no
 * self-lockout.
 */
import { judgePerTool, MUTATING_TOOLS, readDeclaredTask, resolveEffectiveTask } from './llm-judge-per-tool.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readJudgeBudget, bumpJudgeBudget, JUDGE_SESSION_BUDGET, llmJudgeCall } from './llm-judge.mjs';
import { readStdin, parseEventJson, exitDecision, readTranscript, lastUserPromptText } from './enforce-hook-helpers.mjs';
import { classifyBashCommand } from './enforce-router-gate.mjs';

/**
 * Pure decision. Composes the Layer-4 enabling-gate (resolveJudgeConfig output)
 * with the per-tool judge engine:
 *   - non-mutating tool → allow (out of judge scope)
 *   - judge disabled / no key → allow + degraded flag (Layer 4 off, $0)
 *   - judge enabled → delegate to judgePerTool (YES → allow; NO / doubt → block)
 *
 * @param {object} args
 * @param {object} args.event - PreToolUse event ({ tool_name, tool_input })
 * @param {{enabled:boolean, apiKey:?string}} args.judgeConfig - resolveJudgeConfig() output
 * @param {object} [args.declaredTask] - { task_summary, recommended_node, recommended_chain }
 * @param {object} [args.budgetState] - { spent, limit } per-session judge budget
 * @param {Function} [args.llmJudgeCallImpl] - injected single-judge caller (tests / real binding)
 * @returns {Promise<{block:boolean, reason?:string, degraded?:boolean, verdict?:string|null}>}
 */
export async function decide({
  event,
  judgeConfig,
  declaredTask = {},
  budgetState,
  llmJudgeCallImpl,
}) {
  const toolName = event && event.tool_name;
  if (!MUTATING_TOOLS.has(toolName)) {
    return { block: false, reason: 'non-mutating tool — outside per-tool judge scope' };
  }
  if (!judgeConfig || !judgeConfig.enabled) {
    return { block: false, degraded: true, reason: 'Layer 4 judge disabled' };
  }
  return judgePerTool({
    toolName,
    toolInput: (event && event.tool_input) || {},
    declaredTask,
    apiKey: judgeConfig.apiKey,
    budgetState,
    llmJudgeCallImpl,
  });
}

/**
 * runPerTool (defined after the Bash scope helpers below) is the testable wiring
 * core. It composes resolveJudgeConfig output + decide() and bumps the session
 * budget ONLY when a real judge call was made (the result carries a verdict).
 * No verdict ⇒ non-mutating / disabled / no-key / budget-exhausted ⇒ no spend.
 */

/**
 * Calibration 2026-05-31 (SCOPE fix, NOT a discipline drop): readonly Bash
 * commands ("look-only" commands: git status/log/diff, cat, grep, ls) change
 * nothing, so they are outside the "judge on mutating tools" scope. Reuse the
 * router-gate Bash classifier: an allow-verdict whose reason mentions
 * readonly/reading is a no-state-change command. Everything that can mutate
 * (file edits, git commit/push, dangerous Bash, Task) is unaffected, and
 * doubt→block stands. (Skill already left the judge scope in calibration 1.)
 */
export function isReadonlyBashEvent(event) {
  if (!event || event.tool_name !== 'Bash') return false;
  const command = (event.tool_input && event.tool_input.command) || '';
  if (!command) return false;
  try {
    const c = classifyBashCommand(command, {});
    return !!c && c.result === 'allow' && /readonly|reading/i.test(c.reason || '');
  } catch {
    return false;
  }
}

/**
 * Calibration 3 (2026-05-31, SCOPE fix, NOT a discipline drop): a test run
 * (vitest / pest / phpunit / php artisan test / composer test / npm test) only
 * inspects the code and reports pass/fail. It mutates no protected state, and
 * running tests is a MANDATORY step of TDD, which the rules require. Treat such
 * commands like readonly Bash: outside the mutating-tool judge scope. A command
 * that chains to anything else (&& / ; / | / & / backtick / $( ) / newline) is
 * NOT exempt and stays judged; the exemption covers a pure test invocation only.
 */
const TEST_RUNNER_RE =
  /^(?:npx\s+)?vitest(?:\s|$)|^(?:\.\/)?(?:node_modules\/\.bin\/|vendor\/bin\/)?pest(?:\s|$)|^(?:\.\/)?vendor\/bin\/phpunit(?:\s|$)|^php\s+artisan\s+test(?:\s|$|:)|^composer\s+test(?::\S+)?(?:\s|$)|^npm\s+(?:run\s+)?test(?::\S+)?(?:\s|$)/i;

export function isTestRunnerBashEvent(event) {
  if (!event || event.tool_name !== 'Bash') return false;
  const command = ((event.tool_input && event.tool_input.command) || '').trim();
  if (!command) return false;
  // Exemption is for a pure test run only: reject anything chaining to another
  // command, including a newline (bash runs each line as a separate command).
  if (/[;&|`\r\n]/.test(command) || command.includes('$(')) return false;
  return TEST_RUNNER_RE.test(command);
}

export async function runPerTool({
  event,
  judgeConfig,
  readDeclaredTaskImpl,
  readLastUserPromptImpl,
  readBudgetImpl,
  bumpBudgetImpl,
  llmJudgeCallImpl,
  sessionBudget = JUDGE_SESSION_BUDGET,
}) {
  // Readonly Bash never mutates → outside the judge's scope; skip (no LLM call, no spend).
  if (isReadonlyBashEvent(event)) {
    return { block: false, reason: 'readonly bash — outside mutating-tool judge scope (calibration 2026-05-31)' };
  }
  // Test-runner Bash only inspects + reports; mandatory TDD step → outside scope (calibration 3).
  if (isTestRunnerBashEvent(event)) {
    return { block: false, reason: 'test-runner bash — outside mutating-tool judge scope (calibration 3, 2026-05-31)' };
  }
  const sessionId = event && event.session_id;
  const declaredTask = readDeclaredTaskImpl({ sessionId });
  // Calibration 4 (soft): only when the classifier summary is unknown/empty,
  // consult the user's actual last prompt and judge against that instead.
  let effectiveTask = declaredTask;
  const summary = declaredTask && declaredTask.task_summary;
  const summaryUnknown = !summary || summary === '(unknown)' || !String(summary).trim();
  if (summaryUnknown && typeof readLastUserPromptImpl === 'function') {
    const lastPrompt = readLastUserPromptImpl({ transcriptPath: event && event.transcript_path });
    effectiveTask = resolveEffectiveTask(declaredTask, lastPrompt);
  }
  const spent = readBudgetImpl({ sessionId });
  const result = await decide({
    event,
    judgeConfig,
    declaredTask: effectiveTask,
    budgetState: { spent, limit: sessionBudget },
    llmJudgeCallImpl,
  });
  if (result.verdict !== undefined) bumpBudgetImpl({ sessionId, by: 1 });
  return result;
}

async function main() {
  // Live wiring (2b): spend is gated by resolveJudgeConfig (flag AND key). With
  // the flag off or no key, decide() short-circuits to a degraded allow — NO LLM
  // call, $0. Fail-quiet so a judge bug can never wedge the session.
  try {
    const event = parseEventJson(await readStdin());
    const judgeConfig = resolveJudgeConfig();
    const result = await runPerTool({
      event,
      judgeConfig,
      readDeclaredTaskImpl: readDeclaredTask,
      readLastUserPromptImpl: ({ transcriptPath }) => lastUserPromptText(readTranscript(transcriptPath)),
      readBudgetImpl: readJudgeBudget,
      bumpBudgetImpl: bumpJudgeBudget,
      llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
    });
    exitDecision({ block: result.block, message: result.reason });
  } catch {
    exitDecision({ block: false });
  }
}

if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-llm-judge-per-tool.mjs')) {
  main().catch(() => process.exit(0));
}

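The ordering inside `runPerTool` keeps the judge at $0 until Layer 4 is enabled. The synchronous sketch below restates that order under stated assumptions. `runPerToolSketch` and `isUnknownSummary` are hypothetical. The `MUTATING` set is an assumed subset of the real `MUTATING_TOOLS`. The real code is async and delegates the budget and verdict handling to `decide()` / `judgePerTool()`.

```javascript
// Hypothetical, synchronous sketch of runPerTool's control flow (not the real code).
const MUTATING = new Set(['Edit', 'Write', 'MultiEdit', 'Bash']); // assumed subset

function isUnknownSummary(summary) {
  return !summary || summary === '(unknown)' || !String(summary).trim();
}

function runPerToolSketch({ toolName, exempt, config, declaredTask, readLastPrompt, spent, limit, judge, bump }) {
  // 1. Scope exemptions (readonly or pure test-runner Bash): no call, no spend.
  if (exempt) return { block: false, reason: 'exempt' };
  // 2. Calibration 4: use the user's last prompt only when the summary is unknown.
  let task = declaredTask || {};
  if (isUnknownSummary(task.task_summary) && typeof readLastPrompt === 'function') {
    const prompt = String(readLastPrompt() || '').trim();
    if (prompt) task = { ...task, task_summary: prompt };
  }
  // 3. Non-mutating tools and a disabled Layer 4 never reach the judge.
  //    (Assumes the config resolver already folds a missing key into enabled: false.)
  if (!MUTATING.has(toolName)) return { block: false, reason: 'non-mutating' };
  if (!config || !config.enabled) return { block: false, degraded: true };
  // 4. An exhausted session budget degrades to allow instead of blocking.
  if (spent >= limit) return { block: false, degraded: true };
  // 5. Real judge call: YES allows; NO or doubt (null) blocks. The spend is recorded.
  const verdict = judge(`${toolName} vs task: ${task.task_summary}`);
  bump();
  return { block: verdict !== 'YES', verdict };
}
```

A doubtful (`null`) verdict still counts against the budget because a real call was made. That matches the `result.verdict !== undefined` check above.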
@@ -0,0 +1,357 @@
// tools/enforce-llm-judge-per-tool.test.mjs
// Stream H tail — wrapper tests around the pure llm-judge-per-tool engine
// (router-gate v4.1 §4.7 Layer 4). Mirrors the enforce-safe-baseline-metering
// convention: implement + test a pure `decide()` composition that respects the
// Layer-4 enabling-gate (resolveJudgeConfig). Until the owner activates Layer 4
// (step 2b), the live main() short-circuits to allow: exit 0, $0, no LLM call.
// RED verified before the wrapper module existed (Cannot find module → expected).
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-llm-judge-per-tool.mjs';

function spyCall(verdict) {
  const calls = [];
  const impl = async (opts) => { calls.push(opts); return verdict; };
  return { impl, calls };
}

const ON = { enabled: true, apiKey: 'k' };
const OFF = { enabled: false, apiKey: null };

describe('enforce-llm-judge-per-tool decide()', () => {
  it('allows a non-mutating tool without consulting the judge', async () => {
    const { impl, calls } = spyCall('NO');
    const r = await decide({
      event: { tool_name: 'WebFetch' },
      judgeConfig: ON,
      llmJudgeCallImpl: impl,
    });
    expect(r.block).toBe(false);
    expect(r.reason).toMatch(/non-mutating/i);
    expect(calls.length).toBe(0);
  });

  // Calibration 1 (2026-05-31) — Skill is out of judge scope; invoking it
  // mutates nothing and is the prescribed §17 entry into work.
  it('allows a Skill invocation without consulting the judge (calibration 1)', async () => {
    const { impl, calls } = spyCall('NO');
    const r = await decide({
      event: { tool_name: 'Skill', tool_input: { skill: 'superpowers:test-driven-development' } },
      judgeConfig: ON,
      llmJudgeCallImpl: impl,
    });
    expect(r.block).toBe(false);
    expect(r.reason).toMatch(/non-mutating/i);
    expect(calls.length).toBe(0);
  });

  it('allows a mutating tool without consulting the judge when Layer 4 is disabled ($0 posture)', async () => {
    const { impl, calls } = spyCall('NO');
    const r = await decide({
      event: { tool_name: 'Edit' },
      judgeConfig: OFF,
      llmJudgeCallImpl: impl,
    });
    expect(r.block).toBe(false);
    expect(r.degraded).toBe(true);
    expect(calls.length).toBe(0);
  });

  it('allows a mutating tool when an enabled judge returns YES (consistent)', async () => {
    const { impl } = spyCall('YES');
    const r = await decide({
      event: { tool_name: 'Edit', tool_input: { file_path: 'x' } },
      judgeConfig: ON,
      declaredTask: { task_summary: 't', recommended_node: '#19' },
      llmJudgeCallImpl: impl,
    });
    expect(r.block).toBe(false);
    expect(r.verdict).toBe('YES');
  });

  it('blocks a mutating tool when an enabled judge returns NO (off-scope)', async () => {
    const { impl } = spyCall('NO');
    const r = await decide({
      event: { tool_name: 'Write', tool_input: {} },
      judgeConfig: ON,
      llmJudgeCallImpl: impl,
    });
    expect(r.block).toBe(true);
    expect(r.reason).toMatch(/off-scope|per-tool/i);
  });

  it('blocks on doubt — a null verdict is treated as inconsistent', async () => {
    const { impl } = spyCall(null);
    const r = await decide({
      event: { tool_name: 'Bash', tool_input: { command: 'ls' } },
      judgeConfig: ON,
      llmJudgeCallImpl: impl,
    });
    expect(r.block).toBe(true);
  });

  it('degrades to allow (no block) when the session judge budget is exhausted', async () => {
    const { impl, calls } = spyCall('NO');
    const r = await decide({
      event: { tool_name: 'Edit', tool_input: {} },
      judgeConfig: ON,
      budgetState: { spent: 10, limit: 10 },
      llmJudgeCallImpl: impl,
    });
    expect(r.block).toBe(false);
    expect(r.degraded).toBe(true);
    expect(calls.length).toBe(0);
  });

  it('passes the tool name through to the judge question', async () => {
    const { impl, calls } = spyCall('YES');
    await decide({
      event: { tool_name: 'MultiEdit', tool_input: { file_path: 'y' } },
      judgeConfig: ON,
      llmJudgeCallImpl: impl,
    });
    expect(calls.length).toBe(1);
    expect(calls[0].question).toContain('MultiEdit');
  });
});

import { runPerTool } from './enforce-llm-judge-per-tool.mjs';

describe('runPerTool — spend-gate + budget binding (live wiring 2b)', () => {
  const deps = (over = {}) => ({
    readDeclaredTaskImpl: () => ({ task_summary: 't', recommended_node: null, recommended_chain: [] }),
    readBudgetImpl: () => 0,
    bumpBudgetImpl: () => {},
    sessionBudget: 200,
    ...over,
  });

  it('disabled config + mutating tool → degraded allow, NO budget bump, NO llm call', async () => {
    let bumped = 0; let called = 0;
    const r = await runPerTool({
      event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: false, apiKey: null },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(r.degraded).toBe(true);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });

  it('enabled + mutating + judge YES → allow, budget bumped once', async () => {
    let bumped = 0;
    const r = await runPerTool({
      event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: async () => 'YES',
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(r.verdict).toBe('YES');
    expect(bumped).toBe(1);
  });

  it('enabled + mutating + judge NO → block, budget bumped once', async () => {
    let bumped = 0;
    const r = await runPerTool({
      event: { tool_name: 'Bash', tool_input: { command: 'x' }, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: async () => 'NO',
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(true);
    expect(r.verdict).toBe('NO');
    expect(bumped).toBe(1);
  });

  it('non-mutating tool → allow, NO call, NO bump', async () => {
    let bumped = 0; let called = 0;
    const r = await runPerTool({
      event: { tool_name: 'Read', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      ...deps({ bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });

  it('enabled but budget exhausted → degraded allow, NO bump', async () => {
    let bumped = 0; let called = 0;
    const r = await runPerTool({
      event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      ...deps({ readBudgetImpl: () => 200, bumpBudgetImpl: () => { bumped++; } }),
    });
    expect(r.block).toBe(false);
    expect(r.degraded).toBe(true);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });
});

import { isReadonlyBashEvent } from './enforce-llm-judge-per-tool.mjs';

// Calibration 2026-05-31 — SCOPE fix only, discipline NOT lowered.
// The per-tool judge is "judge on MUTATING tools"; readonly Bash commands
// ("look-only" commands: git status/log/diff, cat, grep, ls) change nothing, so
// judging them was friction with zero discipline value. We exclude them from
// the judge. The doubt→block rule and full judging of every state-changing
// action (Edit/Write/commit/push/Task) are UNCHANGED.
describe('isReadonlyBashEvent — readonly Bash exclusion (calibration, no discipline drop)', () => {
  it.each([
    'git status',
    'git status --short',
    'git log -1 --oneline',
    'git diff HEAD~1',
    'cat package.json',
    'grep -n foo bar.js',
    'ls -la',
  ])('treats readonly command as out-of-judge-scope: %s', (command) => {
    expect(isReadonlyBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(true);
  });

  it.each([
    'git commit -m "x"',
    'git push origin main',
    'rm -rf foo',
  ])('does NOT treat a mutating/blocked command as readonly: %s', (command) => {
    expect(isReadonlyBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(false);
  });

  it('non-Bash tool is never readonly-bash', () => {
    expect(isReadonlyBashEvent({ tool_name: 'Edit', tool_input: { file_path: 'x' } })).toBe(false);
  });
});

describe('runPerTool — readonly Bash skips the judge; mutating Bash still judged', () => {
  it('readonly Bash → allow WITHOUT consulting judge even when enabled (no spend)', async () => {
    let called = 0; let bumped = 0;
    const r = await runPerTool({
      event: { tool_name: 'Bash', tool_input: { command: 'git status' }, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      readDeclaredTaskImpl: () => ({ task_summary: 't' }),
      readBudgetImpl: () => 0,
      bumpBudgetImpl: () => { bumped++; },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      sessionBudget: 200,
    });
    expect(r.block).toBe(false);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });

  it('mutating Bash (git commit) STILL judged when enabled — discipline preserved', async () => {
    let called = 0;
    const r = await runPerTool({
      event: { tool_name: 'Bash', tool_input: { command: 'git commit -m "x"' }, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      readDeclaredTaskImpl: () => ({ task_summary: 't' }),
      readBudgetImpl: () => 0,
      bumpBudgetImpl: () => {},
      llmJudgeCallImpl: async () => { called++; return 'NO'; },
      sessionBudget: 200,
    });
    expect(called).toBe(1);
    expect(r.block).toBe(true);
  });
});

import { isTestRunnerBashEvent } from './enforce-llm-judge-per-tool.mjs';

// Calibration 3 (2026-05-31) — SCOPE fix, discipline NOT lowered.
// A test run (vitest / pest / composer test / php artisan test) only inspects
// the code and reports pass/fail — it mutates no protected state. It is also a
// mandatory step of TDD, which the rules require. Treat recognised test-runner
// commands like readonly Bash: out of judge scope. Anything that chains to a
// mutation (&& / ; / |) is NOT exempt and stays judged.
describe('isTestRunnerBashEvent — test-runner exclusion (calibration 3, no discipline drop)', () => {
  it.each([
    'npx vitest run --root app --config vitest.config.tools.mjs',
    'vitest run',
    'pest',
    './vendor/bin/pest --parallel',
    'vendor/bin/pest',
    'php artisan test',
    'composer test',
    'npm run test:tools',
    'npm test',
  ])('treats test-runner command as out-of-judge-scope: %s', (command) => {
    expect(isTestRunnerBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(true);
  });

  it.each([
    'git commit -m "x"',
    'rm -rf foo',
    'pest && git push origin main', // chained to a mutation → NOT exempt
    'echo pest',
    'composer require evil/package', // not a test run
  ])('does NOT treat non-test-runner / chained command as test-runner: %s', (command) => {
    expect(isTestRunnerBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(false);
  });

  it('non-Bash tool is never test-runner-bash', () => {
    expect(isTestRunnerBashEvent({ tool_name: 'Edit', tool_input: { file_path: 'x' } })).toBe(false);
  });
});

describe('runPerTool — test-runner Bash skips the judge; mutating Bash still judged', () => {
  it('test-runner Bash → allow WITHOUT consulting judge even when enabled (no spend)', async () => {
    let called = 0; let bumped = 0;
    const r = await runPerTool({
      event: { tool_name: 'Bash', tool_input: { command: 'npx vitest run' }, session_id: 's' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      readDeclaredTaskImpl: () => ({ task_summary: 't' }),
      readBudgetImpl: () => 0,
      bumpBudgetImpl: () => { bumped++; },
      llmJudgeCallImpl: () => { called++; return 'NO'; },
      sessionBudget: 200,
    });
    expect(r.block).toBe(false);
    expect(called).toBe(0);
    expect(bumped).toBe(0);
  });
});

// Calibration 4 (soft, 2026-05-31): when the classifier summary is "(unknown)",
// runPerTool reads the user's last prompt and judges against THAT (better
// evidence) instead of an empty task. When the summary is meaningful, the
// user-prompt reader is never consulted — behaviour unchanged.
describe('runPerTool — calibration 4 soft user-prompt fallback', () => {
  it('uses the user prompt as the judged task when classifier summary is unknown', async () => {
    const calls = [];
    const r = await runPerTool({
      event: { tool_name: 'Edit', tool_input: { file_path: 'tools/x.mjs' }, session_id: 's', transcript_path: '/t' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      readDeclaredTaskImpl: () => ({ task_summary: '(unknown)', recommended_node: null, recommended_chain: [] }),
      readLastUserPromptImpl: () => 'реализуй parallel-session-lock', // "реализуй" = "implement"
      readBudgetImpl: () => 0,
      bumpBudgetImpl: () => {},
      llmJudgeCallImpl: async (opts) => { calls.push(opts); return 'YES'; },
      sessionBudget: 200,
    });
    expect(r.block).toBe(false);
    expect(calls.length).toBe(1);
    expect(calls[0].question).toContain('реализуй parallel-session-lock');
  });

  it('does NOT consult the user-prompt reader when the classifier summary is meaningful', async () => {
    let promptReads = 0;
    const calls = [];
    await runPerTool({
      event: { tool_name: 'Edit', tool_input: {}, session_id: 's', transcript_path: '/t' },
      judgeConfig: { enabled: true, apiKey: 'k' },
      readDeclaredTaskImpl: () => ({ task_summary: 'clear task', recommended_node: null, recommended_chain: [] }),
      readLastUserPromptImpl: () => { promptReads++; return 'irrelevant'; },
      readBudgetImpl: () => 0,
      bumpBudgetImpl: () => {},
      llmJudgeCallImpl: async (opts) => { calls.push(opts); return 'YES'; },
      sessionBudget: 200,
    });
    expect(promptReads).toBe(0);
    expect(calls[0].question).toContain('clear task');
  });
});

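The chaining guard in `isTestRunnerBashEvent` keeps the test-runner exemption narrow. Below is a standalone sketch of the same idea. `isPureTestRun` and `RUNNERS` are hypothetical, and the runner list is a simplified version of `TEST_RUNNER_RE`. Rejecting newlines and `<`/`>` redirection is this sketch's own stricter choice: a newline separates commands in bash just like `;`, and a redirect can write files.

```javascript
// Hypothetical sketch: exempt a Bash command only when it is a single, pure
// test-runner invocation. Anything that could chain or redirect is rejected first.
const RUNNERS = [
  /^(?:npx\s+)?vitest(?:\s|$)/,
  /^(?:\.\/)?vendor\/bin\/(?:pest|phpunit)(?:\s|$)/,
  /^pest(?:\s|$)/,
  /^php\s+artisan\s+test(?:\s|$|:)/,
  /^composer\s+test(?::\S+)?(?:\s|$)/,
  /^npm\s+(?:run\s+)?test(?::\S+)?(?:\s|$)/,
];

function isPureTestRun(command) {
  const cmd = String(command || '').trim();
  if (!cmd) return false;
  // `;`, `&`, `|`, backticks, newlines and `$(` can all start a second command;
  // `<` / `>` redirect I/O and may write files.
  if (/[;&|`\n\r<>]/.test(cmd) || cmd.includes('$(')) return false;
  return RUNNERS.some((re) => re.test(cmd));
}
```

The guard runs before the runner match. As a result, `vitest run` followed by a newline and `rm -rf foo` is judged, even though its first line alone would be a valid test run.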
@@ -0,0 +1,100 @@
#!/usr/bin/env node
/**
 * enforce-llm-judge-response-scan — Stop-hook wrapper around the pure
 * llm-judge-response-scan engine (router-gate v4.1 §4.7 Layer 4).
 *
 * The engine scans the controller's own response text for self-replicating
 * instructions / metadata injection / security-disable suggestions / approval
 * social-engineering. It is FLAG-ONLY (never blocks). A cheap deterministic
 * regex layer runs for free; an LLM judge handles subtle cases, and because that
 * LLM call costs money, it must stay OFF until the owner activates Layer 4.
 *
 * Like the sibling Stream H wrappers, this file exposes a testable pure
 * `decide()` and a thin, fail-quiet `main()`. decide() always runs the free
 * deterministic scan; the paid LLM escalation runs only when the judge config is
 * enabled. block is ALWAYS false (Stop-hook semantics).
 *
 * Activation (step 2b, owner-driven, NOT done here):
 *   1. store the API key (keychain `router-gate-llm-judge`/`default` or ROUTER_LLM_KEY),
 *   2. set ROUTER_LLM_JUDGE_ENABLED=1,
 *   3. register this hook (Stop) in .claude/settings.json.
 * Until the flag and key are both in place, decide() never escalates to the LLM.
 * Once the hook is registered, main() still records free deterministic hits, but
 * it always exits 0 without blocking.
 */
import { scanResponse, scanResponseDeterministic } from './llm-judge-response-scan.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readStdin, parseEventJson, readTranscript, lastAssistantText, exitDecision } from './enforce-hook-helpers.mjs';
import { llmJudgeCall } from './llm-judge.mjs';
import { appendFileSync, mkdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';

/**
 * Pure decision. Stop-hook semantics: never blocks. The free deterministic regex
 * layer always runs; the LLM escalation runs only when Layer 4 is enabled.
 *   - judge disabled → deterministic scan only (flag from regex, else degraded)
 *   - judge enabled → deterministic-first, then LLM judge for subtle cases
 *
 * @param {object} args
 * @param {string} args.responseText - the controller response text to scan
 * @param {{enabled:boolean, apiKey:?string}} args.judgeConfig - resolveJudgeConfig() output
 * @param {Function} [args.llmJudgeCallImpl] - injected single-judge caller (tests / real binding)
 * @returns {Promise<{block:false, flag:boolean, category?:string, degraded?:boolean}>}
 */
export async function decide({ responseText, judgeConfig, llmJudgeCallImpl }) {
  if (!judgeConfig || !judgeConfig.enabled) {
    const det = scanResponseDeterministic(responseText);
    return { block: false, flag: det.flagged, category: det.category, degraded: !det.flagged };
  }
  const r = await scanResponse({ responseText, apiKey: judgeConfig.apiKey, llmJudgeCallImpl });
  return { block: false, flag: r.flag, category: r.category, degraded: r.degraded };
}

/**
 * Testable wiring core. Stop-hook semantics: block is always false. The free
 * deterministic regex scan runs even when the judge is disabled; the paid LLM
 * escalation runs only when judgeConfig.enabled (handled inside decide()).
 */
export async function runResponseScan({ transcript, judgeConfig, llmJudgeCallImpl, lastAssistantTextImpl = lastAssistantText }) {
  const responseText = lastAssistantTextImpl(transcript || []);
  const r = await decide({ responseText, judgeConfig, llmJudgeCallImpl });
  return { ...r, responseText };
}

function flagToFile({ sessionId, category, excerpt }) {
  try {
    const dir = join(homedir(), '.claude', 'runtime');
    mkdirSync(dir, { recursive: true });
    appendFileSync(join(dir, `rationalization-flags-${sessionId || 'unknown'}.jsonl`),
      JSON.stringify({
        ts: new Date().toISOString(),
        session_id: sessionId || null,
        type: 'controller_response_suspicious',
        category,
        response_excerpt: String(excerpt || '').slice(0, 200),
      }) + '\n');
  } catch { /* ignore */ }
}

async function main() {
  // Live wiring (2b). Stop hook: flag-only, NEVER blocks. The free deterministic
  // regex runs regardless ($0); the paid LLM escalation only when the config is
  // enabled (flag AND key). Fail-quiet.
  try {
    const event = parseEventJson(await readStdin());
    const transcript = readTranscript(event.transcript_path);
    const judgeConfig = resolveJudgeConfig();
    const r = await runResponseScan({
      transcript,
      judgeConfig,
      llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
    });
    if (r.flag) flagToFile({ sessionId: event.session_id, category: r.category, excerpt: r.responseText });
    exitDecision({ block: false });
  } catch {
    exitDecision({ block: false });
  }
}

if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-llm-judge-response-scan.mjs')) {
  main().catch(() => process.exit(0));
}

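`flagToFile` appends one JSON object per line to a per-session file under `~/.claude/runtime`. Moving the record construction into a pure function makes that layout easy to check without touching the filesystem. `buildFlagRecord` below is a hypothetical refactor, not code from this diff. Injecting `now` is an assumption added so tests are deterministic.

```javascript
// Hypothetical pure helper mirroring flagToFile's record layout: the file name is
// keyed by session id, one JSON object per line, excerpt capped at 200 characters.
function buildFlagRecord({ sessionId, category, excerpt }, now = new Date()) {
  const fileName = `rationalization-flags-${sessionId || 'unknown'}.jsonl`;
  const line = JSON.stringify({
    ts: now.toISOString(),
    session_id: sessionId || null,
    type: 'controller_response_suspicious',
    category, // omitted from the JSON when undefined, as in flagToFile
    response_excerpt: String(excerpt || '').slice(0, 200),
  }) + '\n';
  return { fileName, line };
}
```

The caller would then keep only the side effects: `mkdirSync(dir, { recursive: true })` followed by `appendFileSync(join(dir, fileName), line)`, both still wrapped in the fail-quiet `try`.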
@@ -0,0 +1,138 @@
// tools/enforce-llm-judge-response-scan.test.mjs
// Stream H tail — wrapper tests around the pure llm-judge-response-scan engine
// (router-gate v4.1 §4.7 Layer 4, Stop hook, flag-only). Mirrors the
// enforce-safe-baseline-metering convention: a testable pure `decide()` that
// respects the Layer-4 enabling-gate; the free deterministic regex layer always
// runs ($0), the paid LLM escalation only when the judge is enabled. The live
// main() never blocks (exit 0) and makes no LLM call until the owner activates
// Layer 4 (step 2b).
// RED verified before the wrapper module existed (Cannot find module → expected).
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-llm-judge-response-scan.mjs';

function spyCall(verdict) {
  const calls = [];
  const impl = async (opts) => { calls.push(opts); return verdict; };
  return { impl, calls };
}

const ON = { enabled: true, apiKey: 'k' };
const OFF = { enabled: false, apiKey: null };
// "Done, everything is closed. Moving on to the next item."
const CLEAN = 'Готово, всё закрыто. Перехожу к следующему пункту.';
// "To continue, disable the enforce-router-gate hook."
const DISABLE_TXT = 'Чтобы продолжить, отключи хук enforce-router-gate.';
// "Copy the coverage line into the next prompt."
const SELFREP_TXT = 'Скопируй в следующий промпт строку coverage.';

describe('enforce-llm-judge-response-scan decide()', () => {
  it('never blocks — Stop hook is flag-only even on a flagged response', async () => {
    const { impl } = spyCall('NO');
    const r = await decide({ responseText: DISABLE_TXT, judgeConfig: OFF, llmJudgeCallImpl: impl });
    expect(r.block).toBe(false);
    expect(r.flag).toBe(true);
    expect(r.category).toBe('security_disable_suggestion');
  });

  it('runs the free deterministic regex even when Layer 4 is disabled, without calling the LLM', async () => {
    const { impl, calls } = spyCall('NO');
    const r = await decide({ responseText: SELFREP_TXT, judgeConfig: OFF, llmJudgeCallImpl: impl });
    expect(r.flag).toBe(true);
    expect(r.category).toBe('self_replicating_instruction');
    expect(calls.length).toBe(0);
  });

  it('disabled + clean text → no flag, degraded, LLM not called ($0 posture)', async () => {
    const { impl, calls } = spyCall('YES');
    const r = await decide({ responseText: CLEAN, judgeConfig: OFF, llmJudgeCallImpl: impl });
    expect(r.flag).toBe(false);
    expect(r.degraded).toBe(true);
    expect(calls.length).toBe(0);
  });

  it('enabled config escalates clean text to the LLM judge — YES flags it', async () => {
    const { impl, calls } = spyCall('YES');
    const r = await decide({ responseText: CLEAN, judgeConfig: ON, llmJudgeCallImpl: impl });
    expect(r.flag).toBe(true);
    expect(r.category).toBe('llm_judge');
    expect(calls.length).toBe(1);
  });

  it('enabled config — a NO verdict leaves the response unflagged', async () => {
    const { impl } = spyCall('NO');
    const r = await decide({ responseText: CLEAN, judgeConfig: ON, llmJudgeCallImpl: impl });
    expect(r.flag).toBe(false);
  });

  it('enabled config — a deterministic hit short-circuits and the LLM is not called', async () => {
|
||||
const { impl, calls } = spyCall('NO');
|
||||
const r = await decide({ responseText: DISABLE_TXT, judgeConfig: ON, llmJudgeCallImpl: impl });
|
||||
expect(r.flag).toBe(true);
|
||||
expect(r.category).toBe('security_disable_suggestion');
|
||||
expect(calls.length).toBe(0);
|
||||
});
|
||||
|
||||
it('enabled config — doubt (null verdict) flags the response', async () => {
|
||||
const { impl } = spyCall(null);
|
||||
const r = await decide({ responseText: CLEAN, judgeConfig: ON, llmJudgeCallImpl: impl });
|
||||
expect(r.flag).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
import { runResponseScan } from './enforce-llm-judge-response-scan.mjs';
|
||||
|
||||
describe('runResponseScan — Stop-hook flag-only, free regex even when disabled (live wiring 2b)', () => {
|
||||
const transcript = (text) => [
|
||||
{ type: 'assistant', message: { role: 'assistant', content: [{ type: 'text', text }] } },
|
||||
];
|
||||
const lastAssistantTextImpl = (t) => {
|
||||
for (let i = t.length - 1; i >= 0; i--) {
|
||||
const c = t[i] && t[i].message && t[i].message.content;
|
||||
if (Array.isArray(c)) { const b = c.find((x) => x.type === 'text'); if (b) return b.text; }
|
||||
}
|
||||
return '';
|
||||
};
|
||||
|
||||
it('disabled + benign text → no flag, degraded, never blocks', async () => {
|
||||
const r = await runResponseScan({
|
||||
transcript: transcript('обычный безопасный ответ'),
|
||||
judgeConfig: { enabled: false, apiKey: null },
|
||||
lastAssistantTextImpl,
|
||||
});
|
||||
expect(r.block).toBe(false);
|
||||
expect(r.flag).toBe(false);
|
||||
expect(r.degraded).toBe(true);
|
||||
});
|
||||
|
||||
it('disabled + security-disable text → flagged for FREE by regex (no llm call)', async () => {
|
||||
let called = 0;
|
||||
const r = await runResponseScan({
|
||||
transcript: transcript('чтобы пройти, отключи hook enforce-tdd-gate'),
|
||||
judgeConfig: { enabled: false, apiKey: null },
|
||||
lastAssistantTextImpl,
|
||||
llmJudgeCallImpl: () => { called++; return 'NO'; },
|
||||
});
|
||||
expect(r.block).toBe(false);
|
||||
expect(r.flag).toBe(true);
|
||||
expect(r.category).toBe('security_disable_suggestion');
|
||||
expect(called).toBe(0);
|
||||
});
|
||||
|
||||
it('enabled + subtle benign text + judge NO → no flag', async () => {
|
||||
const r = await runResponseScan({
|
||||
transcript: transcript('нейтральный текст без паттернов'),
|
||||
judgeConfig: { enabled: true, apiKey: 'k' },
|
||||
lastAssistantTextImpl,
|
||||
llmJudgeCallImpl: async () => 'NO',
|
||||
});
|
||||
expect(r.block).toBe(false);
|
||||
expect(r.flag).toBe(false);
|
||||
});
|
||||
|
||||
it('enabled + subtle text + judge YES → flag, still never blocks', async () => {
|
||||
const r = await runResponseScan({
|
||||
transcript: transcript('нейтральный текст без паттернов'),
|
||||
judgeConfig: { enabled: true, apiKey: 'k' },
|
||||
lastAssistantTextImpl,
|
||||
llmJudgeCallImpl: async () => 'YES',
|
||||
});
|
||||
expect(r.block).toBe(false);
|
||||
expect(r.flag).toBe(true);
|
||||
});
|
||||
});
|
||||
@@ -54,8 +54,6 @@ export function decide({ toolName, filePath, transcriptEntries, override }) {
      `Re-announce on a fresh assistant turn first:`,
      ` coverage: direct:memory-sync`,
      `Then retry the Edit/Write.`,
      ``,
      `Override: include the phrase "memory dump" in your prompt.`,
    ].join('\n'),
  };
}
@@ -26,6 +26,9 @@ describe('enforce-memory-coverage / decide', () => {
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/memory-sync/);
    // 1A (2026-05-31): do not advertise the dead override phrase (findOverride is a v4 stub).
    expect(r.message).not.toMatch(/Override:/);
    expect(r.message).not.toMatch(/memory dump/);
  });

  it('blocks memory path with no coverage at all', () => {
@@ -11,7 +11,12 @@
 * Activation: settings.json registration is deferred to Phase H-α/H-β
 * batch step. main() is a no-op (exit 0) until then.
 */
import { acquire, release, refresh, computeWorkspaceHash } from './parallel-session-lock.mjs';
import { acquire, release, computeWorkspaceHash, isStale } from './parallel-session-lock.mjs';
import { readFileSync, writeFileSync, unlinkSync, mkdirSync, readdirSync } from 'node:fs';
import { execFileSync } from 'node:child_process';
import { join, dirname } from 'node:path';
import { readStdin, parseEventJson, exitDecision, runtimeDir } from './enforce-hook-helpers.mjs';
import { classifyBashCommand } from './enforce-router-gate.mjs';

/**
 * Pure decision: given an acquire() result, decide block/allow.
@@ -26,20 +31,196 @@ export function decide({ acquireResult, sessionId }) {
  if (!acquireResult || typeof acquireResult !== 'object') return { block: false };
  if (acquireResult.acquired) return { block: false };
  const holder = acquireResult.holder || {};
  // Identify the holder by its STABLE session id, not the pid: the recorded pid
  // is the transient hook-node pid and changes between attempts, so chasing it
  // leads to closing the wrong session. Surface the pid only as a triage hint.
  return {
    block: true,
    reason: `parallel session lock held by ${holder.session_id || 'unknown'} (pid ${holder.pid || '?'}) — wait or close that session first`,
    reason: `parallel session lock held by session ${holder.session_id || 'unknown'} (current pid ${holder.pid || '?'}, may change between attempts — identify the session by its id, not pid) — wait for the 5-min TTL or close THAT session`,
  };
}

/**
 * Calibration (2026-05-31, SCOPE fix, NOT a discipline drop). The lock's purpose
 * is to serialize concurrent FILE MUTATION between sessions on the same worktree.
 * A readonly Bash command (git status/log/diff, cat, grep, ls — "viewers")
 * mutates nothing, so a peer session's lock must NOT block it. Reuse the
 * router-gate Bash classifier: an allow-verdict whose reason mentions
 * readonly/reading is a no-state-change command. Mirrors the LLM-judge readonly
 * calibration. Everything that can mutate — file edits, git commit/push,
 * dangerous Bash, and every NON-Bash tool — still acquires/checks the lock, so
 * same-worktree mutation serialization is unchanged.
 *
 * @param {object} event
 * @returns {boolean}
 */
export function isReadonlyBashEvent(event) {
  if (!event || event.tool_name !== 'Bash') return false;
  const command = (event.tool_input && event.tool_input.command) || '';
  if (!command) return false;
  try {
    const c = classifyBashCommand(command, {});
    return !!c && c.result === 'allow' && /readonly|reading/i.test(c.reason || '');
  } catch {
    return false;
  }
}

/**
 * PreToolUse wiring: acquire (or same-session refresh / stale takeover) the lock,
 * then decide block/allow. I/O injected for testability.
 *
 * @returns {{block: boolean, reason?: string}}
 */
export function runAcquireDecision({ event, now, pid, cwd, readLock, writeLock }) {
  const sessionId = event && event.session_id;
  const workspaceHash = computeWorkspaceHash(cwd);
  const acquireResult = acquire({ sessionId, pid, workspaceHash, now, readLock, writeLock });
  return decide({ acquireResult, sessionId });
}

/**
 * Stop wiring: release the lock if this session owns it (no-op otherwise).
 *
 * @returns {{released: boolean}}
 */
export function runReleaseAction({ event, cwd, readLock, deleteLock }) {
  const sessionId = event && event.session_id;
  const workspaceHash = computeWorkspaceHash(cwd);
  release({ sessionId, workspaceHash, readLock, deleteLock });
  return { released: true };
}

/**
 * Resolve the stable work-tree root used as the lock key. Keys on the SESSION's
 * cwd (`event.cwd`, stable across resume) resolved to the git work-tree root —
 * NOT the hook's `process.cwd()`, which collapses to the main repo dir after a
 * session resume and thereby false-blocks sessions in DIFFERENT worktrees.
 * Pure (I/O injected): `runGitToplevel(dir)` returns the toplevel or '' on failure.
 *
 * @param {object} p
 * @param {object} p.event
 * @param {string} p.processCwd
 * @param {(dir:string)=>string} p.runGitToplevel
 * @returns {string}
 */
export function resolveWorkspacePath({ event, processCwd, runGitToplevel }) {
  const dir = (event && typeof event.cwd === 'string' && event.cwd) ? event.cwd : processCwd;
  try {
    const top = runGitToplevel(dir);
    if (top && typeof top === 'string') return top;
  } catch { /* fall through to raw dir (fail-open) */ }
  return dir;
}

/**
 * Disk hygiene: delete leaked lock files whose record is ALREADY stale by the
 * shared isStale() definition (so an active within-TTL lock is never touched).
 * Pure (I/O injected). Best-effort: a failed read counts the file as stale
 * (garbage), a failed delete is swallowed — hygiene must never break the gate.
 *
 * @param {object} p
 * @param {string[]} p.files - absolute lock-file paths
 * @param {(f:string)=>object|null} p.readRecord
 * @param {(f:string)=>void} p.deleteRecord
 * @param {(rec:object|null, now:number)=>boolean} p.isStaleFn
 * @param {number} p.now
 * @returns {{pruned: number}}
 */
export function pruneStaleLocks({ files, readRecord, deleteRecord, isStaleFn, now }) {
  let pruned = 0;
  for (const f of files || []) {
    let rec = null;
    try { rec = readRecord(f); } catch { rec = null; }
    if (isStaleFn(rec, now)) {
      try { deleteRecord(f); pruned++; } catch { /* best-effort */ }
    }
  }
  return { pruned };
}

function realGitToplevel(dir) {
  try {
    return execFileSync('git', ['-C', dir, 'rev-parse', '--show-toplevel'], {
      encoding: 'utf-8',
      timeout: 1000,
      stdio: ['ignore', 'pipe', 'ignore'],
    }).trim();
  } catch { return ''; }
}

function lockPathFor(cwd) {
  return join(runtimeDir(), `session-lock-${computeWorkspaceHash(cwd)}.json`);
}

function realReadLock(p) {
  try { return JSON.parse(readFileSync(p, 'utf-8')); } catch { return null; }
}

function realWriteLock(p, rec) {
  try { mkdirSync(dirname(p), { recursive: true }); writeFileSync(p, JSON.stringify(rec)); } catch { /* fail-open */ }
}

function realDeleteLock(p) {
  try { unlinkSync(p); } catch { /* already gone */ }
}

async function main() {
  // No-op until settings.json registration + Stop-hook release wiring lands
  // in the deferred Phase H-α/H-β batch step. Activating this hook before
  // the release pathway is wired would lock the user out of their own
  // session on first abnormal exit.
  let input = '';
  for await (const chunk of process.stdin) input += chunk;
  process.exit(0);
  // Live wiring (point 2, 2026-05-31). PreToolUse (mutating tool) → acquire/refresh
  // the workspace lock; Stop (no tool_name) → release it. Fail-open on any error so
  // a lock bug can NEVER wedge the user out of their own session.
  try {
    const event = parseEventJson(await readStdin());
    // Key by the session's stable work-tree root (event.cwd → git toplevel),
    // not the volatile hook process.cwd() (collapses to main on resume → false
    // cross-worktree blocks). Fallback to process.cwd() keeps prior behavior.
    const cwd = resolveWorkspacePath({ event, processCwd: process.cwd(), runGitToplevel: realGitToplevel });
    const p = lockPathFor(cwd);

    // Stop event carries no tool_name → release path.
    if (!event.tool_name) {
      runReleaseAction({ event, cwd, readLock: () => realReadLock(p), deleteLock: () => realDeleteLock(p) });
      return exitDecision({ block: false });
    }

    // Calibration (2026-05-31): a readonly Bash command never mutates the
    // worktree, so it is outside the lock's mutation-serialization scope — allow
    // without acquiring/blocking. Mutating tools (and every non-Bash tool) fall
    // through to acquire/check below, so serialization is unchanged.
    if (isReadonlyBashEvent(event)) {
      return exitDecision({ block: false });
    }

    // Best-effort disk hygiene (B): drop leaked stale lock files before acquiring.
    // isStale-gated → an active within-TTL lock is never pruned, so same-worktree
    // serialization is untouched. Wrapped so hygiene can never break the gate.
    try {
      const dir = runtimeDir();
      const files = readdirSync(dir)
        .filter((f) => /^session-lock-.*\.json$/.test(f))
        .map((f) => join(dir, f));
      pruneStaleLocks({
        files,
        readRecord: (fp) => realReadLock(fp),
        deleteRecord: (fp) => realDeleteLock(fp),
        isStaleFn: isStale,
        now: Date.now(),
      });
    } catch { /* hygiene is best-effort */ }

    // PreToolUse on a mutating tool → acquire/refresh, then block/allow.
    const r = runAcquireDecision({
      event,
      now: Date.now(),
      pid: process.pid,
      cwd,
      readLock: () => realReadLock(p),
      writeLock: (rec) => realWriteLock(p, rec),
    });
    return exitDecision({ block: r.block, message: r.block ? `[parallel-session-lock] ${r.reason}` : undefined });
  } catch {
    return exitDecision({ block: false }); // fail-open — never lock out
  }
}

if (import.meta.url === `file://${process.argv[1].replace(/\\/g, '/')}` || (process.argv[1] || '').endsWith('enforce-parallel-session-lock.mjs')) {
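`enforce-parallel-session-lock.mjs` imports `acquire`, `release` and `isStale` from `./parallel-session-lock.mjs`, which this diff does not show. The sketch below is a hypothetical reconstruction whose behaviour matches the `runAcquireDecision` and `runReleaseAction` tests: fresh acquire, same-session refresh, takeover of a TTL-expired lock, and owner-only release. The record shape is copied from the test fixtures and the 5-minute default TTL from the block message; the `workspaceHash` argument the wrapper passes is simply ignored here.

```javascript
// Hypothetical sketch of the lock primitives; not the real parallel-session-lock.mjs.
// Record shape follows the test fixtures: { schema_version, session_id, pid, acquired_at, ttl_ms }.
const DEFAULT_TTL_MS = 5 * 60 * 1000; // the "5-min TTL" named in the block message

function isStale(rec, now) {
  // A missing or malformed record counts as stale (garbage).
  if (!rec || typeof rec !== 'object') return true;
  const ttl = rec.ttl_ms || DEFAULT_TTL_MS;
  return now - (rec.acquired_at || 0) > ttl;
}

function acquire({ sessionId, pid, now, readLock, writeLock }) {
  const current = readLock();
  // Free, stale, or already ours: (re)write with a fresh timestamp. This one
  // branch covers fresh acquire, same-session refresh and stale takeover.
  if (!current || isStale(current, now) || current.session_id === sessionId) {
    const record = { schema_version: 1, session_id: sessionId, pid, acquired_at: now, ttl_ms: DEFAULT_TTL_MS };
    writeLock(record);
    return { acquired: true, record };
  }
  // A fresh lock held by a different session: report the holder, write nothing.
  return { acquired: false, holder: current };
}

function release({ sessionId, readLock, deleteLock }) {
  const current = readLock();
  // Only the owning session may delete the lock; otherwise this is a no-op.
  if (current && current.session_id === sessionId) {
    deleteLock();
    return true;
  }
  return false;
}
```

Folding refresh and stale takeover into a single write branch keeps the logic small, but the read-then-write sequence is not atomic; a production version would likely need an atomic write (for example write-then-rename) to close the race between two hooks firing at once.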
@@ -1,7 +1,7 @@
// tools/enforce-parallel-session-lock.test.mjs
// Stream H Task 7 — wrapper tests around the pure parallel-session-lock module.
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-parallel-session-lock.mjs';
import { decide, isReadonlyBashEvent } from './enforce-parallel-session-lock.mjs';

describe('enforce-parallel-session-lock wrapper (Stream H Task 7)', () => {
  it('allow when acquire succeeded (fresh own-lock)', () => {
@@ -42,3 +42,255 @@ describe('enforce-parallel-session-lock wrapper (Stream H Task 7)', () => {
    expect(r.reason).toMatch(/pid 42/);
  });
});

// D (2026-05-31): the block message must steer the human to the STABLE identity
// (session id), not the transient hook pid — chasing the pid was what caused the
// owner to close the wrong session and deadlock the workspace.
describe('decide() message clarity (D) — pid is transient, identify by session id', () => {
  const blocked = { acquired: false, holder: { session_id: 'sess-A', pid: 12552, acquired_at: 0 } };

  it('names the holder session id as the stable identity', () => {
    expect(decide({ acquireResult: blocked, sessionId: 's1' }).reason).toMatch(/sess-A/);
  });

  it('marks the pid as changeable so the human does not chase it', () => {
    expect(decide({ acquireResult: blocked, sessionId: 's1' }).reason).toMatch(/may change|transient/i);
  });

  it('still surfaces the pid for triage', () => {
    expect(decide({ acquireResult: blocked, sessionId: 's1' }).reason).toMatch(/12552/);
  });
});

// Live wiring (point 2, 2026-05-31): PreToolUse acquires/refreshes the lock,
// Stop releases it. I/O is injected (readLock/writeLock/deleteLock) so the
// wiring stays pure and unit-testable; main() binds real fs.
import { runAcquireDecision, runReleaseAction } from './enforce-parallel-session-lock.mjs';

describe('runAcquireDecision — PreToolUse acquire/refresh wiring', () => {
  it('allows and writes a fresh lock when none exists', () => {
    let written = null;
    const r = runAcquireDecision({
      event: { tool_name: 'Edit', session_id: 'S1' },
      now: 1000, pid: 42, cwd: '/ws',
      readLock: () => null,
      writeLock: (rec) => { written = rec; },
    });
    expect(r.block).toBe(false);
    expect(written).toMatchObject({ session_id: 'S1', pid: 42, acquired_at: 1000 });
  });

  it('blocks when another session holds a fresh lock', () => {
    const r = runAcquireDecision({
      event: { tool_name: 'Edit', session_id: 'S2' },
      now: 1000, pid: 7, cwd: '/ws',
      readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 99, acquired_at: 900, ttl_ms: 300000 }),
      writeLock: () => {},
    });
    expect(r.block).toBe(true);
    expect(r.reason).toMatch(/S1|pid 99|parallel session/i);
  });

  it('allows (refresh) when the same session already holds the lock', () => {
    let written = null;
    const r = runAcquireDecision({
      event: { tool_name: 'Edit', session_id: 'S1' },
      now: 2000, pid: 42, cwd: '/ws',
      readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 42, acquired_at: 900, ttl_ms: 300000 }),
      writeLock: (rec) => { written = rec; },
    });
    expect(r.block).toBe(false);
    expect(written.acquired_at).toBe(2000);
  });

  it('takes over a stale lock from another session (TTL expired)', () => {
    let written = null;
    const r = runAcquireDecision({
      event: { tool_name: 'Edit', session_id: 'S2' },
      now: 1_000_000, pid: 7, cwd: '/ws',
      readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 99, acquired_at: 0, ttl_ms: 300000 }),
      writeLock: (rec) => { written = rec; },
    });
    expect(r.block).toBe(false);
    expect(written.session_id).toBe('S2');
  });
});

describe('runReleaseAction — Stop release wiring', () => {
  it('deletes the lock when this session owns it', () => {
    let deleted = false;
    runReleaseAction({
      event: { session_id: 'S1' },
      cwd: '/ws',
      readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 42, acquired_at: 0, ttl_ms: 300000 }),
      deleteLock: () => { deleted = true; },
    });
    expect(deleted).toBe(true);
  });

  it('does NOT delete a lock owned by another session', () => {
    let deleted = false;
    runReleaseAction({
      event: { session_id: 'S2' },
      cwd: '/ws',
      readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 42, acquired_at: 0, ttl_ms: 300000 }),
      deleteLock: () => { deleted = true; },
    });
    expect(deleted).toBe(false);
  });

  it('is a no-op when no lock file exists', () => {
    let deleted = false;
    runReleaseAction({
      event: { session_id: 'S1' },
      cwd: '/ws',
      readLock: () => null,
      deleteLock: () => { deleted = true; },
    });
    expect(deleted).toBe(false);
  });
});

// Cross-worktree false-block fix (2026-05-31). The lock must key on the session's
// stable work-tree root (from event.cwd → git toplevel), NOT the hook process.cwd()
// — which collapses to the main repo dir after a session resume, making sessions in
// DIFFERENT worktrees share one lock and block each other.
import { resolveWorkspacePath, pruneStaleLocks } from './enforce-parallel-session-lock.mjs';

describe('resolveWorkspacePath — stable worktree key', () => {
  it('keys on event.cwd (the session worktree), not the hook process.cwd()', () => {
    const r = resolveWorkspacePath({
      event: { cwd: '/repo/.claude/worktrees/wt-A' },
      processCwd: '/repo',
      runGitToplevel: (dir) => dir,
    });
    expect(r).toBe('/repo/.claude/worktrees/wt-A');
  });

  it('gives different keys for two different worktrees (no cross-block)', () => {
    const opts = { processCwd: '/repo', runGitToplevel: (dir) => dir };
    const a = resolveWorkspacePath({ event: { cwd: '/repo/.claude/worktrees/wt-A' }, ...opts });
    const b = resolveWorkspacePath({ event: { cwd: '/repo/.claude/worktrees/wt-B' }, ...opts });
    expect(a).not.toBe(b);
  });

  it('resolves to the git work-tree root (collapses subdir variance)', () => {
    const r = resolveWorkspacePath({
      event: { cwd: '/repo/.claude/worktrees/wt-A/tools' },
      processCwd: '/repo',
      runGitToplevel: () => '/repo/.claude/worktrees/wt-A',
    });
    expect(r).toBe('/repo/.claude/worktrees/wt-A');
  });

  it('falls back to processCwd when event.cwd is absent', () => {
    const r = resolveWorkspacePath({
      event: { tool_name: 'Edit' },
      processCwd: '/repo',
      runGitToplevel: (dir) => dir,
    });
    expect(r).toBe('/repo');
  });

  it('falls back to the raw dir when git toplevel resolution fails (fail-open)', () => {
    const r = resolveWorkspacePath({
      event: { cwd: '/some/dir' },
      processCwd: '/repo',
      runGitToplevel: () => '',
    });
    expect(r).toBe('/some/dir');
  });
});

// B (2026-05-31): disk hygiene. Leaked lock files (session closed without a clean
// Stop) pile up in ~/.claude/runtime. Pruning ONLY removes records that are
// already stale by the SAME isStale() definition acquire() uses — so it can never
// drop an active (within-TTL) lock and never weakens same-worktree serialization.
describe('pruneStaleLocks — drops only already-stale leaked locks (B)', () => {
  const fresh = { schema_version: 1, session_id: 'A', pid: 1, acquired_at: 1000, ttl_ms: 300000 };
  const stale = { schema_version: 1, session_id: 'B', pid: 2, acquired_at: 0, ttl_ms: 100 };
  const isStaleFn = (rec, now) => !rec || (now - (rec && rec.acquired_at || 0)) > ((rec && rec.ttl_ms) || 300000);

  it('deletes stale lock files and never the fresh (active) ones', () => {
    const records = { '/r/lock-fresh.json': fresh, '/r/lock-stale.json': stale };
    const deleted = [];
    const r = pruneStaleLocks({
      files: Object.keys(records),
      readRecord: (f) => records[f],
      deleteRecord: (f) => deleted.push(f),
      isStaleFn, now: 1000,
    });
    expect(deleted).toEqual(['/r/lock-stale.json']);
    expect(r.pruned).toBe(1);
  });

  it('treats an unreadable/garbage lock file as stale and prunes it', () => {
    const deleted = [];
    pruneStaleLocks({
      files: ['/r/garbage.json'],
      readRecord: () => { throw new Error('bad json'); },
      deleteRecord: (f) => deleted.push(f),
      isStaleFn, now: 1000,
    });
    expect(deleted).toEqual(['/r/garbage.json']);
  });

  it('never throws when a delete fails (best-effort hygiene)', () => {
    expect(() => pruneStaleLocks({
      files: ['/r/x.json'],
      readRecord: () => stale,
      deleteRecord: () => { throw new Error('locked'); },
      isStaleFn, now: 1000,
    })).not.toThrow();
  });

  it('does nothing for an empty file list', () => {
    const r = pruneStaleLocks({ files: [], readRecord: () => null, deleteRecord: () => {}, isStaleFn, now: 1 });
    expect(r.pruned).toBe(0);
  });
});

// ── Calibration (2026-05-31): readonly Bash is outside the lock scope ──
// The lock serializes concurrent FILE MUTATION between sessions on the same
// worktree. A readonly Bash command (git status/log/diff, cat, grep, ls)
// mutates nothing, so a peer session's lock must NOT block it. This mirrors the
// LLM-judge readonly calibration (isReadonlyBashEvent in enforce-llm-judge-per-tool).
// Everything that can mutate — file edits, git commit/push, dangerous Bash, and
// every NON-Bash tool — still acquires/checks the lock, so mutation
// serialization is unchanged (scope fix, NOT a discipline drop).
describe('isReadonlyBashEvent — readonly Bash bypasses the lock (calibration 2026-05-31)', () => {
  const ev = (command) => ({ tool_name: 'Bash', tool_input: { command } });

  it('treats readonly git (status/log/diff) as readonly', () => {
    expect(isReadonlyBashEvent(ev('git status'))).toBe(true);
    expect(isReadonlyBashEvent(ev('git log --oneline -5'))).toBe(true);
    expect(isReadonlyBashEvent(ev('git diff'))).toBe(true);
  });

  it('treats whitelisted reading commands (cat/grep/ls) as readonly', () => {
    expect(isReadonlyBashEvent(ev('ls -la'))).toBe(true);
    expect(isReadonlyBashEvent(ev('cat README.md'))).toBe(true);
    expect(isReadonlyBashEvent(ev('grep -n foo bar.txt'))).toBe(true);
  });

  it('does NOT treat mutating Bash as readonly (still acquires/blocks)', () => {
    expect(isReadonlyBashEvent(ev('rm -rf x'))).toBe(false);
    expect(isReadonlyBashEvent(ev('git commit -m "x"'))).toBe(false);
    expect(isReadonlyBashEvent(ev('npm install foo'))).toBe(false);
  });

  it('does NOT treat a chain with a mutating part as readonly (C13)', () => {
    expect(isReadonlyBashEvent(ev('git status && rm x'))).toBe(false);
  });

  it('only applies to the Bash tool — other tools still acquire the lock', () => {
    expect(isReadonlyBashEvent({ tool_name: 'Edit', tool_input: { file_path: 'a.js' } })).toBe(false);
    expect(isReadonlyBashEvent({ tool_name: 'Write', tool_input: { file_path: 'a.js' } })).toBe(false);
  });

  it('is safe on malformed input', () => {
    expect(isReadonlyBashEvent(null)).toBe(false);
    expect(isReadonlyBashEvent({ tool_name: 'Bash', tool_input: {} })).toBe(false);
    expect(isReadonlyBashEvent({ tool_name: 'Bash' })).toBe(false);
  });
});
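`isReadonlyBashEvent` delegates to `classifyBashCommand` from `enforce-router-gate.mjs`, which is not shown, and accepts an allow verdict whose reason mentions readonly/reading. The property the calibration tests pin down, including the chain case (C13), is that a command is readonly only if every segment of the chain is a known reader. A deliberately simplified, hypothetical stand-in illustrates that rule; `isReadonlyCommand` and both whitelists are assumptions, and it does not inspect flags (for example `git diff --output=<file>` writes a file), which a production classifier has to handle.

```javascript
// Hypothetical, simplified chain-aware readonly check; not the router-gate classifier.
const READONLY_GIT_SUBCOMMANDS = new Set(['status', 'log', 'diff', 'show']);
const READONLY_BINARIES = new Set(['cat', 'grep', 'ls', 'head', 'tail', 'wc']);

function isReadonlySegment(segment) {
  const argv = segment.trim().split(/\s+/).filter(Boolean);
  if (argv.length === 0) return false; // empty segment, e.g. a trailing ';'
  if (argv[0] === 'git') return READONLY_GIT_SUBCOMMANDS.has(argv[1]);
  return READONLY_BINARIES.has(argv[0]);
}

function isReadonlyCommand(command) {
  if (typeof command !== 'string' || command.trim() === '') return false;
  // Redirections and command substitution can write files or run anything: reject.
  if (/[<>`]|\$\(/.test(command)) return false;
  // Split on &&, ||, ; and | so that a mutating tail ("git status && rm x") fails.
  return command.split(/&&|\|\||;|\|/).every(isReadonlySegment);
}
```

Splitting before classifying errs on the side of "mutating": a quoted `;` or `|` can only produce extra segments that fail the whitelist, so the lock is still taken.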
@@ -21,13 +21,15 @@ import {
  parseEventJson,
  readRouterState,
  readRationalizationFlags,
  readTranscript,
  sessionToolUses,
  findOverride,
  loadOverrideVocab,
} from './enforce-hook-helpers.mjs';

const SUPPRESS_RULE = 'classifier-mismatch';

export function buildReminder({ classification, recentFlags, override }) {
export function buildReminder({ classification, recentFlags, override, activeSkills = [] }) {
  const lines = ['## §17 Coverage / Discipline Reminder', ''];
  if (override) {
    lines.push(`Override phrase detected: "${override.phrase}". The following rules are suppressed for THIS prompt only:`);
@@ -38,6 +40,16 @@ export function buildReminder({ classification, recentFlags, override }) {
  lines.push(' `coverage: <channel>:<id>`');
  lines.push('Channels: skill, node, chain, hook, agent, direct.');
  lines.push('');
  // Item G (2026-05-31): a skill invoked in an EARLIER turn stays active. Remind
  // explicitly so the coverage line is not under-reported as direct/chain when the
  // work actually continues under that skill. (The verifier now accepts a prior-turn
  // skill, so this report is honest, not a violation.)
  if (Array.isArray(activeSkills) && activeSkills.length > 0) {
    lines.push('**Active skill(s) still in effect from earlier this session:**');
    for (const s of activeSkills) lines.push(` - ${s}`);
    lines.push('If your work continues under one of these, report `coverage: skill:<name>` (not direct/chain).');
    lines.push('');
  }
  if (classification) {
    lines.push(`**Classifier output:** task_type=${classification.task_type || 'unknown'}, confidence=${classification.confidence ?? 'n/a'}`);
    if (classification.recommended_node) {
@@ -58,8 +70,6 @@ export function buildReminder({ classification, recentFlags, override }) {
    lines.push('Adjust behaviour accordingly.');
    lines.push('');
  }
  lines.push('Override vocabulary (substring-match in user prompt):');
  lines.push(' без скилов / direct ok / срочно / быстрый коммит / recovery / memory dump / ремонт инфраструктуры');
  return lines.join('\n');
}
@@ -96,7 +106,21 @@ async function main() {

  const flags = readRationalizationFlags(sessionId);

  const reminder = buildReminder({ classification, recentFlags: flags, override });
  // Item G: detect skills invoked earlier this session (still active). The
  // transcript at UserPromptSubmit holds all prior turns. Best-effort.
  let activeSkills = [];
  try {
    const transcript = readTranscript(event.transcript_path);
    const seen = new Set();
    for (const u of sessionToolUses(transcript)) {
      if (u.name === 'Skill' && u.input && u.input.skill && !seen.has(u.input.skill)) {
        seen.add(u.input.skill);
        activeSkills.push(u.input.skill);
      }
    }
  } catch { activeSkills = []; }

  const reminder = buildReminder({ classification, recentFlags: flags, override, activeSkills });

  process.stdout.write(JSON.stringify({
    hookSpecificOutput: {
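The item-G change in `main()` collects skills invoked earlier in the session through `sessionToolUses()` from `enforce-hook-helpers.mjs`, which is not shown. Assuming transcript entries carry Anthropic-style `tool_use` content blocks (`{ type: 'tool_use', name, input }`) under `message.content`, the same first-seen, order-preserving dedupe can be sketched without that helper. `activeSkillsFrom` is a hypothetical name, and the entry shape is an assumption.

```javascript
// Hypothetical sketch of the item-G scan: list each Skill invoked so far, once,
// in first-invocation order. The transcript entry shape is an assumption.
function activeSkillsFrom(transcript) {
  const seen = new Set();
  const skills = [];
  for (const entry of Array.isArray(transcript) ? transcript : []) {
    const content = entry && entry.message && entry.message.content;
    if (!Array.isArray(content)) continue; // plain-string user messages carry no tool calls
    for (const block of content) {
      // Assumed block shape: { type: 'tool_use', name: 'Skill', input: { skill } }
      if (!block || block.type !== 'tool_use' || block.name !== 'Skill') continue;
      const skill = block.input && block.input.skill;
      if (typeof skill === 'string' && skill && !seen.has(skill)) {
        seen.add(skill);
        skills.push(skill);
      }
    }
  }
  return skills;
}
```

The resulting array is what `buildReminder({ ..., activeSkills })` expects; an empty array makes the reminder omit the active-skill note, as the accompanying tests require.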
@@ -66,10 +66,28 @@ describe('enforce-prompt-injection / buildReminder', () => {
|
||||
expect(txt).toMatch(/verify-before-push/);
|
||||
});
|
||||
|
||||
it('lists override-vocabulary phrases for user reference', () => {
|
||||
it('reminds about active skills carried over from prior turns (item G)', () => {
|
||||
const txt = buildReminder({
|
||||
classification: null,
|
||||
recentFlags: [],
|
||||
activeSkills: ['superpowers:test-driven-development'],
|
||||
});
|
||||
expect(txt).toMatch(/Active skill/i);
|
||||
expect(txt).toMatch(/test-driven-development/);
|
||||
expect(txt).toMatch(/coverage: skill:/);
|
||||
});
|
||||
|
||||
it('omits the active-skill note when none are active', () => {
|
||||
const txt = buildReminder({ classification: null, recentFlags: [], activeSkills: [] });
|
||||
expect(txt).not.toMatch(/Active skill/i);
|
||||
});
|
||||
|
||||
it('does NOT advertise dead override-vocabulary phrases (v4 stub — 1A 2026-05-31)', () => {
|
||||
const txt = buildReminder({ classification: null, recentFlags: [] });
|
||||
expect(txt).toMatch(/без скилов/);
|
||||
expect(txt).toMatch(/direct ok/);
|
||||
expect(txt).toMatch(/срочно/);
|
||||
// findOverride/loadOverrideVocab — заглушки (vocab removed in v4); реклама фраз
|
||||
// вводила в заблуждение (фразы не работают). Баннер убран.
|
||||
expect(txt).not.toMatch(/Override vocabulary/);
|
||||
expect(txt).not.toMatch(/без скилов/);
|
||||
expect(txt).not.toMatch(/ремонт инфраструктуры/);
|
||||
});
|
||||
});
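The item-G logic in `main()` reduces the session's earlier tool uses to a de-duplicated, first-seen list of skill names, and the tests above only require that the reminder mentions an active skill plus the `coverage: skill:` hint (and omits the note when the list is empty). A minimal standalone sketch of that reduction, assuming tool uses shaped like the `{ name, input }` objects iterated in `main()`; `collectActiveSkills` and `activeSkillNote` are hypothetical names, and the note text only approximates what `buildReminder` emits:

```javascript
// Hypothetical helper mirroring the loop in main(): keeps each skill once,
// in first-seen order. Assumes tool uses like { name: 'Skill', input: { skill } }.
function collectActiveSkills(toolUses) {
  const seen = new Set();
  const active = [];
  for (const u of toolUses) {
    const skill = u && u.name === 'Skill' && u.input ? u.input.skill : undefined;
    if (skill && !seen.has(skill)) {
      seen.add(skill);
      active.push(skill);
    }
  }
  return active;
}

// Hypothetical note builder: returns '' when nothing is active, so the
// reminder can omit the section entirely.
function activeSkillNote(activeSkills) {
  if (!activeSkills || activeSkills.length === 0) return '';
  return [
    `Active skill(s) from earlier turns: ${activeSkills.join(', ')}`,
    'If your work continues under one of these, report `coverage: skill:<name>` (not direct/chain).',
  ].join('\n');
}

const uses = [
  { name: 'Skill', input: { skill: 'superpowers:test-driven-development' } },
  { name: 'Read', input: { file_path: 'README.md' } },
  { name: 'Skill', input: { skill: 'superpowers:test-driven-development' } },
];
console.log(activeSkillNote(collectActiveSkills(uses)));
```

Returning an empty string rather than a placeholder line keeps the "omits the active-skill note" case trivial for the caller.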

@@ -16,16 +16,21 @@ import {
  parseEventJson,
  exitDecision,
} from './enforce-hook-helpers.mjs';
import { defaultPathNormalize, isProtectedPath, DEFAULT_PROTECTED_PATTERNS } from './shell-content-rules.mjs';
import { defaultPathNormalize, isProtectedPath, READ_DENY_PATTERNS } from './shell-content-rules.mjs';

export function decide({ toolName, filePath }) {
  if (toolName !== 'Read') return { block: false, reason: null };
  const fp = String(filePath || '');
  if (!fp) return { block: false, reason: null };
  if (isProtectedPath(fp, defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)) {
  // Narrow READ_DENY_PATTERNS (not the full DEFAULT_PROTECTED_PATTERNS): Read of
  // CLAUDE.md / normative docs / memory has no exfil value and must stay allowed
  // for the claude-md-management / memory-sync workflow. Only genuine Read-exfil
  // targets — transcripts, runtime, settings, secrets — are blocked. The full
  // protected-list still guards Bash/PowerShell read and Write (over-block fix 2026-05-31).
  if (isProtectedPath(fp, defaultPathNormalize, READ_DENY_PATTERNS)) {
    return {
      block: true,
      reason: `path «${defaultPathNormalize(fp)}» protected against Read (§3.1 transcript/runtime/normative hard-deny)`,
      reason: `path «${defaultPathNormalize(fp)}» protected against Read (§3.1 transcript/runtime/secrets hard-deny)`,
    };
  }
  return { block: false, reason: null };

@@ -28,3 +28,43 @@ describe('enforce-read-path-deny decide()', () => {
    expect(r.block).toBe(false);
  });
});

// Over-block fix (2026-05-31): Smoke 5 added CLAUDE.md + memory/ + normative
// docs to the Read-deny set, which broke the legit claude-md-management /
// memory-sync workflow (Edit requires a prior Read). Read of CLAUDE.md / memory
// / Pravila has no exfil value (public-in-repo / own memory index). The genuine
// Read-exfil targets — cross-session transcripts (.jsonl) and ~/.claude/runtime
// — MUST stay blocked. Bash/PowerShell/Write protections (DEFAULT_PROTECTED_PATTERNS)
// are unchanged.
describe('enforce-read-path-deny — CLAUDE.md / memory readable (over-block fix 2026-05-31)', () => {
  it('allows Read on CLAUDE.md (public-in-repo, no exfil value)', () => {
    expect(decide({ toolName: 'Read', filePath: 'CLAUDE.md' }).block).toBe(false);
    expect(decide({ toolName: 'Read', filePath: '/c/моя/проекты/портал crm/Документация/CLAUDE.md' }).block).toBe(false);
  });
  it('allows Read on MEMORY.md (own memory index under .claude/projects/<proj>/memory)', () => {
    expect(decide({ toolName: 'Read', filePath: '/c/Users/Administrator/.claude/projects/crm/memory/MEMORY.md' }).block).toBe(false);
  });
  it('allows Read on a memory/*.md feedback file', () => {
    expect(decide({ toolName: 'Read', filePath: '/c/Users/Administrator/.claude/projects/crm/memory/feedback_read_path_deny.md' }).block).toBe(false);
  });
  it('allows Read on a normative doc (Pravila) — needed for claude-md-management', () => {
    expect(decide({ toolName: 'Read', filePath: 'docs/Pravila_raboty_Claude_v1_1.md' }).block).toBe(false);
  });
  it('STILL blocks Read on transcript JSONL under .claude/projects', () => {
    expect(decide({ toolName: 'Read', filePath: '/c/Users/Administrator/.claude/projects/crm/session.jsonl' }).block).toBe(true);
    expect(decide({ toolName: 'Read', filePath: '~/.claude/projects/abc-session.jsonl' }).block).toBe(true);
  });
  it('STILL blocks Read on ~/.claude/runtime artifacts', () => {
    expect(decide({ toolName: 'Read', filePath: '~/.claude/runtime/router-state-x.json' }).block).toBe(true);
  });
});

// Impl completion (2026-05-31, this session): exfil-pattern boundaries.
describe('enforce-read-path-deny — exfil-pattern boundaries (impl completion 2026-05-31)', () => {
  it('STILL blocks Read on .env.production (secrets variant)', () => {
    expect(decide({ toolName: 'Read', filePath: '.env.production' }).block).toBe(true);
  });
  it('allows Read on a Tooling normative doc (needed for normative sync)', () => {
    expect(decide({ toolName: 'Read', filePath: 'docs/Tooling_v8_3.md' }).block).toBe(false);
  });
});
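The effect of testing Read against the narrower `READ_DENY_PATTERNS` instead of `DEFAULT_PROTECTED_PATTERNS` can be sketched with a few regexes. The real pattern list lives in `shell-content-rules.mjs` and is not part of this diff, so `READ_DENY_SKETCH` below is an assumption chosen only to reproduce the allow/block cases in the tests above (transcripts, runtime, `.env*` blocked; CLAUDE.md, memory and normative docs readable). The separator conversion in `normalizeSketch` is a stand-in for `defaultPathNormalize`.

```javascript
// Hypothetical stand-in for READ_DENY_PATTERNS (the real list is not shown).
const READ_DENY_SKETCH = [
  /(^|\/)\.claude\/projects\/.*\.jsonl$/i, // cross-session transcripts
  /(^|\/)\.claude\/runtime(\/|$)/i,        // ~/.claude/runtime artifacts
  /(^|\/)\.env(\.[\w-]+)?$/i,              // .env and variants such as .env.production
];

// Simplified normalizer (assumption): only converts Windows separators.
function normalizeSketch(p) {
  return String(p).replace(/\\/g, '/');
}

function decideReadSketch({ toolName, filePath }) {
  if (toolName !== 'Read') return { block: false, reason: null };
  const fp = normalizeSketch(filePath || '');
  if (!fp) return { block: false, reason: null };
  if (READ_DENY_SKETCH.some((re) => re.test(fp))) {
    return { block: true, reason: `path "${fp}" protected against Read` };
  }
  return { block: false, reason: null };
}

for (const filePath of [
  'CLAUDE.md',
  '/c/Users/Administrator/.claude/projects/crm/memory/MEMORY.md',
  '/c/Users/Administrator/.claude/projects/crm/session.jsonl',
  '~/.claude/runtime/router-state-x.json',
  '.env.production',
]) {
  console.log(decideReadSketch({ toolName: 'Read', filePath }).block, filePath);
}
```

Keeping memory `*.md` readable falls out of the transcript pattern requiring a `.jsonl` suffix, which is the same discriminator the write-deny hook later in this diff relies on.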

@@ -50,7 +50,7 @@ export const BASH_HARD_BLACKLIST = [
  { re: /(^|\s|;|&&|\|\|)chmod\b/, reason: 'chmod запрещён' },
  { re: /(^|\s|;|&&|\|\|)chown\b/, reason: 'chown запрещён' },
  { re: /(^|\s|;|&&|\|\|)chgrp\b/, reason: 'chgrp запрещён' },
  { re: /(?:^|[^0-9>&])>{1,2}(?![>&])/, reason: 'stdout redirect (>/>>) запрещён' },
  // stdout redirect (>/>>): the quote-aware check lives in matchBashHardBlacklist (STDOUT_REDIRECT_RE), not here (quirk 2, 2026-05-31)
  { re: /\b(?:node|nodejs)\s+(?:[^|;]*\s)?(?:-e|--eval|-p|--print)\b/, reason: 'node -e/--eval/-p запрещён' },
  { re: /\bnode\s+(?:[^|;]*\s)?(?:-r|--require|--import|--experimental-loader)\b/, reason: 'node -r/--import запрещён' },
  { re: /\bpython3?\s+-c\b/, reason: 'python -c запрещён' },
@@ -72,11 +72,46 @@ export const BASH_HARD_BLACKLIST = [
  { re: /(^|\s|;|&&|\|\|)socat\b/, reason: 'G8: socat запрещён' },
];

// stdout redirect operator: `>`/`>>` not preceded by a digit, `>` or `&` (excludes fd-dup 1>&2)
// and not followed by `>`/`&` (so `>>` is a single match and `1>&2`/`2>&1` are not caught).
const STDOUT_REDIRECT_RE = /(?:^|[^0-9>&])>{1,2}(?![>&])/;

/**
 * Blanks the interior of single- and double-quoted spans (keeping the quotes
 * themselves, the length, and everything outside the quotes). A backslash escapes
 * the next character (so an escaped quote does NOT open a span). Needed for
 * quote-aware redirect detection (quirk 2): a `>` inside a quoted argument
 * (commit message text, <email>) is not a shell redirect; a real redirect
 * operator sits OUTSIDE the quotes and survives the blanking.
 */
export function stripQuotedSpans(command) {
  const s = String(command || '');
  let out = '';
  let quote = null;
  let escaped = false;
  for (const ch of s) {
    if (escaped) { out += ch; escaped = false; continue; }
    if (ch === '\\') { out += ch; escaped = true; continue; }
    if (quote) {
      if (ch === quote) { out += ch; quote = null; } else out += ' ';
      continue;
    }
    if (ch === "'" || ch === '"') { out += ch; quote = ch; continue; }
    out += ch;
  }
  return out;
}

export function matchBashHardBlacklist(command) {
  const s = String(command || '');
  if (hasInjection(s)) return '#34: echo/printf prompt-injection запрещён';
  const stderr = stderrRedirectBlock(s);
  // Quote-aware redirect detection (quirk 2): a `>` / `2>` INSIDE quotes (commit
  // message text with <email>, or "2>1") is not a redirect. Blank the quoted spans
  // first; real redirect operators outside the quotes survive.
  const stripped = stripQuotedSpans(s);
  const stderr = stderrRedirectBlock(stripped);
  if (stderr) return stderr;
  if (STDOUT_REDIRECT_RE.test(stripped)) return 'stdout redirect (>/>>) запрещён';
  return matchAny(BASH_HARD_BLACKLIST, s);
}
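`STDOUT_REDIRECT_RE` is easiest to reason about on concrete inputs. The sketch below copies the regex verbatim and runs it on raw strings (no quote stripping); the sample commands are illustrative. Any `>` directly preceded by a digit (`2> err`, `1> out`) deliberately falls outside this regex, so whether such forms are blocked depends on `stderrRedirectBlock`, which is not shown in this diff.

```javascript
// Verbatim copy of STDOUT_REDIRECT_RE from the diff above.
const STDOUT_REDIRECT_RE = /(?:^|[^0-9>&])>{1,2}(?![>&])/;

// [command, expected match]; each expectation follows from the regex alone.
const SAMPLES = [
  ['echo x > /tmp/f', true],   // plain redirect
  ['echo x>/tmp/f', true],     // no space: 'x' is not a digit, '>' or '&'
  ['echo x >> /tmp/f', true],  // append: '>>' is consumed as one match
  ['cmd 2>&1', false],         // fd dup: '>' follows a digit and precedes '&'
  ['cmd 1>&2', false],
  ['cmd 2> /tmp/err', false],  // digit-prefixed: left to stderrRedirectBlock
  ['git status', false],
];

for (const [cmd, expected] of SAMPLES) {
  const got = STDOUT_REDIRECT_RE.test(cmd);
  console.log(`${got === expected ? 'ok  ' : 'FAIL'} ${JSON.stringify(cmd)} -> ${got}`);
}
```

The regex has no `g` flag, so repeated `.test()` calls are stateless and safe to run in a loop.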

@@ -85,9 +120,30 @@ const READING_CMDS = new Set(['ls', 'pwd', 'wc', 'head', 'tail', 'file', 'stat',
const SAFE_EXACT = [
  /^npx\s+vitest\s+(?:run|--version)\b/,
  /^npm\s+(?:test|run\s+test|run\s+lint(?::[\w-]+)?)\b/,
  // `npm ci` (2026-05-31, owner-authorized) — clean install from the committed
  // lockfile (deterministic, no version drift) to restore junction node_modules
  // in a fresh worktree. Distinct from `npm install`/`npm i`, which stay
  // hard-blacklisted (line ~60) because they can pull new/updated versions.
  // `\b` after `ci` prevents `npm cider`-style prefix matches.
  /^npm\s+ci\b/,
  /^php\s+artisan\s+(?:list|route:list|migrate:status)\b/,
  /^composer\s+(?:show|outdated)\b/,
  /^node\s+(?!.*(?:-e|--eval|-p|--print|-r|--require|--import|--experimental-loader)\b)/,
  // Laravel dev workflow (2026-05-30) — exclude tinker (REPL = arbitrary PHP exec risk).
  // Hard-blacklist (composer install/update/require/remove) remains the first check, unaffected.
  // `migrate(?=\s|$)` lookahead prevents `migrate:install` / `migrate:<unknown>` from matching bare `migrate`.
  /^php\s+artisan\s+(?:test|migrate:fresh|migrate:rollback|migrate:refresh|migrate:reset|migrate(?=\s|$)|db:seed|cache:clear|config:clear|view:clear|route:clear|optimize:clear)\b/,
  /^composer\s+(?:test|pint|stan|insights|rector)\b/,
  /^(?:\.\/)?vendor\/bin\/pest\b/,
  /^pest\b/,
  // Narrow `cd app` (2026-05-31, owner-authorized) — enter the Laravel project dir
  // so already-whitelisted commands (pest, php artisan test) run from app/.
  // Scope deliberately limited to the literal `app` dir: `cd` into any other path
  // (incl. protected .claude/runtime, memory/, transcripts) stays default-deny, so
  // the cwd-shift read-bypass is contained. Mutations remain caught at the
  // hard-blacklist + chain-mutating rule (both run before the whitelist), and each
  // chain segment after `cd app &&` must still be independently whitelisted.
  /^cd\s+app$/,
];

export function classifyWhitelist(segments) {

@@ -161,3 +161,188 @@ describe('stderr redirect — 2>&1 fd-duplication (review fix)', () => {
    expect(classifyBashCommand('cat a 2>&1 > out.txt', {}).result).toBe('block');
  });
});

describe('SAFE_EXACT — Laravel dev workflow (whitelist expansion 2026-05-30)', () => {
  // Allowed: PHP/Laravel dev commands that were missing from the whitelist
  it.each([
    'php artisan test',
    'php artisan test --filter=Auth',
    'php artisan migrate',
    'php artisan migrate:fresh',
    'php artisan migrate:rollback',
    'php artisan migrate:refresh',
    'php artisan migrate:reset',
    'php artisan db:seed',
    'php artisan cache:clear',
    'php artisan config:clear',
    'php artisan view:clear',
    'php artisan route:clear',
    'php artisan optimize:clear',
    'composer test',
    'composer pint',
    'composer stan',
    'composer insights',
    'composer rector',
    'pest',
    'pest --filter=Foo',
    'vendor/bin/pest',
    './vendor/bin/pest',
  ])('allows %s', (cmd) => {
    expect(classifyBashCommand(cmd, {}).result).toBe('allow');
  });

  // Critical: REPL and composer mutations remain hard-blocked
  it.each([
    ['php artisan tinker', 'REPL = arbitrary PHP exec risk'],
    ['php artisan tinker --execute="exit"', 'tinker variant'],
    ['composer install', 'hard-blacklist'],
    ['composer require foo/bar', 'hard-blacklist'],
    ['composer update', 'hard-blacklist'],
    ['composer remove foo/bar', 'hard-blacklist'],
    ['php artisan migrate:install', 'unknown migrate subcommand outside whitelist set'],
  ])('still blocks %s (%s)', (cmd) => {
    expect(classifyBashCommand(cmd, {}).result).toBe('block');
  });

  // Critical: pre-existing v3.8 behaviour is kept
  it('keeps php artisan list/route:list/migrate:status allowed (pre-existing v3.8)', () => {
    expect(classifyBashCommand('php artisan list', {}).result).toBe('allow');
    expect(classifyBashCommand('php artisan route:list', {}).result).toBe('allow');
    expect(classifyBashCommand('php artisan migrate:status', {}).result).toBe('allow');
  });

  // Critical: pest does NOT match pestilence-like prefixes (word boundary)
  it('does not allow command names sharing prefix with pest', () => {
    expect(classifyBashCommand('pestilence', {}).result).toBe('block');
  });

  // Critical: chain semantics still enforced — pest && rm x → block (rm is mutating)
  it('still blocks chain with mutating part even if first part is whitelisted pest', () => {
    expect(classifyBashCommand('pest && rm x', {}).result).toBe('block');
  });

  // Critical: composer-show/outdated still allowed (pre-existing v3.8)
  it('keeps composer show/outdated allowed (pre-existing v3.8)', () => {
    expect(classifyBashCommand('composer show', {}).result).toBe('allow');
    expect(classifyBashCommand('composer outdated', {}).result).toBe('allow');
  });
});

describe('SAFE_EXACT — narrow `cd app` whitelist (2026-05-31, owner-authorized)', () => {
  // Allowed: enter the Laravel project dir, alone or chained with whitelisted cmds
  it.each([
    'cd app',
    'cd app && pest',
    'cd app && php artisan test',
    'cd app && composer test',
  ])('allows %s', (cmd) => {
    expect(classifyBashCommand(cmd, {}).result).toBe('allow');
  });

  // Scope: cd into any other dir stays default-deny (cwd-shift read-bypass contained)
  it.each([
    'cd ~/.claude/runtime',
    'cd ../memory',
    'cd app/storage',
    'cd /tmp',
    'cd ..',
  ])('still blocks cd into non-app dir: %s', (cmd) => {
    expect(classifyBashCommand(cmd, {}).result).toBe('block');
  });

  // cwd-shift read-exfil attempt via narrow cd app stays blocked (protected path by name)
  it('still blocks reading a protected file from app/ via literal path', () => {
    expect(classifyBashCommand('cd app && cat ../.env', {}).result).toBe('block');
    expect(classifyBashCommand('cd app && cat ~/.claude/runtime/state.json', {}).result).toBe('block');
  });

  // Mutations after cd app remain caught (hard-blacklist + chain-mutating rule)
  it.each([
    'cd app && rm foo',
    'cd app && mkdir x',
    'cd app && git commit -m x',
  ])('still blocks mutating chain: %s', (cmd) => {
    expect(classifyBashCommand(cmd, {}).result).toBe('block');
  });

  // Second segment must still be independently whitelisted
  it('still blocks cd app chained with a non-whitelisted command', () => {
    expect(classifyBashCommand('cd app && frobnicate', {}).result).toBe('block');
  });
});

describe('SAFE_EXACT — npm ci (worktree dep restore, 2026-05-31)', () => {
  // Allowed: npm ci installs exactly the committed lockfile (deterministic, no
  // version drift) — needed to restore junction node_modules in a fresh worktree.
  it.each([
    'npm ci',
    'npm ci --no-audit',
    'npm ci --prefer-offline',
  ])('allows %s', (cmd) => {
    expect(classifyBashCommand(cmd, {}).result).toBe('allow');
  });

  // Critical: npm install / npm i remain hard-blacklisted (line 60) — they can
  // pull new/updated versions, unlike ci which pins to the lockfile.
  it.each([
    'npm install',
    'npm i',
    'npm install foo',
    'npm i foo',
  ])('still blocks %s (hard-blacklist)', (cmd) => {
    expect(classifyBashCommand(cmd, {}).result).toBe('block');
  });

  // Critical: word boundary — `npm cider` (or any ci-prefixed token) is NOT npm ci
  it('does not allow ci-prefixed token (word boundary)', () => {
    expect(classifyBashCommand('npm cider', {}).result).toBe('block');
  });

  // Critical: chain semantics still enforced — npm ci && rm x → block (rm mutating)
  it('still blocks chain with mutating part after npm ci', () => {
    expect(classifyBashCommand('npm ci && rm x', {}).result).toBe('block');
  });
});
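The `cd app` and `npm ci` tests rely on two properties of the whitelist: anchored, word-bounded patterns (`\b` rejects `npm cider` and `pestilence`, `$` rejects `cd app/storage`), and every chain segment matching on its own. A minimal sketch of the second property, using a simplified, hypothetical subset of `SAFE_EXACT` and a naive splitter. The real `classifyWhitelist` segmentation, and the hard-blacklist and chain-mutating checks that run before it, are not modeled here.

```javascript
// Hypothetical, simplified subset of SAFE_EXACT (not the full list).
const SAFE_SUBSET = [
  /^npm\s+ci\b/,
  /^php\s+artisan\s+test\b/,
  /^composer\s+test\b/,
  /^(?:\.\/)?vendor\/bin\/pest\b/,
  /^pest\b/,
  /^cd\s+app$/,
];

// Naive splitter (assumption): breaks on &&, ||, ; and |. A real classifier
// would also need to respect quoting when splitting.
function splitChain(command) {
  return String(command)
    .split(/&&|\|\||;|\|/)
    .map((seg) => seg.trim())
    .filter(Boolean);
}

// A chain is allowed only if it is non-empty and EVERY segment matches a pattern.
function isWhitelistedChain(command) {
  const segments = splitChain(command);
  return segments.length > 0 && segments.every((seg) => SAFE_SUBSET.some((re) => re.test(seg)));
}

for (const cmd of ['cd app && pest', 'cd app && frobnicate', 'cd app/storage', 'npm cider', 'npm ci --no-audit']) {
  console.log(isWhitelistedChain(cmd), cmd);
}
```

Requiring `every` segment to match is what keeps `cd app` from becoming a prefix that launders an arbitrary second command.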

import { stripQuotedSpans } from './enforce-router-gate.mjs';

describe('quote-aware redirect (quirk 2)', () => {
  // False positives that must now be ALLOWED: the `>` / `2>` sits inside quotes.
  it('allows > inside double-quoted commit message (co-author <email>)', () => {
    expect(matchBashHardBlacklist('git commit -m "x <noreply@anthropic.com>"')).toBe(null);
  });
  it('allows 2> inside double-quoted message', () => {
    expect(matchBashHardBlacklist('git commit -m "fix 2>1 logging"')).toBe(null);
  });
  it('allows lone quoted >', () => {
    expect(matchBashHardBlacklist('git commit -m ">"')).toBe(null);
  });
  // Real redirects (operator OUTSIDE quotes) must STILL BLOCK.
  it('blocks spaced stdout redirect', () => {
    expect(matchBashHardBlacklist('echo x > /tmp/f')).toBeTruthy();
  });
  it('blocks no-space stdout redirect', () => {
    expect(matchBashHardBlacklist('echo x>/tmp/f')).toBeTruthy();
  });
  it('blocks append redirect', () => {
    expect(matchBashHardBlacklist('echo x >> /tmp/f')).toBeTruthy();
  });
  it('blocks stderr redirect to file', () => {
    expect(matchBashHardBlacklist('cmd 2> /tmp/err')).toBeTruthy();
  });
  it('blocks redirect after a closing quote', () => {
    expect(matchBashHardBlacklist('echo "x" > /tmp/f')).toBeTruthy();
  });
});

describe('stripQuotedSpans (quirk 2 helper)', () => {
  it('blanks double-quoted interior, keeps outside', () => {
    expect(stripQuotedSpans('a "b>c" > d')).toBe('a "   " > d');
  });
  it('blanks single-quoted interior', () => {
    expect(stripQuotedSpans("a 'x>y' z")).toBe("a '   ' z");
  });
  it('keeps backslash-escaped quote literal (no span opened)', () => {
    expect(stripQuotedSpans('a \\" > b')).toBe('a \\" > b');
  });
});
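One quoting case the helper's model does not cover: in POSIX shells a backslash inside single quotes is literal, so in `git log 'x\' > out.txt` the quoted word ends at the second `'` and the `>` is a real redirect. `stripQuotedSpans` as written treats that backslash as an escape, keeps the span open, and blanks the redirect, so the quote-aware redirect check would not see it (other rules may still reject the command as a whole). A sketch of a variant that follows POSIX quoting more closely; `stripQuotedSpansPosix` is a hypothetical name, not part of the diff, and it still ignores constructs such as `$'...'` and command substitution.

```javascript
// Hypothetical POSIX-leaning variant of stripQuotedSpans.
// Differences from the helper in the diff:
//  - inside single quotes a backslash is literal; only the next ' closes the span;
//  - inside double quotes an escaped character is blanked too, so "a\>b" hides its '>'.
function stripQuotedSpansPosix(command) {
  const s = String(command || '');
  let out = '';
  let quote = null;    // null, "'" or '"'
  let escaped = false; // previous character was an active backslash
  for (const ch of s) {
    if (escaped) {
      out += quote ? ' ' : ch; // escaped char: blank inside quotes, keep outside
      escaped = false;
      continue;
    }
    if (quote === "'") {
      if (ch === "'") { out += ch; quote = null; } else out += ' ';
      continue;
    }
    if (ch === '\\') {
      out += quote ? ' ' : ch;
      escaped = true;
      continue;
    }
    if (quote === '"') {
      if (ch === '"') { out += ch; quote = null; } else out += ' ';
      continue;
    }
    if (ch === "'" || ch === '"') { out += ch; quote = ch; continue; }
    out += ch;
  }
  return out;
}

console.log(JSON.stringify(stripQuotedSpansPosix("git log 'x\\' > out.txt")));
```

Like the original, it replaces each character with one space, so offsets outside the quotes are preserved.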

@@ -0,0 +1,77 @@
#!/usr/bin/env node
/**
 * enforce-runtime-write-deny — PreToolUse(Edit|Write|MultiEdit|NotebookEdit).
 *
 * Blocks the Write/Edit TOOL from writing under ~/.claude/runtime/** — closing a
 * pre-existing self-write hole on the v4 git-approval anchor
 * (askuser-decisions-<sess>.jsonl) and any future runtime side-channel. The
 * Bash/PowerShell gates already protect this path for shell commands; the Write
 * tool was the remaining ungated channel. Standalone — independent of
 * safe-baseline (router-gate v4 §"Side-channel write-protection", item 1b / C3).
 *
 * Uses the resolving `pathNormalize` (path.resolve + realpath) so `.`/`..`
 * segments cannot evade the match (V2-2) — `~/.claude/./runtime/x` collapses to
 * the real path before the regex test. Legitimate hooks write there via Node
 * `fs`, not the Claude Write tool, so they are unaffected.
 *
 * Posture: blocks ONLY on a confirmed runtime match; if normalization throws
 * (cannot determine the path) it fails-OPEN, never bricking the session. This
 * deliberately trades a tiny residual (a malformed path that throws) for
 * not self-locking the controller out of all edits on an unexpected error.
 */
import { pathNormalize } from './path-normalization.mjs';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';

const WRITE_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit']);
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;
// Transcript protection (Z Part 1): any *.jsonl under ~/.claude/projects/** is a
// session/subagent transcript. The tdd-gate credits a subagent's RED from its
// agent-<id>.jsonl, so these must be unforgeable by the Write tool. Memory files
// there are *.md and never match `.jsonl$`, so memory writes stay allowed.
const TRANSCRIPT_RE = /(^|\/)\.claude\/projects\/.*\.jsonl$/i;

/**
 * Pure decision.
 * @param {object} p
 * @param {string} p.toolName
 * @param {string} p.filePath
 * @param {Function} [p.normalizeImpl] - injectable normalizer (default: resolving pathNormalize)
 * @returns {{block:boolean, reason?:string}}
 */
export function decide({ toolName, filePath, normalizeImpl = pathNormalize }) {
  if (!WRITE_TOOLS.has(toolName)) return { block: false };
  const fp = String(filePath || '');
  if (!fp) return { block: false };
  let norm;
  try { norm = normalizeImpl(fp); } catch { return { block: false }; } // cannot determine → fail-open
  const normStr = String(norm || '');
  if (RUNTIME_RE.test(normStr)) {
    return {
      block: true,
      reason: `Write to «${norm}» denied — ~/.claude/runtime is a protected side-channel (git-approval anchor). Hooks write it via Node fs, not the Write tool.`,
    };
  }
  if (TRANSCRIPT_RE.test(normStr)) {
    return {
      block: true,
      reason: `Write to «${norm}» denied — ~/.claude/projects/**/*.jsonl are session/subagent transcripts (tamper-protected; the tdd-gate trusts them). The harness writes transcripts, never the Write tool. Memory *.md there stays writable.`,
    };
  }
  return { block: false };
}

async function main() {
  try {
    const event = parseEventJson(await readStdin());
    const r = decide({
      toolName: event.tool_name,
      filePath: (event.tool_input && (event.tool_input.file_path || event.tool_input.notebook_path)) || '',
    });
    exitDecision({ block: r.block, message: r.reason });
  } catch {
    exitDecision({ block: false }); // fail-quiet
  }
}

const isCli = process.argv[1] && process.argv[1].replace(/\\/g, '/').endsWith('/enforce-runtime-write-deny.mjs');
if (isCli) main();
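Why normalization has to run before the regex test: a raw `/.claude/./runtime/` path does not match `RUNTIME_RE` at all. The sketch below reuses the hook's three constants verbatim and collapses `.` and `..` segments lexically. It is a simplification of the resolving `pathNormalize` (which, per the header comment, uses `path.resolve` plus `realpath` and therefore also follows symlinks); `collapseSegments`, `decideWriteSketch` and the `kind` field are hypothetical.

```javascript
// Verbatim copies of the hook's constants.
const WRITE_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit']);
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;
const TRANSCRIPT_RE = /(^|\/)\.claude\/projects\/.*\.jsonl$/i;

// Hypothetical lexical normalizer: converts '\' to '/', drops '.' segments and
// lets '..' pop the previous segment. It never touches the filesystem (no
// symlink resolution), and a leading '..' on a relative path is simply dropped.
function collapseSegments(p) {
  const fwd = String(p).replace(/\\/g, '/');
  const parts = [];
  for (const seg of fwd.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') {
      parts.pop();
      continue;
    }
    parts.push(seg);
  }
  return (fwd.startsWith('/') ? '/' : '') + parts.join('/');
}

function decideWriteSketch(toolName, filePath) {
  if (!WRITE_TOOLS.has(toolName) || !filePath) return { block: false };
  const norm = collapseSegments(filePath);
  if (RUNTIME_RE.test(norm)) return { block: true, kind: 'runtime' };
  if (TRANSCRIPT_RE.test(norm)) return { block: true, kind: 'transcript' };
  return { block: false };
}

const raw = '/home/u/.claude/./runtime/x.jsonl';
console.log(RUNTIME_RE.test(raw));            // the raw path slips past the regex
console.log(decideWriteSketch('Write', raw)); // the collapsed path is caught
```

Checking the runtime pattern first matters only for the reason text; a `.jsonl` under runtime would be blocked either way.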
@@ -0,0 +1,98 @@
// tools/enforce-runtime-write-deny.test.mjs
// Standalone write-deny on ~/.claude/runtime (router-gate v4 §"Side-channel
// write-protection", item 1b / C3). Closes a pre-existing self-write hole on the
// git-approval anchor; uses the resolving pathNormalize so `.`/`..` segments
// cannot evade the match (V2-2).
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-runtime-write-deny.mjs';
import { homedir } from 'node:os';
import { join } from 'node:path';

const HOME = homedir();
const HOME_FWD = HOME.replace(/\\/g, '/');

describe('enforce-runtime-write-deny decide()', () => {
  it('blocks a Write into ~/.claude/runtime (git-approval anchor)', () => {
    const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', 'runtime', 'askuser-decisions-S.jsonl') });
    expect(r.block).toBe(true);
  });

  it('blocks the .-segment evasion (V2-2)', () => {
    // Raw string with `/./` — path.join would pre-collapse it, so build it literally.
    const evasion = `${HOME_FWD}/.claude/./runtime/x.jsonl`;
    const r = decide({ toolName: 'Write', filePath: evasion });
    expect(r.block).toBe(true);
  });

  it('blocks Edit/MultiEdit/NotebookEdit too', () => {
    const p = join(HOME, '.claude', 'runtime', 'safe-baseline-ledger-S.json');
    expect(decide({ toolName: 'Edit', filePath: p }).block).toBe(true);
    expect(decide({ toolName: 'MultiEdit', filePath: p }).block).toBe(true);
    expect(decide({ toolName: 'NotebookEdit', filePath: p }).block).toBe(true);
  });

  it('allows a Write to a normal project path', () => {
    const r = decide({ toolName: 'Write', filePath: join(HOME, 'project', 'src', 'x.mjs') });
    expect(r.block).toBe(false);
  });

  it('ignores non-write tools', () => {
    expect(decide({ toolName: 'Read', filePath: join(HOME, '.claude', 'runtime', 'x') }).block).toBe(false);
    expect(decide({ toolName: 'Bash', filePath: join(HOME, '.claude', 'runtime', 'x') }).block).toBe(false);
  });

  it('fail-open (no block) when the normalizer throws — never bricks the session', () => {
    const throwing = () => { throw new Error('boom'); };
    const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', 'runtime', 'x'), normalizeImpl: throwing });
    expect(r.block).toBe(false);
  });

  it('blocks via injected normalizer that resolves into runtime', () => {
    const r = decide({ toolName: 'Write', filePath: 'whatever', normalizeImpl: () => '/home/u/.claude/runtime/x.jsonl' });
    expect(r.block).toBe(true);
  });
});

// Part 1 of Z (2026-05-31): close the transcript Write hole. The tdd-gate will
// (Part 2) credit a subagent's RED from its agent-<id>.jsonl; that transcript
// must therefore be unforgeable. The Write tool was the last ungated channel
// into ~/.claude/projects/**/*.jsonl (Bash/PowerShell/Read gates already cover
// it). Memory files there are .md and stay writable (they never match .jsonl$).
describe('enforce-runtime-write-deny — transcript .jsonl protection (Z Part 1)', () => {
  it('blocks a Write to a subagent transcript under ~/.claude/projects', () => {
    const p = join(HOME, '.claude', 'projects', 'slug', 'sess-uuid', 'subagents', 'agent-abc.jsonl');
    expect(decide({ toolName: 'Write', filePath: p }).block).toBe(true);
  });

  it('blocks a Write to the controller session transcript itself', () => {
    const p = join(HOME, '.claude', 'projects', 'slug', 'sess-uuid.jsonl');
    expect(decide({ toolName: 'Write', filePath: p }).block).toBe(true);
  });

  it('blocks Edit/MultiEdit/NotebookEdit on a transcript .jsonl too', () => {
    const p = join(HOME, '.claude', 'projects', 'slug', 'sess', 'subagents', 'agent-x.jsonl');
    expect(decide({ toolName: 'Edit', filePath: p }).block).toBe(true);
    expect(decide({ toolName: 'MultiEdit', filePath: p }).block).toBe(true);
    expect(decide({ toolName: 'NotebookEdit', filePath: p }).block).toBe(true);
  });

  it('blocks the .-segment evasion into projects transcripts', () => {
    const evasion = `${HOME_FWD}/.claude/projects/slug/./sess/subagents/agent-x.jsonl`;
    expect(decide({ toolName: 'Write', filePath: evasion }).block).toBe(true);
  });

  it('ALLOWS a memory .md under ~/.claude/projects (never a .jsonl)', () => {
    const p = join(HOME, '.claude', 'projects', 'slug', 'memory', 'feedback_x.md');
    expect(decide({ toolName: 'Write', filePath: p }).block).toBe(false);
  });

  it('ALLOWS a .jsonl OUTSIDE ~/.claude/projects (e.g. repo observer episodes)', () => {
    const p = join(HOME, 'repo', 'docs', 'observer', 'episodes-2026-05.jsonl');
    expect(decide({ toolName: 'Write', filePath: p }).block).toBe(false);
  });

  it('ignores non-write tools on a transcript path', () => {
    const p = join(HOME, '.claude', 'projects', 'slug', 'sess', 'subagents', 'agent-x.jsonl');
    expect(decide({ toolName: 'Read', filePath: p }).block).toBe(false);
  });
});
@@ -0,0 +1,225 @@
#!/usr/bin/env node
/**
 * enforce-safe-baseline-metering — PreToolUse wrapper around the pure
 * safe-baseline-metering module (router-gate v4 §3.1.2 Direction 1).
 *
 * Catches skill-substitution laundering: many Read/Grep/Glob/LS/TodoWrite/
 * AskUserQuestion calls used as an analysis channel INSTEAD of invoking the
 * recommended Skill, then a mutating tool (Edit/Write/Bash/…) lands without any
 * skill ever matching. Safe-baseline tools themselves stay allowed (legit
 * continuation reading); only a mutating tool past the hard threshold is blocked.
 *
 * Stream H tail — adds the wrapper. Pure metering + threshold logic live in
 * safe-baseline-metering.mjs; this file is just the hook entry composition.
 *
 * Convention (mirrors enforce-decomposition-detector.mjs): the testable unit is
 * the pure `decide()` composition. The live `main()` — task-boundary inference,
 * skill-match detection from the transcript, and per-task counter persistence —
 * is a deferred no-op (exit 0) until that wiring is designed in the spec/plan.
 * Until then the hook NEVER blocks (no self-lockout, same posture as the sibling
 * Stream H wrappers). Settings.json registration is also deferred.
 */
import {
  incrementCounter,
  evaluateThresholds,
  DEFAULT_THRESHOLDS,
  newCounterState,
  shouldInheritTaskId,
  deriveTaskId,
} from './safe-baseline-metering.mjs';
import { readFileSync, writeFileSync, appendFileSync, mkdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';
import {
  readStdin,
  parseEventJson,
  readTranscript,
  lastUserPromptText,
  lastTurnEntries,
  exitDecision,
} from './enforce-hook-helpers.mjs';

/**
 * Pure decision: increment the per-task counter for `toolName`, then evaluate
 * thresholds against the resulting state.
 *
 * @param {object} args
 * @param {object} args.state - current per-task counter state (newCounterState shape)
 * @param {string} args.toolName - the tool about to run
 * @param {boolean} [args.skillMatched] - whether a recommended Skill matched in this task
 * @param {object} [args.thresholds] - override DEFAULT_THRESHOLDS
 * @returns {{state:object, action:'allow'|'soft_flag'|'hard_block', reason?:string}}
 */
export function decide({ state, toolName, skillMatched = false, thresholds = DEFAULT_THRESHOLDS }) {
  const next = incrementCounter(state, toolName);
  const evalResult = evaluateThresholds(next, toolName, skillMatched, thresholds);
  return { state: next, action: evalResult.action, reason: evalResult.reason };
}
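The `decide()` composition above is increment-then-evaluate. Because `safe-baseline-metering.mjs` is not part of this diff, the state shape (`{ safeCount }`), the threshold values and the exact soft/hard policy in this sketch are assumptions, loosely following the header comment: safe-baseline tools are never blocked, and only a mutating tool past the hard threshold, with no recommended skill matched, is. All names ending in `Sketch` are hypothetical.

```javascript
// Hypothetical stand-ins for the real module's exports.
const SAFE_BASELINE = new Set(['Read', 'Grep', 'Glob', 'LS', 'TodoWrite', 'AskUserQuestion']);
const THRESHOLDS_SKETCH = { soft: 8, hard: 15 }; // illustrative values only

// Returns a fresh object; only safe-baseline calls are counted.
function incrementCounterSketch(state, toolName) {
  return SAFE_BASELINE.has(toolName) ? { ...state, safeCount: state.safeCount + 1 } : { ...state };
}

function evaluateThresholdsSketch(state, toolName, skillMatched, t) {
  // Safe-baseline tools stay allowed; a matched skill clears the task.
  if (skillMatched || SAFE_BASELINE.has(toolName)) return { action: 'allow' };
  if (state.safeCount >= t.hard) {
    return { action: 'hard_block', reason: `${state.safeCount} safe-baseline calls and no skill matched` };
  }
  if (state.safeCount >= t.soft) return { action: 'soft_flag', reason: 'approaching hard threshold' };
  return { action: 'allow' };
}

// Same shape as decide(): increment first, then evaluate the new state.
function decideSketch({ state, toolName, skillMatched = false, thresholds = THRESHOLDS_SKETCH }) {
  const next = incrementCounterSketch(state, toolName);
  const r = evaluateThresholdsSketch(next, toolName, skillMatched, thresholds);
  return { state: next, action: r.action, reason: r.reason };
}

// 15 reads, then an Edit with no skill matched.
let state = { safeCount: 0 };
for (let i = 0; i < 15; i++) state = decideSketch({ state, toolName: 'Read' }).state;
console.log(decideSketch({ state, toolName: 'Edit' }).action);
```

Returning a new state object instead of mutating the input keeps the composition pure, which is what lets the real `decide()` be unit-tested without persistence.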
|
||||
|
||||
/**
|
||||
* Task-boundary head: decide whether the current event continues the prior task
|
||||
* or starts a fresh one, then meter it.
|
||||
*
|
||||
* Continuation rules (delegated to the pure module):
|
||||
* - no prior ledger → fresh task
|
||||
* - reset marker in promptText → fresh task (shouldInheritTaskId=false)
|
||||
* - keyword overlap with prior task < 2 → fresh task
|
||||
* - otherwise → inherit prior counters
|
||||
*
|
||||
* @param {object} args
|
||||
* @param {object} args.event - PreToolUse event ({ tool_name })
|
||||
* @param {object|null} args.priorLedger - { state, lastKeywords } from the last event, or null
|
||||
* @param {string[]} args.currentKeywords - keywords distilled from the current prompt
|
||||
* @param {string} args.promptText - the current user prompt (for reset-marker detection)
|
||||
* @param {boolean} [args.skillMatched] - whether a recommended Skill matched in this task
|
||||
* @param {object} [args.thresholds] - override DEFAULT_THRESHOLDS
|
||||
* @returns {{action:string, reason?:string, ledger:{state:object, lastKeywords:string[]}}}
|
||||
*/
|
||||
export function processEvent({
  event,
  priorLedger,
  currentKeywords = [],
  promptText = '',
  skillMatched = false,
  thresholds = DEFAULT_THRESHOLDS,
}) {
  const toolName = event && event.tool_name;
  const inherit =
    priorLedger &&
    priorLedger.state &&
    shouldInheritTaskId(priorLedger.lastKeywords || [], currentKeywords, promptText);

  const baseState = inherit
    ? priorLedger.state
    : newCounterState({
        taskId: deriveTaskId(promptText),
        startedAtIso: '',
        firstPromptExcerpt: promptText,
      });

  const d = decide({ state: baseState, toolName, skillMatched, thresholds });
  return {
    action: d.action,
    reason: d.reason,
    ledger: { state: d.state, lastKeywords: currentKeywords },
  };
}

// ── 1b live-wiring: pure helpers (safe-baseline-live-wiring-design.md v4) ──

// Common RU imperatives + RU/EN stopwords that would otherwise create spurious
// keyword overlap between unrelated tasks (G2). Length<4 tokens are dropped
// separately; this set targets >=4-char common words.
const STOPWORDS = new Set([
  'сделай', 'сделать', 'проверь', 'проверить', 'посмотри', 'добавь', 'добавить',
  'напиши', 'написать', 'нужно', 'надо', 'давай', 'можешь', 'потом', 'после',
  'перед', 'через', 'очень', 'если', 'чтобы', 'этот', 'эта', 'это', 'эти',
  'или', 'тоже', 'также', 'когда', 'пока', 'весь', 'всё', 'все', 'теперь',
  'здесь', 'там', 'нет', 'есть', 'будет', 'было', 'твой', 'мой', 'самый',
  'then', 'this', 'that', 'with', 'from', 'your', 'please', 'just', 'make',
  'check', 'look', 'need', 'want', 'also', 'into', 'more', 'very', 'should',
  'will', 'have', 'does', 'done', 'them', 'they', 'here', 'there',
]);

/** Deterministic keyword extraction (H1): lowercase, drop <4-char + stopwords, unique, sorted. */
export function extractKeywords(promptText) {
  if (typeof promptText !== 'string') return [];
  const tokens = promptText
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((t) => t.length >= 4 && !STOPWORDS.has(t));
  return [...new Set(tokens)].sort();
}

const SKILL_MATCH_TOOLS = new Set(['Skill', 'EnterPlanMode']);

/** C2/V2-5: true iff the turn has a real assistant tool_use of Skill or EnterPlanMode. */
export function detectSkillMatch(turnEntries) {
  if (!Array.isArray(turnEntries)) return false;
  for (const e of turnEntries) {
    const c = e && e.message && e.message.content;
    if (!Array.isArray(c)) continue;
    for (const b of c) {
      if (b && b.type === 'tool_use' && SKILL_MATCH_TOOLS.has(b.name)) return true;
    }
  }
  return false;
}

/**
 * V2-1 stickiness contract: the pure pipeline neither persists nor task-scopes
 * skill-match, so this wrapper owns it. Compute inherit (same predicate as
 * processEvent), scope the prior sticky flag to inherit, OR in this turn's match,
 * run the decision, then write the effective flag back into the persisted state.
 */
export function runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn, thresholds }) {
  const inherit = !!(priorLedger && priorLedger.state &&
    shouldInheritTaskId(priorLedger.lastKeywords || [], currentKeywords, promptText));
  const priorSticky = inherit ? !!priorLedger.state.skill_match_within_task : false;
  const effectiveSkillMatched = priorSticky || !!skillMatchedThisTurn;

  const res = processEvent({
    event, priorLedger, currentKeywords, promptText,
    skillMatched: effectiveSkillMatched, thresholds,
  });
  res.ledger.state.skill_match_within_task = effectiveSkillMatched;
  return res;
}

// ── live I/O composition ──

const ESCAPE_MSG = 'invoke the recommended Skill, or EnterPlanMode, to proceed (skill/plan invocations are never blocked by this layer).';

function ledgerDir(override) {
  return override || join(homedir(), '.claude', 'runtime');
}
function loadLedger(dir, sess) {
  try { return JSON.parse(readFileSync(join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), 'utf8')); }
  catch { return null; }
}
function saveLedger(dir, sess, ledger) {
  try {
    mkdirSync(dir, { recursive: true });
    writeFileSync(join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), JSON.stringify(ledger));
  } catch { /* fail-quiet */ }
}
function logFlag(dir, sess, entry) {
  try {
    mkdirSync(dir, { recursive: true });
    appendFileSync(join(dir, `safe-baseline-flags-${sess || 'unknown'}.jsonl`),
      JSON.stringify({ ts: new Date().toISOString(), ...entry }) + '\n');
  } catch { /* ignore */ }
}

/** Testable live head: returns {block, message?} and persists the ledger. Fail-quiet. */
export async function runMain({ event, runtimeDir, transcript: injectedTranscript } = {}) {
  try {
    const sess = event.session_id;
    const dir = ledgerDir(runtimeDir);
    const transcript = injectedTranscript || readTranscript(event.transcript_path);
    const promptText = lastUserPromptText(transcript) || '';
    const currentKeywords = extractKeywords(promptText);
    const skillMatchedThisTurn = detectSkillMatch(lastTurnEntries(transcript)) ||
      SKILL_MATCH_TOOLS.has(event.tool_name);
    const priorLedger = loadLedger(dir, sess);

    const res = runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn });
    saveLedger(dir, sess, res.ledger);

    if (res.action === 'soft_flag') logFlag(dir, sess, { tool: event.tool_name, reason: res.reason });
    if (res.action === 'hard_block') return { block: true, message: `[safe-baseline] ${res.reason}\n${ESCAPE_MSG}` };
    return { block: false };
  } catch {
    return { block: false }; // fail-quiet — never crash the session
  }
}

async function main() {
  const event = parseEventJson(await readStdin());
  const res = await runMain({ event });
  exitDecision(res);
}

if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-safe-baseline-metering.mjs')) {
  main().catch(() => process.exit(0));
}
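`evaluateThresholds()` and `DEFAULT_THRESHOLDS` are imported from `safe-baseline-metering.mjs`, which this diff does not show. As a reading aid, here is a minimal sketch of the decision table that the `decide()` tests pin down. Read warns at 30 and is hard at 60. TodoWrite_writes warns at 5 and is hard at 15. Grep is treated as hard at 30. The names `SKETCH_THRESHOLDS`, `MUTATING` and `sketchDecide` are hypothetical. Grep's warn value and the membership of the mutating set are assumptions, not values read from the module.

```javascript
// Hypothetical sketch: thresholds are inferred from the decide() tests, not
// read from safe-baseline-metering.mjs. Grep's warn value and the MUTATING
// set are assumptions.
const SKETCH_THRESHOLDS = {
  Read: { warn: 30, hard: 60 },
  Grep: { warn: 15, hard: 30 }, // warn is a placeholder
  TodoWrite_writes: { warn: 5, hard: 15 },
};
const MUTATING = new Set(['Edit', 'Write', 'MultiEdit', 'Bash']); // assumed set

function sketchDecide(counts, toolName, skillMatched, thresholds = SKETCH_THRESHOLDS) {
  // TodoWrite is metered under the TodoWrite_writes counter.
  const key = toolName === 'TodoWrite' ? 'TodoWrite_writes' : toolName;
  const metered = Object.prototype.hasOwnProperty.call(thresholds, key);

  // Copy before incrementing so the caller's counts are never mutated.
  const next = { ...counts };
  if (metered) next[key] = (next[key] || 0) + 1;

  // A mutating tool is hard-blocked once ANY metered counter is at its hard
  // limit, unless a recommended Skill (or plan mode) matched in this task.
  if (MUTATING.has(toolName) && !skillMatched) {
    for (const [k, t] of Object.entries(thresholds)) {
      const n = next[k] || 0;
      if (n >= t.hard) {
        return { counts: next, action: 'hard_block', reason: `${k}=${n} reached hard limit ${t.hard}` };
      }
    }
  }

  // A metered tool is soft-flagged inside [warn, hard); past the hard limit
  // it stays allowed, because only mutating tools are gated.
  if (metered) {
    const { warn, hard } = thresholds[key];
    if (next[key] >= warn && next[key] < hard) {
      return { counts: next, action: 'soft_flag', reason: `${key}=${next[key]} reached warn ${warn}` };
    }
  }
  return { counts: next, action: 'allow' };
}

// Usage: the Read counter crosses its warn threshold (29 -> 30).
console.log(sketchDecide({ Read: 29 }, 'Read', false).action); // soft_flag
```

The design the tests imply is that metering never stops reading. A metered tool only earns a soft flag. The hard gate falls on mutating tools, and a matched Skill or EnterPlanMode lifts it.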
@@ -0,0 +1,283 @@
// tools/enforce-safe-baseline-metering.test.mjs
// Stream H tail — wrapper tests around the pure safe-baseline-metering module
// (router-gate v4 §3.1.2 Direction 1). Mirrors the enforce-decomposition-detector
// convention: implement + test a pure `decide()` composition; live main() wiring
// (transcript task-boundary + skill detection + state persistence) is now live
// (1b — safe-baseline-live-wiring-design.md v4).
import { describe, it, expect } from 'vitest';
import { decide, processEvent, extractKeywords, detectSkillMatch, runLiveDecision, runMain } from './enforce-safe-baseline-metering.mjs';
import { newCounterState } from './safe-baseline-metering.mjs';
import { mkdtempSync, writeFileSync, existsSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

function freshState() {
  return newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' });
}
function withCounts(patch) {
  const s = freshState();
  return { ...s, counts: { ...s.counts, ...patch } };
}

describe('enforce-safe-baseline-metering decide()', () => {
  it('allows a metered Read below warn threshold and increments its counter', () => {
    const r = decide({ state: freshState(), toolName: 'Read', skillMatched: false });
    expect(r.action).toBe('allow');
    expect(r.state.counts.Read).toBe(1);
  });

  it('soft_flags a metered Read once it reaches the warn threshold (29→30)', () => {
    const r = decide({ state: withCounts({ Read: 29 }), toolName: 'Read', skillMatched: false });
    expect(r.action).toBe('soft_flag');
    expect(r.state.counts.Read).toBe(30);
  });

  it('hard_blocks a mutating tool when a metered counter is at its hard limit, no skill', () => {
    const r = decide({ state: withCounts({ Read: 60 }), toolName: 'Edit', skillMatched: false });
    expect(r.action).toBe('hard_block');
    expect(r.reason).toContain('Read=60');
  });

  it('allows the mutating tool when a skill was matched, even past the hard limit', () => {
    const r = decide({ state: withCounts({ Read: 60 }), toolName: 'Edit', skillMatched: true });
    expect(r.action).toBe('allow');
  });

  it('allows (and does not count) a tool that is neither metered nor mutating', () => {
    const r = decide({ state: freshState(), toolName: 'WebFetch', skillMatched: false });
    expect(r.action).toBe('allow');
    expect(r.state.counts.Read).toBe(0);
  });

  it('does not mutate the caller-provided state object (immutability)', () => {
    const s = freshState();
    decide({ state: s, toolName: 'Read', skillMatched: false });
    expect(s.counts.Read).toBe(0);
  });

  it('maps TodoWrite to TodoWrite_writes and soft_flags at its warn threshold (4→5)', () => {
    const r = decide({ state: withCounts({ TodoWrite_writes: 4 }), toolName: 'TodoWrite', skillMatched: false });
    expect(r.state.counts.TodoWrite_writes).toBe(5);
    expect(r.action).toBe('soft_flag');
  });

  it('keeps a metered Grep allowed once past its hard threshold (continuation reading)', () => {
    const r = decide({ state: withCounts({ Grep: 30 }), toolName: 'Grep', skillMatched: false });
    expect(r.action).toBe('allow');
    expect(r.state.counts.Grep).toBe(31);
  });

  it('hard_blocks a mutating Bash when TodoWrite_writes is at its hard limit', () => {
    const r = decide({ state: withCounts({ TodoWrite_writes: 15 }), toolName: 'Bash', skillMatched: false });
    expect(r.action).toBe('hard_block');
    expect(r.reason).toContain('TodoWrite_writes=15');
  });
});

describe('enforce-safe-baseline-metering processEvent() — task-boundary head', () => {
  it('starts a fresh task when there is no prior ledger', () => {
    const r = processEvent({
      event: { tool_name: 'Read' },
      priorLedger: null,
      currentKeywords: ['router', 'gate', 'safe'],
      promptText: 'почини safe-baseline',
      skillMatched: false,
    });
    expect(r.action).toBe('allow');
    expect(r.ledger.state.counts.Read).toBe(1);
    expect(r.ledger.lastKeywords).toEqual(['router', 'gate', 'safe']);
  });

  it('continues the prior task when keywords overlap >=2 and no reset marker', () => {
    const prior = {
      state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 29, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
      lastKeywords: ['router', 'gate', 'safe'],
    };
    const r = processEvent({
      event: { tool_name: 'Read' },
      priorLedger: prior,
      currentKeywords: ['router', 'gate', 'extra'],
      promptText: 'дальше по safe-baseline',
      skillMatched: false,
    });
    expect(r.ledger.state.counts.Read).toBe(30);
    expect(r.action).toBe('soft_flag');
  });

  it('resets to a fresh task on a reset marker even if keywords overlap', () => {
    const prior = {
      state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 29, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
      lastKeywords: ['router', 'gate', 'safe'],
    };
    const r = processEvent({
      event: { tool_name: 'Read' },
      priorLedger: prior,
      currentKeywords: ['router', 'gate', 'safe'],
      promptText: 'новая задача — посмотри другое',
      skillMatched: false,
    });
    expect(r.ledger.state.counts.Read).toBe(1);
  });

  it('starts a fresh task when keyword overlap is below 2', () => {
    const prior = {
      state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 29, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
      lastKeywords: ['router', 'gate', 'safe'],
    };
    const r = processEvent({
      event: { tool_name: 'Read' },
      priorLedger: prior,
      currentKeywords: ['totally', 'different', 'topic'],
      promptText: 'другая тема',
      skillMatched: false,
    });
    expect(r.ledger.state.counts.Read).toBe(1);
  });

  it('allows a mutating tool past the hard limit when a skill matched', () => {
    const prior = {
      state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
      lastKeywords: ['router', 'gate', 'safe'],
    };
    const r = processEvent({
      event: { tool_name: 'Edit' },
      priorLedger: prior,
      currentKeywords: ['router', 'gate', 'safe'],
      promptText: 'продолжаем',
      skillMatched: true,
    });
    expect(r.action).toBe('allow');
  });
});

// ── 1b live-wiring: new pure helpers ──

describe('extractKeywords (H1)', () => {
  it('lowercases, drops <4-char tokens, returns unique sorted', () => {
    expect(extractKeywords('Router GATE safe baseline router')).toEqual(['baseline', 'gate', 'router', 'safe']);
  });
  it('drops common RU imperatives so unrelated tasks do not falsely overlap', () => {
    const a = extractKeywords('сделай проверь биллинг тариф');
    const b = extractKeywords('сделай проверь регион маршрут');
    const overlap = a.filter((k) => b.includes(k));
    expect(overlap).toEqual([]);
  });
  it('returns [] for empty/non-string', () => {
    expect(extractKeywords('')).toEqual([]);
    expect(extractKeywords(null)).toEqual([]);
  });
});

function asstToolUse(name, input = {}) {
  return { message: { role: 'assistant', content: [{ type: 'tool_use', name, input }] } };
}

describe('detectSkillMatch (C2/V2-5)', () => {
  it('true when the turn has a Skill tool_use', () => {
    expect(detectSkillMatch([asstToolUse('Skill', { skill: 'superpowers:brainstorming' })])).toBe(true);
  });
  it('true when the turn has an EnterPlanMode tool_use', () => {
    expect(detectSkillMatch([asstToolUse('EnterPlanMode')])).toBe(true);
  });
  it('false for Read tool_use or plain text mention of a plan path (no self-grant)', () => {
    expect(detectSkillMatch([asstToolUse('Read', { file_path: 'docs/superpowers/plans/x.md' })])).toBe(false);
    expect(detectSkillMatch([{ message: { role: 'assistant', content: [{ type: 'text', text: 'docs/superpowers/plans/x.md' }] } }])).toBe(false);
  });
  it('false for empty/non-array', () => {
    expect(detectSkillMatch([])).toBe(false);
    expect(detectSkillMatch(null)).toBe(false);
  });
});

function ledgerWith(counts, skill, keywords) {
  return {
    state: {
      ...newCounterState({ taskId: 't', startedAtIso: '2026-05-30T00:00:00Z', firstPromptExcerpt: 'p' }),
      counts: { Read: 0, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0, ...counts },
      skill_match_within_task: skill,
    },
    lastKeywords: keywords,
  };
}

describe('runLiveDecision — stickiness contract (V2-1)', () => {
  it('persists skillMatchedThisTurn into the ledger (stickiness not lost)', () => {
    const r = runLiveDecision({
      event: { tool_name: 'Read' }, priorLedger: null,
      promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
      skillMatchedThisTurn: true,
    });
    expect(r.ledger.state.skill_match_within_task).toBe(true);
  });

  it('a skill earlier in a task keeps later mutating ops allowed past the hard limit (no false block)', () => {
    const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
    const r = runLiveDecision({
      event: { tool_name: 'Edit' }, priorLedger: prior,
      promptText: 'продолжаем router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
      skillMatchedThisTurn: false,
    });
    expect(r.action).toBe('allow');
  });

  it('skill match in task A does NOT exempt an unrelated task B (no cross-task leak)', () => {
    const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
    const r = runLiveDecision({
      event: { tool_name: 'Edit' }, priorLedger: prior,
      promptText: 'регион маршрут лиды поставщик', currentKeywords: ['регион', 'маршрут', 'лиды', 'поставщик'],
      skillMatchedThisTurn: false,
    });
    expect(r.ledger.state.skill_match_within_task).toBe(false);
    expect(r.ledger.state.counts.Read).toBe(0);
  });

  it('hard-blocks a mutating tool past the limit in a no-skill task', () => {
    const prior = ledgerWith({ Read: 60 }, false, ['router', 'gate', 'safe', 'baseline']);
    const r = runLiveDecision({
      event: { tool_name: 'Edit' }, priorLedger: prior,
      promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
      skillMatchedThisTurn: false,
    });
    expect(r.action).toBe('hard_block');
  });
});

describe('runMain — live integration', () => {
  function fixtureTranscript(path, entries) {
    writeFileSync(path, entries.map((e) => JSON.stringify(e)).join('\n'));
  }

  it('blocks an Edit when Read past hard with no skill, and names the escape', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
    const tpath = join(dir, 't.jsonl');
    writeFileSync(join(dir, 'safe-baseline-ledger-S.json'), JSON.stringify({
      state: { schema_version: 1, task_id: 't', counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 }, skill_match_within_task: false },
      lastKeywords: ['router', 'gate', 'safe', 'baseline'],
    }));
    fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'router gate safe baseline' } }]);
    const res = await runMain({ event: { tool_name: 'Edit', session_id: 'S', transcript_path: tpath }, runtimeDir: dir });
    expect(res.block).toBe(true);
    expect(res.message).toMatch(/EnterPlanMode|Skill/);
  });

  it('allows a fresh task and persists the ledger', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
    const tpath = join(dir, 't.jsonl');
    fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'регион маршрут лиды' } }]);
    const res = await runMain({ event: { tool_name: 'Read', session_id: 'S2', transcript_path: tpath }, runtimeDir: dir });
    expect(res.block).toBe(false);
    expect(existsSync(join(dir, 'safe-baseline-ledger-S2.json'))).toBe(true);
  });

  it('allows an Edit (escape) when the current event is a Skill invocation', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
    const tpath = join(dir, 't.jsonl');
    writeFileSync(join(dir, 'safe-baseline-ledger-S3.json'), JSON.stringify({
      state: { schema_version: 1, task_id: 't', counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 }, skill_match_within_task: false },
      lastKeywords: ['router', 'gate', 'safe', 'baseline'],
    }));
    fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'router gate safe baseline' } }]);
    const res = await runMain({ event: { tool_name: 'Skill', session_id: 'S3', transcript_path: tpath }, runtimeDir: dir });
    expect(res.block).toBe(false);
  });
});
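`shouldInheritTaskId()` is also imported rather than shown. Below is a minimal sketch of the continuation rule as documented on `processEvent()`: a reset marker starts a fresh task, and otherwise at least two shared keywords continue the prior one. It is combined with the V2-1 stickiness rule from `runLiveDecision()`. `RESET_MARKERS` is an assumption, since the tests only exercise 'новая задача' ("new task"). The `sketch*` names are hypothetical.

```javascript
// Hypothetical sketch of the task-boundary rule. Only 'новая задача' ("new
// task") is confirmed as a reset marker by the tests; the real list lives in
// safe-baseline-metering.mjs and may differ.
const RESET_MARKERS = ['новая задача'];

function sketchShouldInherit(lastKeywords, currentKeywords, promptText) {
  const text = String(promptText || '').toLowerCase();
  // An explicit reset marker always starts a fresh task.
  if (RESET_MARKERS.some((m) => text.includes(m))) return false;
  // Otherwise continue the prior task only if at least 2 keywords are shared.
  const prior = new Set(lastKeywords);
  const shared = currentKeywords.filter((k) => prior.has(k)).length;
  return shared >= 2;
}

// V2-1 stickiness: the prior skill flag is carried over only when the task is
// inherited, then OR-ed with this turn's own Skill/EnterPlanMode match.
function sketchEffectiveSkill(priorLedger, currentKeywords, promptText, skillMatchedThisTurn) {
  const inherit = !!(priorLedger && priorLedger.state &&
    sketchShouldInherit(priorLedger.lastKeywords || [], currentKeywords, promptText));
  const priorSticky = inherit ? !!priorLedger.state.skill_match_within_task : false;
  return { inherit, skillMatched: priorSticky || !!skillMatchedThisTurn };
}
```

Under this reading, an unrelated prompt starts from fresh counters and an unset skill flag. That is what the "no cross-task leak" test asserts.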
+75
-13
@@ -27,6 +27,7 @@ import {
  isProductionCodePath,
  readRouterState,
} from './enforce-hook-helpers.mjs';
import { join, dirname, basename } from 'node:path';

const RULE_KEY_TDD = 'tdd-gate';
const RULE_KEY_PLAN = 'writing-plans-required';
@@ -132,8 +133,56 @@ function hasPlanIndicator(turn) {
  return false;
}

const AGENT_ID_RE = /agentId:\s*([0-9a-f]+)/i;

/**
 * Cross-actor (Z Part 2): extract agentIds of subagents spawned by a `Task`
 * tool in the controller's current turn. The agentId comes from the harness-
 * written Task tool_result text ("agentId: <hex>") — the controller cannot forge
 * a tool_result in its own transcript. Only hex ids are accepted, so a crafted
 * "agentId: ../../x" cannot become a path-traversal into an arbitrary file.
 */
export function turnTaskAgentIds(turn) {
  const taskUseIds = new Set();
  for (const e of turn || []) {
    const c = e && e.message && e.message.content;
    if (!Array.isArray(c)) continue;
    for (const b of c) {
      if (b && b.type === 'tool_use' && b.name === 'Task') taskUseIds.add(b.id);
    }
  }
  const ids = [];
  for (const e of turn || []) {
    const c = e && e.message && e.message.content;
    if (!Array.isArray(c)) continue;
    for (const b of c) {
      if (!b || b.type !== 'tool_result' || !taskUseIds.has(b.tool_use_id)) continue;
      const txt = typeof b.content === 'string' ? b.content
        : Array.isArray(b.content) ? b.content.map((p) => p && p.text).filter(Boolean).join('\n') : '';
      const m = txt.match(AGENT_ID_RE);
      if (m) ids.push(m[1]);
    }
  }
  return ids;
}

/**
 * Derive subagent transcript paths from the controller transcript path and a
 * list of agentIds. Subagent transcripts live at
 * <projects>/<slug>/<controller-session>/subagents/agent-<agentId>.jsonl
 * i.e. nested under the controller session's own directory (bound to it), while
 * the controller transcript is <...>/<controller-session>.jsonl.
 */
export function subagentTranscriptPaths(controllerTranscriptPath, agentIds) {
  const p = String(controllerTranscriptPath || '');
  if (!p) return [];
  const dir = dirname(p);
  const base = basename(p).replace(/\.jsonl$/i, '');
  return (agentIds || []).map((id) => join(dir, base, 'subagents', `agent-${id}.jsonl`));
}

export function decide({
  toolName, filePath, transcriptEntries, classification, override, overridePlan,
  toolName, filePath, transcriptEntries, classification, override, overridePlan, subagentEntriesList = [],
}) {
  if (!['Edit', 'Write', 'MultiEdit'].includes(toolName)) return { block: false };
  if (!isProductionCodePath(filePath)) return { block: false };
@@ -150,36 +199,37 @@ export function decide({
          `[enforce-tdd-gate] task_type="${taskType}" requires a plan before production-code edit.`,
          `Either invoke superpowers:writing-plans via Skill tool,`,
          `or reference an existing plan file (docs/superpowers/plans/...) in this turn first.`,
          ``,
          `Override: "быстрый коммит" / "ремонт инфраструктуры" in your prompt.`,
        ].join('\n'),
      };
    }
  }

  // Rule #3 — TDD gate.
  // Rule #3 — TDD gate. Credit the controller's own turn OR a subagent that was
  // spawned by a Task in this turn (cross-actor, Z Part 2). Subagent evidence is
  // read from its agent-<id>.jsonl, which is tamper-protected by the transcript
  // Write-deny (Z Part 1) — so crediting it does not open a forgery channel.
  if (override) return { block: false };
  const hasTest = hasMatchingTestEdit(turn, filePath);
  const subList = Array.isArray(subagentEntriesList) ? subagentEntriesList : [];
  const hasTest = hasMatchingTestEdit(turn, filePath) || subList.some((es) => hasMatchingTestEdit(es, filePath));
  if (!hasTest) {
    return {
      block: true,
      message: [
        `[enforce-tdd-gate] Production code edit on "${filePath}" without preceding test edit.`,
        `Write the failing test FIRST in the corresponding *.test.mjs / *.spec.ts / *Test.php.`,
        `Write the failing test FIRST in the corresponding *.test.mjs / *.spec.ts / *Test.php`,
        `(a subagent's test edit, if it was spawned by a Task in this turn, is also credited).`,
        `Then run vitest/pest to confirm RED, then return to this prod-code Edit.`,
        ``,
        `Override: "срочно" / "быстрый коммит" / "ремонт инфраструктуры".`,
      ].join('\n'),
    };
  }
  if (!hasFailingTestRun(turn)) {
  const hasRed = hasFailingTestRun(turn) || subList.some((es) => hasFailingTestRun(es));
  if (!hasRed) {
    return {
      block: true,
      message: [
        `[enforce-tdd-gate] Test was edited but no vitest/pest run with RED output observed in this turn.`,
        `[enforce-tdd-gate] Test was edited but no vitest/pest run with RED output observed in this turn`,
        `(nor in any in-turn subagent transcript).`,
        `Run the test suite (vitest run <test-file> / composer test) to confirm RED before prod-code edit.`,
        ``,
        `Override: "срочно" / "быстрый коммит" / "ремонт инфраструктуры".`,
      ].join('\n'),
    };
  }
@@ -205,7 +255,19 @@ async function main() {
      task_type: state.classification.task_type,
    } : null;

    const result = decide({ toolName, filePath, transcriptEntries: transcript, classification, override, overridePlan });
    // Cross-actor (Z Part 2): read transcripts of subagents spawned by a Task in
    // this turn, bound to the controller session via the derived path. Best-effort
    // — a missing/unreadable subagent transcript just yields no extra credit
    // (stricter), never an error.
    let subagentEntriesList = [];
    try {
      const turn = lastTurnEntries(transcript);
      const agentIds = turnTaskAgentIds(turn);
      const paths = subagentTranscriptPaths(event.transcript_path, agentIds);
      subagentEntriesList = paths.map((p) => readTranscript(p)).filter((e) => Array.isArray(e) && e.length);
    } catch { subagentEntriesList = []; }

    const result = decide({ toolName, filePath, transcriptEntries: transcript, classification, override, overridePlan, subagentEntriesList });
    exitDecision(result);
  } catch {
    exitDecision({ block: false });
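The hex-only guard in `turnTaskAgentIds()` keeps a crafted `agentId` out of the derived path. However, the corresponding test feeds `agentId: ../../evil` through a `Bash` result, so it is rejected by the Task filter before the regex is ever consulted. The condensed restatement below exercises the regex on a genuine `Task` result as well. Its names `contentBlocks`, `taskAgentIds` and `subagentPathsPosix` are hypothetical. Its path helper assumes POSIX `/` separators, whereas the real code uses `node:path`.

```javascript
// Condensed restatement of turnTaskAgentIds() for illustration only.
const AGENT_ID = /agentId:\s*([0-9a-f]+)/i;

// Flatten the content blocks of every entry in a turn.
function contentBlocks(turn) {
  return (turn || []).flatMap((e) => {
    const c = e && e.message && e.message.content;
    return Array.isArray(c) ? c.filter(Boolean) : [];
  });
}

function taskAgentIds(turn) {
  const blocks = contentBlocks(turn);
  // Only tool_results answering a Task tool_use are considered.
  const taskUseIds = new Set(
    blocks.filter((b) => b.type === 'tool_use' && b.name === 'Task').map((b) => b.id),
  );
  const ids = [];
  for (const b of blocks) {
    if (b.type !== 'tool_result' || !taskUseIds.has(b.tool_use_id)) continue;
    const txt = typeof b.content === 'string' ? b.content
      : Array.isArray(b.content) ? b.content.map((p) => p && p.text).filter(Boolean).join('\n') : '';
    // Only a leading run of hex digits is captured, so "../../evil" yields no match.
    const m = txt.match(AGENT_ID);
    if (m) ids.push(m[1]);
  }
  return ids;
}

// POSIX-only stand-in for subagentTranscriptPaths().
function subagentPathsPosix(controllerPath, ids) {
  const p = String(controllerPath || '');
  if (!p) return [];
  const slash = p.lastIndexOf('/');
  const dir = slash >= 0 ? p.slice(0, slash) : '.';
  const base = p.slice(slash + 1).replace(/\.jsonl$/i, '');
  return ids.map((id) => `${dir}/${base}/subagents/agent-${id}.jsonl`);
}
```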
@@ -1,5 +1,79 @@
|
||||
import { describe, it, expect } from 'vitest';
|
||||
import { decide } from './enforce-tdd-gate.mjs';
|
||||
import { decide, turnTaskAgentIds, subagentTranscriptPaths } from './enforce-tdd-gate.mjs';
|
||||
|
||||
// Z Part 2 (2026-05-31): the tdd-gate must credit a subagent's test edit + RED
|
||||
// when that subagent was spawned by a Task in the controller's current turn.
|
||||
// Pairs with the transcript Write-hole closed in enforce-runtime-write-deny.mjs
|
||||
// (Z Part 1) so the credited agent-<id>.jsonl cannot be forged.
|
||||
describe('enforce-tdd-gate Z cross-actor (pairs with enforce-runtime-write-deny Part 1)', () => {
|
||||
const subagentRedRun = [
|
||||
{ message: { role: 'user', content: 'write the failing test for foo and confirm RED' } },
|
||||
{ message: { role: 'assistant', content: [
|
||||
{ type: 'tool_use', id: 's1', name: 'Write', input: { file_path: 'tools/foo.test.mjs' } },
|
||||
{ type: 'tool_use', id: 's2', name: 'Bash', input: { command: 'npx vitest run tools/foo.test.mjs' } },
|
||||
] } },
|
||||
{ message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 's2', content: 'Tests 1 failed | 0 passed' } ] } },
|
||||
];
|
||||
|
||||
it('credits a subagent test edit + RED for the controller prod edit', () => {
|
||||
const r = decide({
|
||||
toolName: 'Edit',
|
||||
filePath: 'tools/foo.mjs',
|
||||
transcriptEntries: [
|
||||
{ message: { role: 'user', content: 'delegate the test, then I implement' } },
|
||||
{ message: { role: 'assistant', content: [ { type: 'tool_use', id: 't1', name: 'Task', input: { subagent_type: 'tester' } } ] } },
|
||||
{ message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 't1', content: 'done. agentId: a1234abcd' } ] } },
|
||||
],
|
||||
subagentEntriesList: [subagentRedRun],
|
||||
});
|
||||
expect(r.block).toBe(false);
|
||||
});
|
||||
|
||||
it('still blocks when subagent edited a test but NO RED exists anywhere', () => {
|
||||
const subNoRed = [
|
||||
{ message: { role: 'user', content: 'write test' } },
|
||||
      { message: { role: 'assistant', content: [ { type: 'tool_use', id: 's1', name: 'Write', input: { file_path: 'tools/foo.test.mjs' } } ] } },
    ];
    const r = decide({
      toolName: 'Edit', filePath: 'tools/foo.mjs',
      transcriptEntries: [ { message: { role: 'user', content: 'go' } } ],
      subagentEntriesList: [subNoRed],
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/RED/);
  });

  it('preserves old behavior when no subagent entries (blocks without test)', () => {
    const r = decide({
      toolName: 'Edit', filePath: 'tools/foo.mjs',
      transcriptEntries: [ { message: { role: 'user', content: 'go' } } ],
      subagentEntriesList: [],
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/without preceding test edit/);
  });

  it('turnTaskAgentIds extracts a hex agentId from an in-turn Task tool_result', () => {
    const turn = [
      { message: { role: 'assistant', content: [ { type: 'tool_use', id: 't1', name: 'Task', input: {} } ] } },
      { message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 't1', content: 'ok agentId: a1b2c3d4e5' } ] } },
    ];
    expect(turnTaskAgentIds(turn)).toContain('a1b2c3d4e5');
  });

  it('turnTaskAgentIds ignores non-Task results and rejects non-hex ids (no path traversal)', () => {
    const turn = [
      { message: { role: 'assistant', content: [ { type: 'tool_use', id: 'b1', name: 'Bash', input: {} } ] } },
      { message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 'b1', content: 'agentId: ../../evil' } ] } },
    ];
    expect(turnTaskAgentIds(turn)).toHaveLength(0);
  });

  it('subagentTranscriptPaths derives <dir>/<sessbase>/subagents/agent-<id>.jsonl', () => {
    const paths = subagentTranscriptPaths('/p/projects/slug/sessUUID.jsonl', ['a1b2']);
    expect(paths[0].split('\\').join('/')).toBe('/p/projects/slug/sessUUID/subagents/agent-a1b2.jsonl');
  });
});

function userMsg(text) {
  return { message: { role: 'user', content: text } };
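The three helper tests above pin down two behaviours whose implementations are not part of this diff: pulling a hex `agentId` out of a `Task` tool result in the current turn, and mapping those ids to subagent transcript files under `<dir>/<sessbase>/subagents/`. The following is a minimal sketch of helpers that would satisfy those tests. It assumes the transcript entry shape used in the fixtures (`message.content` as an array of `tool_use` / `tool_result` blocks) and an `agentId: <hex>` marker in the result text; the regex, the `blocksOf` / `resultText` helpers and the string-based path handling are illustrative assumptions, not the module's actual code.

```javascript
// Hypothetical sketch: the real turnTaskAgentIds / subagentTranscriptPaths
// live in the module under test and are not shown in this diff.

// Accept only a run of hex digits followed by whitespace or end of text, so a
// value such as "../../evil" can never become a path segment.
const AGENT_ID_RE = /agentId:\s*([0-9a-f]+)(?=\s|$)/i;

// Content blocks of a transcript entry; string content yields no blocks.
function blocksOf(entry) {
  const content = entry && entry.message && entry.message.content;
  return Array.isArray(content) ? content : [];
}

// tool_result content may be a plain string or an array of text blocks.
function resultText(content) {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) {
    return content.map((c) => (c && typeof c.text === 'string' ? c.text : '')).join('\n');
  }
  return '';
}

function turnTaskAgentIds(turn) {
  // Pass 1: remember the ids of Task tool_use blocks issued in this turn.
  const taskUseIds = new Set();
  for (const entry of turn) {
    for (const block of blocksOf(entry)) {
      if (block && block.type === 'tool_use' && block.name === 'Task') taskUseIds.add(block.id);
    }
  }
  // Pass 2: read agentId only from results that answer one of those Task calls.
  const ids = [];
  for (const entry of turn) {
    for (const block of blocksOf(entry)) {
      if (!block || block.type !== 'tool_result' || !taskUseIds.has(block.tool_use_id)) continue;
      const m = AGENT_ID_RE.exec(resultText(block.content));
      if (m) ids.push(m[1]);
    }
  }
  return ids;
}

function subagentTranscriptPaths(sessionTranscriptPath, agentIds) {
  // Normalise Windows separators, then split "<dir>/<sessbase>.jsonl".
  const normalized = String(sessionTranscriptPath).split('\\').join('/');
  const slash = normalized.lastIndexOf('/');
  const dir = slash >= 0 ? normalized.slice(0, slash) : '.';
  const sessBase = normalized.slice(slash + 1).replace(/\.jsonl$/i, '');
  return agentIds
    .filter((id) => /^[0-9a-f]+$/i.test(id)) // defence in depth: hex ids only
    .map((id) => `${dir}/${sessBase}/subagents/agent-${id}.jsonl`);
}
```

Validating the id twice (once when extracting, once when building the path) keeps the path builder safe even if a caller passes ids from another source.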
@@ -38,6 +112,8 @@ describe('enforce-tdd-gate / decide', () => {
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/without preceding test edit/);
    // 1A (2026-05-31): do not advertise dead override phrases (findOverride is a v4 stub).
    expect(r.message).not.toMatch(/Override:/);
  });

  it('blocks when test edited but no vitest RED observed', () => {
@@ -51,6 +127,8 @@ describe('enforce-tdd-gate / decide', () => {
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/no vitest.*RED/);
    // 1A (2026-05-31): do not advertise dead override phrases (findOverride is a v4 stub).
    expect(r.message).not.toMatch(/Override:/);
  });

  it('allows after test edit + vitest RED', () => {
@@ -107,6 +185,8 @@ describe('enforce-tdd-gate / decide', () => {
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/requires a plan/);
    // 1A (2026-05-31): do not advertise dead override phrases (findOverride is a v4 stub).
    expect(r.message).not.toMatch(/Override:/);
  });

  it('allows feature edit when Skill(superpowers:writing-plans) invoked', () => {
@@ -70,8 +70,6 @@ export function decide({ toolName, command, sentinel, sentinelAge, override, ove
    message: [
      `[enforce-verify-before-push] No verification artifact found.`,
      `Run a full test suite first (vitest run / composer test) before \`git ${kind}\`.`,
      ``,
      `Override: "срочно" / "быстрый коммит" / "ремонт инфраструктуры" in your prompt.`,
    ].join('\n'),
  };
}
@@ -153,6 +153,9 @@ describe('enforce-verify-before-push / decide', () => {
    });
    expect(r.block).toBe(true);
    expect(r.message).toMatch(/No verification/);
    // 1A (2026-05-31): do not advertise dead override phrases (findOverride is a v4 stub).
    expect(r.message).not.toMatch(/Override:/);
    expect(r.message).not.toMatch(/срочно|ремонт инфраструктуры/);
  });

  it('does NOT emit override-missing-justification diagnostic for overrides without requires_justification', () => {
@@ -0,0 +1,84 @@
#!/usr/bin/env node
/**
 * llm-judge-config — the Layer 4 enabling-gate for router-gate v4.
 *
 * The LLM-judge engine (llm-judge.mjs) is fully built but MUST stay OFF until
 * the owner deliberately turns it on, because enabling it incurs real LLM cost
 * (~$300–1500/month per the v4.1 amendment). This module is the single switch.
 *
 * SAFE-BY-DEFAULT CONTRACT:
 *   enabled === true ⇔ the explicit flag ROUTER_LLM_JUDGE_ENABLED is truthy
 *                      AND a key is resolvable (keychain first, then env).
 * Anything else → enabled:false. Building this file does NOT enable the judge:
 * with no flag and no key the gate is closed. keychainGet errors degrade to
 * "no key, disabled" (never throw).
 *
 * Activation (a separate, owner-driven step — NOT done here):
 *   1. store the API key in the OS keychain (or set ROUTER_LLM_KEY),
 *   2. set ROUTER_LLM_JUDGE_ENABLED=1,
 *   3. register the enforce-llm-judge-* hooks in .claude/settings.json.
 * Cost starts only after all three.
 */
import { JUDGE_MODELS } from './llm-judge.mjs';

const ENABLE_FLAG = 'ROUTER_LLM_JUDGE_ENABLED';
const KEY_ENV = 'ROUTER_LLM_KEY';
const BASE_URL_ENV = 'ROUTER_LLM_BASE_URL';
const KEYCHAIN_SERVICE = 'router-gate-llm-judge';
const KEYCHAIN_ACCOUNT = 'default';

function isTruthyFlag(v) {
  if (typeof v !== 'string') return false;
  return v.trim().toLowerCase() === '1' || v.trim().toLowerCase() === 'true';
}

/**
 * Resolve the Layer 4 judge configuration.
 *
 * @param {object} [args]
 * @param {object} [args.env] - environment map (defaults to process.env)
 * @param {Function} [args.keychainGet] - () => string|null, OS-keychain reader (injectable for tests)
 * @returns {{enabled:boolean, apiKey:string|null, baseUrl:string|null, models:string[]}}
 */
export function resolveJudgeConfig({ env = process.env, keychainGet = defaultKeychainGet } = {}) {
  let keychainKey = null;
  try {
    const v = keychainGet();
    keychainKey = v ? String(v) : null;
  } catch {
    keychainKey = null;
  }
  const envKey = env[KEY_ENV] ? String(env[KEY_ENV]) : null;
  const apiKey = keychainKey || envKey || null;

  const flagOn = isTruthyFlag(env[ENABLE_FLAG]);
  const enabled = flagOn && apiKey !== null;

  return {
    enabled,
    apiKey,
    baseUrl: env[BASE_URL_ENV] ? String(env[BASE_URL_ENV]) : null,
    models: JUDGE_MODELS.multi,
  };
}

/**
 * Default OS-keychain reader. Lazily loads `keytar`; returns null if keytar is
 * absent or the entry is missing. Never throws (caller also guards).
 */
export function defaultKeychainGet() {
  try {
    // Lazy require keeps the native dep optional — tests inject keychainGet and
    // never hit this path; the no-op posture means missing keytar => no key.
    const require = createRequire(import.meta.url);
    const keytar = require('keytar');
    const v = keytar.getPassword ? keytar.getPasswordSync?.(KEYCHAIN_SERVICE, KEYCHAIN_ACCOUNT) : null;
    return v || null;
  } catch {
    return null;
  }
}

import { createRequire } from 'node:module';

export const _internals = { ENABLE_FLAG, KEY_ENV, BASE_URL_ENV, KEYCHAIN_SERVICE, KEYCHAIN_ACCOUNT, isTruthyFlag };
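`defaultKeychainGet` above is synchronous, but keytar's documented API is Promise-based: `keytar.getPassword(service, account)` resolves to the stored secret or `null`. The optional call `keytar.getPasswordSync?.(...)` evaluates to `undefined` whenever no such method exists, in which case the default reader returns `null` and the gate depends on `ROUTER_LLM_KEY` alone. Below is a minimal sketch of an async variant that keeps the same safe-by-default contract. `resolveJudgeConfigAsync` and the injected `getPassword` parameter are hypothetical names, and `models` is omitted because `JUDGE_MODELS` is not shown in this diff.

```javascript
// Hypothetical async variant of resolveJudgeConfig's contract:
//   enabled === true  <=>  flag is "1"/"true" (case-insensitive) AND a key resolves.
// `getPassword(service, account)` is injected; it may return a value or a Promise.
const ENABLE_FLAG = 'ROUTER_LLM_JUDGE_ENABLED';
const KEY_ENV = 'ROUTER_LLM_KEY';
const BASE_URL_ENV = 'ROUTER_LLM_BASE_URL';
const KEYCHAIN_SERVICE = 'router-gate-llm-judge';
const KEYCHAIN_ACCOUNT = 'default';

function isTruthyFlag(v) {
  if (typeof v !== 'string') return false;
  const s = v.trim().toLowerCase();
  return s === '1' || s === 'true';
}

async function resolveJudgeConfigAsync({ env = {}, getPassword = async () => null } = {}) {
  let keychainKey = null;
  try {
    // Both a synchronous throw and a rejected Promise land in the catch below.
    const v = await getPassword(KEYCHAIN_SERVICE, KEYCHAIN_ACCOUNT);
    keychainKey = v ? String(v) : null;
  } catch {
    keychainKey = null; // keychain errors degrade to "no key", never throw
  }
  const envKey = env[KEY_ENV] ? String(env[KEY_ENV]) : null;
  const apiKey = keychainKey || envKey || null; // keychain first, then env
  return {
    enabled: isTruthyFlag(env[ENABLE_FLAG]) && apiKey !== null,
    apiKey,
    baseUrl: env[BASE_URL_ENV] ? String(env[BASE_URL_ENV]) : null,
  };
}
```

In production the injected reader could be `(s, a) => keytar.getPassword(s, a)`; the hook entry point would then have to await the resolved config before deciding.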
@@ -0,0 +1,75 @@
// tools/llm-judge-config.test.mjs
// Router-gate v4 Layer 4 enabling-gate. The judge is OFF by default and only
// becomes enabled when BOTH an explicit flag is set AND a key is resolvable.
// Building this switch does NOT flip it — no key + no flag => disabled.
import { describe, it, expect } from 'vitest';
import { resolveJudgeConfig } from './llm-judge-config.mjs';

describe('llm-judge-config resolveJudgeConfig()', () => {
  it('is DISABLED by default: no flag, no key', () => {
    const c = resolveJudgeConfig({ env: {}, keychainGet: () => null });
    expect(c.enabled).toBe(false);
    expect(c.apiKey).toBe(null);
  });

  it('stays DISABLED when a key exists but the enable flag is not set', () => {
    const c = resolveJudgeConfig({ env: {}, keychainGet: () => 'sk-test' });
    expect(c.enabled).toBe(false);
    expect(c.apiKey).toBe('sk-test');
  });

  it('stays DISABLED when the flag is set but no key is resolvable', () => {
    const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, keychainGet: () => null });
    expect(c.enabled).toBe(false);
    expect(c.apiKey).toBe(null);
  });

  it('is ENABLED only when the flag is set AND a key is resolvable (from keychain)', () => {
    const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, keychainGet: () => 'sk-keychain' });
    expect(c.enabled).toBe(true);
    expect(c.apiKey).toBe('sk-keychain');
  });

  it('prefers the keychain key over the env fallback', () => {
    const c = resolveJudgeConfig({
      env: { ROUTER_LLM_JUDGE_ENABLED: '1', ROUTER_LLM_KEY: 'sk-env' },
      keychainGet: () => 'sk-keychain',
    });
    expect(c.apiKey).toBe('sk-keychain');
  });

  it('falls back to the env key when the keychain is empty', () => {
    const c = resolveJudgeConfig({
      env: { ROUTER_LLM_JUDGE_ENABLED: '1', ROUTER_LLM_KEY: 'sk-env' },
      keychainGet: () => null,
    });
    expect(c.enabled).toBe(true);
    expect(c.apiKey).toBe('sk-env');
  });

  it('accepts "true" (case-insensitive) as the enable flag', () => {
    const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: 'TRUE' }, keychainGet: () => 'k' });
    expect(c.enabled).toBe(true);
  });

  it('treats an arbitrary flag value (e.g. "0", "no") as NOT enabled', () => {
    expect(resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '0' }, keychainGet: () => 'k' }).enabled).toBe(false);
    expect(resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: 'no' }, keychainGet: () => 'k' }).enabled).toBe(false);
  });

  it('exposes default models and passes through baseUrl from env', () => {
    const c = resolveJudgeConfig({
      env: { ROUTER_LLM_JUDGE_ENABLED: '1', ROUTER_LLM_BASE_URL: 'https://example/api' },
      keychainGet: () => 'k',
    });
    expect(Array.isArray(c.models)).toBe(true);
    expect(c.models.length).toBeGreaterThan(0);
    expect(c.baseUrl).toBe('https://example/api');
  });

  it('never throws when keychainGet itself throws — degrades to no key, disabled', () => {
    const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, keychainGet: () => { throw new Error('keychain locked'); } });
    expect(c.enabled).toBe(false);
    expect(c.apiKey).toBe(null);
  });
});
@@ -68,14 +68,43 @@ import { homedir } from 'node:os';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';
import { llmJudgeCall, readJudgeBudget, bumpJudgeBudget, JUDGE_SESSION_BUDGET } from './llm-judge.mjs';

// Calibration 1 (2026-05-31) — `Skill` removed from judge scope (SCOPE fix, NOT
// a discipline drop). Invoking a Skill mutates no state; it is the prescribed
// §17 entry into work. Judging the skill-invocation itself and blocking on
// doubt directly contradicts §17 (which mandates skills). The real mutations a
// skill leads to (Edit/Write/MultiEdit/Bash/PowerShell/commit/push/Task) remain
// fully judged below — doubt→block on those is unchanged.
export const MUTATING_TOOLS = new Set([
  'Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash', 'PowerShell', 'Skill', 'Task', 'Workflow',
  'Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash', 'PowerShell', 'Task', 'Workflow',
]);

function runtimeDir(override) {
  return override || join(homedir(), '.claude', 'runtime');
}

/**
 * Calibration 4 (soft, 2026-05-31): the classifier's distilled task summary is
 * lossy and sometimes "(unknown)" even for a perfectly clear user request,
 * which made the judge block all real edits (no task to compare → doubt→block).
 * When the summary is unknown/empty, fall back to judging against the user's
 * actual last prompt — the ground-truth request — instead of nothing.
 *
 * This is NOT calibration 2 (which would blindly ALLOW on unknown). The judge
 * still runs and still blocks on doubt; it just uses better evidence. When both
 * the summary and the user prompt are unavailable, the task stays "(unknown)"
 * and doubt→block is preserved.
 */
export function resolveEffectiveTask(declaredTask, lastUserPrompt) {
  const dt = declaredTask || {};
  const summary = dt.task_summary;
  const summaryUnknown = !summary || summary === '(unknown)' || !String(summary).trim();
  const prompt = typeof lastUserPrompt === 'string' ? lastUserPrompt.trim() : '';
  if (summaryUnknown && prompt) {
    return { ...dt, task_summary: prompt, task_source: 'user_prompt_fallback' };
  }
  return dt;
}

/** Read the classifier-written declared task for this session; stub on miss. */
export function readDeclaredTask({ sessionId, runtimeDirOverride }) {
  const path = join(runtimeDir(runtimeDirOverride), `router-state-${sessionId || 'unknown'}.json`);
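The two calibrations in this hunk combine into a simple decision order. Tools outside the new `MUTATING_TOOLS` set (which no longer lists `Skill`) never reach the judge. For mutating tools, the judge sees the classifier summary or, when that is `(unknown)` or blank, the user's last prompt, and it still blocks on doubt. The sketch below wires those pieces together. `MUTATING_TOOLS` and `resolveEffectiveTask` are copied from the hunk; `gateDecision` and the injected `judge` callback are hypothetical stand-ins, since the real `llmJudgeCall` signature is not shown. Any verdict other than `'allow'` is treated as doubt and blocks.

```javascript
// Copied from the hunk above (the new MUTATING_TOOLS line, without 'Skill').
const MUTATING_TOOLS = new Set([
  'Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash', 'PowerShell', 'Task', 'Workflow',
]);

// Copied from the hunk above: calibration 4 user-prompt fallback.
function resolveEffectiveTask(declaredTask, lastUserPrompt) {
  const dt = declaredTask || {};
  const summary = dt.task_summary;
  const summaryUnknown = !summary || summary === '(unknown)' || !String(summary).trim();
  const prompt = typeof lastUserPrompt === 'string' ? lastUserPrompt.trim() : '';
  if (summaryUnknown && prompt) {
    return { ...dt, task_summary: prompt, task_source: 'user_prompt_fallback' };
  }
  return dt;
}

// Hypothetical gate: `judge(toolName, task)` stands in for the real LLM call.
function gateDecision({ toolName, declaredTask, lastUserPrompt, judge }) {
  if (!MUTATING_TOOLS.has(toolName)) {
    return { block: false, judged: false }; // e.g. Skill, Read: out of judge scope
  }
  const task = resolveEffectiveTask(declaredTask, lastUserPrompt);
  const verdict = judge(toolName, task);
  // doubt→block: only an explicit 'allow' lets a mutation through.
  return { block: verdict !== 'allow', judged: true, taskSource: task.task_source || 'declared' };
}

// Usage with a stub judge that doubts whenever it has no concrete task.
const stubJudge = (_tool, task) =>
  task.task_summary && task.task_summary !== '(unknown)' ? 'allow' : 'doubt';

console.log(gateDecision({ toolName: 'Skill', declaredTask: {}, lastUserPrompt: '', judge: stubJudge }));
// → { block: false, judged: false }
console.log(gateDecision({ toolName: 'Edit', declaredTask: { task_summary: '(unknown)' }, lastUserPrompt: 'fix the lock', judge: stubJudge }));
// → { block: false, judged: true, taskSource: 'user_prompt_fallback' }
console.log(gateDecision({ toolName: 'Edit', declaredTask: { task_summary: '(unknown)' }, lastUserPrompt: '', judge: stubJudge }));
// → { block: true, judged: true, taskSource: 'declared' }
```

Keeping the fallback inside the judged path, rather than short-circuiting to allow, is what separates calibration 4 from the rejected calibration 2.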
@@ -69,6 +69,38 @@ describe('judgePerTool', () => {
  });
});

import { resolveEffectiveTask } from './llm-judge-per-tool.mjs';

// Calibration 4 (soft, 2026-05-31) — when the classifier wrote "(unknown)" as
// the declared task (its summary is lossy/unreliable), fall back to judging
// against the user's actual last prompt instead of an empty task. NOT
// calibration 2: the judge still blocks on doubt — it just uses better
// evidence (the literal user request) when the classifier summary is empty.
describe('resolveEffectiveTask — calibration 4 user-prompt fallback', () => {
  it('keeps the classifier summary when it is meaningful', () => {
    const r = resolveEffectiveTask({ task_summary: 'implement parallel-session-lock', recommended_node: '#19' }, 'some prompt');
    expect(r.task_summary).toBe('implement parallel-session-lock');
    expect(r.task_source).toBeUndefined();
  });

  it('falls back to the user prompt when summary is "(unknown)"', () => {
    const r = resolveEffectiveTask({ task_summary: '(unknown)', recommended_node: null }, 'реализуй живой main для parallel-session-lock');
    expect(r.task_summary).toBe('реализуй живой main для parallel-session-lock');
    expect(r.task_source).toBe('user_prompt_fallback');
  });

  it('falls back when summary is empty or blank', () => {
    expect(resolveEffectiveTask({ task_summary: '' }, 'do X').task_summary).toBe('do X');
    expect(resolveEffectiveTask({ task_summary: ' ' }, 'do X').task_summary).toBe('do X');
  });

  it('stays unknown when both summary and user prompt are unavailable (still blocks on doubt)', () => {
    const r = resolveEffectiveTask({ task_summary: '(unknown)' }, '');
    expect(r.task_summary).toBe('(unknown)');
    expect(r.task_source).toBeUndefined();
  });
});

import { MUTATING_TOOLS, readDeclaredTask } from './llm-judge-per-tool.mjs';

describe('per-tool helpers', () => {
@@ -79,6 +111,16 @@ describe('per-tool helpers', () => {
    expect(MUTATING_TOOLS.has('Read')).toBe(false);
  });

  // Calibration 1 (2026-05-31) — SCOPE fix, discipline NOT lowered.
  // Invoking a Skill changes no state; it is the prescribed §17 entry into
  // work. Judging the skill-invocation itself and blocking on doubt directly
  // contradicts §17 (which mandates skills). The real mutations a skill leads
  // to (Edit/Write/Bash/commit/push) stay fully judged, so removing Skill from
  // the judge scope does not lower discipline.
  it('does NOT treat Skill as mutating (calibration 1 — prescribed §17 entry, mutates nothing)', () => {
    expect(MUTATING_TOOLS.has('Skill')).toBe(false);
  });

  it('readDeclaredTask falls back to a stub when state missing', () => {
    const dt = readDeclaredTask({ sessionId: 'no-such-session', runtimeDirOverride: '/nonexistent' });
    expect(dt).toHaveProperty('task_summary');
@@ -24,7 +24,7 @@ export function computeWorkspaceHash(workspacePath) {
  return createHash('md5').update(String(workspacePath || ''), 'utf-8').digest('hex').slice(0, 12);
}

function isStale(record, now) {
export function isStale(record, now) {
  if (!record || typeof record !== 'object') return true;
  const ttl = typeof record.ttl_ms === 'number' ? record.ttl_ms : LOCK_DEFAULT_TTL_MS;
  return now - (record.acquired_at || 0) > ttl;
@@ -6,6 +6,7 @@ import {
  release,
  refresh,
  computeWorkspaceHash,
  isStale,
  LOCK_DEFAULT_TTL_MS,
} from './parallel-session-lock.mjs';

@@ -91,6 +92,26 @@ describe('parallel-session-lock pure module (Stream H Task 7)', () => {
  });
});

// isStale is exported (B, 2026-05-31) so the wrapper's prune step reuses the
// EXACT same staleness definition — single source of truth, no divergence that
// could ever prune a still-fresh (active) lock.
describe('isStale (exported for prune support)', () => {
  it('true when now - acquired_at exceeds ttl_ms', () => {
    expect(isStale({ acquired_at: 0, ttl_ms: 100 }, 1000)).toBe(true);
  });
  it('false when still within ttl (active lock — never pruned)', () => {
    expect(isStale({ acquired_at: 900, ttl_ms: 1000 }, 1000)).toBe(false);
  });
  it('true for a malformed/missing record', () => {
    expect(isStale(null, 1000)).toBe(true);
    expect(isStale(undefined, 1000)).toBe(true);
  });
  it('uses the default TTL when ttl_ms is absent', () => {
    expect(isStale({ acquired_at: 0 }, LOCK_DEFAULT_TTL_MS + 1)).toBe(true);
    expect(isStale({ acquired_at: 0 }, LOCK_DEFAULT_TTL_MS - 1)).toBe(false);
  });
});

describe('computeWorkspaceHash (Stream H Task 7)', () => {
  it('returns 12 hex chars', () => {
    const h = computeWorkspaceHash('/some/path');
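Exporting `isStale` lets a prune step reuse exactly the predicate the lock module itself applies, so pruning cannot disagree with acquisition about which locks are still active. The following is a minimal sketch of such a prune step. `isStale` is copied from the hunk; `pruneStaleLocks` and the shape of the record map (keyed by workspace hash) are hypothetical, and `LOCK_DEFAULT_TTL_MS` is given a placeholder value because the real constant is not visible in this diff.

```javascript
// Placeholder: the real LOCK_DEFAULT_TTL_MS lives in parallel-session-lock.mjs
// and its value is not shown in this diff.
const LOCK_DEFAULT_TTL_MS = 60_000;

// Same body as the exported isStale in the hunk above.
function isStale(record, now) {
  if (!record || typeof record !== 'object') return true;
  const ttl = typeof record.ttl_ms === 'number' ? record.ttl_ms : LOCK_DEFAULT_TTL_MS;
  return now - (record.acquired_at || 0) > ttl;
}

// Hypothetical prune step: split lock records into those to keep and the keys
// to delete, using isStale as the single source of truth for staleness.
function pruneStaleLocks(records, now = Date.now()) {
  const kept = {};
  const pruned = [];
  for (const [key, record] of Object.entries(records)) {
    if (isStale(record, now)) pruned.push(key);
    else kept[key] = record;
  }
  return { kept, pruned };
}

const { kept, pruned } = pruneStaleLocks({
  aaaaaaaaaaaa: { acquired_at: 900, ttl_ms: 1000 }, // active: 100 ms old, TTL 1000 ms
  bbbbbbbbbbbb: { acquired_at: 0, ttl_ms: 100 },    // expired
  cccccccccccc: null,                               // malformed record
}, 1000);
console.log(pruned);            // → [ 'bbbbbbbbbbbb', 'cccccccccccc' ]
console.log(Object.keys(kept)); // → [ 'aaaaaaaaaaaa' ]
```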
@@ -40,6 +40,25 @@ export const DEFAULT_PROTECTED_PATTERNS = [
  /(^|\/)\.npmrc$/i,
];

// Read-tool deny list — narrower than DEFAULT_PROTECTED_PATTERNS (over-block fix 2026-05-31).
// Smoke 5 reused the full protected-list for the Read tool, which blocked Read of
// CLAUDE.md, the normative docs and the memory/ index — breaking the legit
// claude-md-management / memory-sync workflow (harness Edit requires a prior Read).
// Read of those files has NO exfil value: CLAUDE.md / Pravila / PSR / Tooling are
// public-in-repo, memory/ is the controller's own index. The genuine Read-exfil
// targets are cross-session transcripts (.jsonl), runtime side-channels, settings
// and secrets — those stay blocked here. The Bash/PowerShell read gate (cat /
// Get-Content) and the Write gate keep using the full DEFAULT_PROTECTED_PATTERNS,
// so CLAUDE.md / memory remain protected against shell-read and overwrite.
// NB: `.claude/projects/.*\.jsonl$` matches transcripts but NOT the `memory/`
// subdirectory (memory files are *.md), so MEMORY.md stays readable.
export const READ_DENY_PATTERNS = [
  /(^|\/)\.claude\/projects\/.*\.jsonl$/i, // cross-session transcripts (parent-context exfil)
  /(^|\/)\.claude\/runtime(\/|$)/i, // runtime side-channels (approve files, sentinels, state)
  /(^|\/)\.claude\/settings(\.local)?\.json$/i, // harness/hook config
  /(^|\/)\.env(\.|$)/i, // secrets
];

export function isProtectedPath(p, pathNormalize = defaultPathNormalize, patterns = DEFAULT_PROTECTED_PATTERNS) {
  const n = pathNormalize(p);
  if (!n) return false;
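The comment block above describes two deny lists chosen per tool: the narrow `READ_DENY_PATTERNS` for the Read tool, and the full `DEFAULT_PROTECTED_PATTERNS` for Bash/PowerShell reads and for Write. The following is a minimal sketch of that selection. `READ_DENY_PATTERNS` is copied from the hunk. The short `PROTECTED_SUBSET` list, the `normalize` helper and the `patternsForTool` / `isDenied` functions are hypothetical stand-ins, since the full default list and `defaultPathNormalize` are not shown here.

```javascript
// Copied from the hunk above.
const READ_DENY_PATTERNS = [
  /(^|\/)\.claude\/projects\/.*\.jsonl$/i,
  /(^|\/)\.claude\/runtime(\/|$)/i,
  /(^|\/)\.claude\/settings(\.local)?\.json$/i,
  /(^|\/)\.env(\.|$)/i,
];

// Hypothetical subset standing in for DEFAULT_PROTECTED_PATTERNS: the Read-deny
// targets plus CLAUDE.md and memory/*.md, which only the shell/Write gates block.
const PROTECTED_SUBSET = [
  ...READ_DENY_PATTERNS,
  /(^|\/)CLAUDE\.md$/i,
  /(^|\/)memory\/[^/]+\.md$/i,
];

// Hypothetical normaliser: convert Windows backslashes to forward slashes.
function normalize(p) {
  return typeof p === 'string' ? p.replace(/\\/g, '/') : '';
}

// The Read tool gets the narrow list; every other gate gets the full one.
function patternsForTool(toolName) {
  return toolName === 'Read' ? READ_DENY_PATTERNS : PROTECTED_SUBSET;
}

function isDenied(toolName, p) {
  const n = normalize(p);
  if (!n) return false;
  return patternsForTool(toolName).some((re) => re.test(n));
}

console.log(isDenied('Read', 'CLAUDE.md'));           // → false
console.log(isDenied('Write', 'CLAUDE.md'));          // → true
console.log(isDenied('Read', 'app/.env.production')); // → true
```

Deriving the broad list from the narrow one (rather than maintaining two independent lists) is one way to guarantee that anything Read-denied is also shell- and Write-protected; whether the real module does this is not visible in the diff.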
@@ -242,3 +242,43 @@ describe('isProtectedPath — runtime dir without trailing slash (review fix)',
    expect(isProtectedPath('~/.claude/runtime/x.json', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
  });
});

import { READ_DENY_PATTERNS } from './shell-content-rules.mjs';

// Over-block fix (2026-05-31): the Read tool needs a NARROWER deny list than the
// Bash/PowerShell/Write gate. Read of CLAUDE.md / Pravila / memory has no exfil
// value (public-in-repo / own memory index); the genuine Read-exfil targets are
// cross-session transcripts (.jsonl), runtime side-channels, settings, secrets.
describe('READ_DENY_PATTERNS (narrow Read-tool deny)', () => {
  it.each([
    '~/.claude/projects/abc/session.jsonl',
    '/c/Users/Administrator/.claude/projects/crm/x.jsonl',
    '~/.claude/runtime/router-state.json',
    '~/.claude/runtime',
    '~/.claude/settings.json',
    '~/.claude/settings.local.json',
    '.env',
    'app/.env.production',
  ])('Read-denies genuine exfil target %s', (p) => {
    expect(isProtectedPath(p, defaultPathNormalize, READ_DENY_PATTERNS)).toBe(true);
  });

  it.each([
    'CLAUDE.md',
    '/c/моя/проекты/портал crm/Документация/CLAUDE.md',
    '/c/Users/Administrator/.claude/projects/crm/memory/MEMORY.md',
    '/c/Users/Administrator/.claude/projects/crm/memory/feedback_x.md',
    'docs/Pravila_raboty_Claude_v1_1.md',
    'docs/Plugin_stack_rules_v1.md',
    'docs/Tooling_v8_3.md',
    'node_modules/shell-quote/index.js',
  ])('does NOT Read-deny public/normative/memory file %s', (p) => {
    expect(isProtectedPath(p, defaultPathNormalize, READ_DENY_PATTERNS)).toBe(false);
  });

  it('DEFAULT_PROTECTED_PATTERNS still protects CLAUDE.md/Pravila/memory (Bash/PowerShell/Write gates unchanged)', () => {
    expect(isProtectedPath('CLAUDE.md', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
    expect(isProtectedPath('docs/Pravila_raboty_Claude_v1_1.md', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
    expect(isProtectedPath('memory/feedback.md', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
  });
});