Compare commits

33 Commits

Author SHA1 Message Date
Дмитрий 6ce2f0058d fix(router-gate): session-lock skips readonly Bash (scope calibration)
The parallel-session-lock fired on every PreToolUse tool, blocking even
readonly Bash (git status/log/diff, cat, grep, ls) from a peer session.
The lock's purpose is to serialize concurrent FILE MUTATION on the same
worktree; readonly commands mutate nothing, so they are outside that scope.

isReadonlyBashEvent() reuses the router-gate Bash classifier (an allow-verdict
whose reason is readonly/reading), mirroring the LLM-judge readonly
calibration. main() short-circuits readonly Bash to allow without
acquiring/blocking. Mutating tools, git commit/push, dangerous Bash, and
every non-Bash tool still acquire/check the lock — same-worktree mutation
serialization is unchanged (scope fix, NOT a discipline drop).

TDD: +6 unit tests. Full tools-vitest 2038 passed / 2 skipped.
2026-06-01 07:46:26 +03:00
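The short-circuit described in this commit can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `classifyBash`, `READONLY_PATTERNS`, `CHAIN_OR_REDIRECT` and `lockDecision` are hypothetical stand-ins for the real router-gate Bash classifier and `main()`, and the event shape (`tool_name`, `tool_input.command`) is assumed.

```javascript
// Hypothetical stand-in for the router-gate Bash classifier (the real one is not shown in this log).
const READONLY_PATTERNS = [
  /^git\s+(status|log|diff)\b/,
  /^(cat|grep|ls)\b/,
];
// Any chaining, substitution or redirection operator disqualifies the readonly fast path.
const CHAIN_OR_REDIRECT = /(&&|\|\||[;|<>`]|\$\()/;

function classifyBash(command) {
  const cmd = command.trim();
  if (CHAIN_OR_REDIRECT.test(cmd)) return { verdict: 'deny', reason: 'chained-or-redirected' };
  if (READONLY_PATTERNS.some((re) => re.test(cmd))) return { verdict: 'allow', reason: 'readonly' };
  return { verdict: 'deny', reason: 'default-deny' };
}

// True only for a Bash event that the classifier allows as readonly.
function isReadonlyBashEvent(event) {
  if (event.tool_name !== 'Bash') return false;
  const command = event.tool_input && event.tool_input.command;
  if (typeof command !== 'string') return false;
  const { verdict, reason } = classifyBash(command);
  return verdict === 'allow' && reason === 'readonly';
}

// Assumed shape of the lock gate: readonly Bash bypasses the lock, everything else respects it.
function lockDecision(event, lockHeldByPeer) {
  if (isReadonlyBashEvent(event)) return 'allow';
  return lockHeldByPeer ? 'block' : 'allow';
}
```

Under these assumptions `git status` from a peer session passes while `git commit` and every non-Bash tool still hit the lock, which matches the "scope fix, not a discipline drop" wording above.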
Дмитрий d35fefddd9 ci(a11y): bump Pa11y workflow Node 20 -> 22 (cspell@10 engine requirement)
The a11y (Pa11y live) PR check failed at "Install root JS deps": root `npm ci`
hits EBADENGINE because @cspell/cspell-*@10.0.0 require Node >=22.18.0 while the
workflow pinned Node 20. Pre-existing mismatch (cspell ^10 predates this branch
and fails identically on main), unrelated to the discipline-guard hook changes.
Node 22 satisfies both the repo engines (>=20) and cspell (>=22.18).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 19:00:05 +03:00
Дмитрий e56ddd6a1b fix(router-gate): coverage line honors cross-turn active skill (verify + remind)
Backlog item G. The `coverage:` line under-reported a skill chosen in a PRIOR turn:
enforce-coverage-verify credited channel=skill only if the Skill tool ran in the
CURRENT turn, so an honest `skill:X` continuation line was BLOCKED -> the controller
learned to under-report as direct/chain. Two-sided systemic fix, no weakening:

- enforce-coverage-verify: decide() also accepts skill:X when X was invoked anywhere
  earlier in THIS session (new priorSkillNames param; main() collects them via
  sessionToolUses). Still unforgeable -- a real Skill tool_use must exist in the
  transcript. The only residual is possibly-stale attribution, far better than the
  forced dishonest direct-reporting it replaces.
- enforce-prompt-injection: the §17 reminder now lists active skills carried over
  from earlier turns (read from the transcript) and tells the controller to report
  `coverage: skill:<name>` when work continues under one -- the proactive half, so
  the correct line is not merely allowed but prompted.

TDD: RED -> GREEN per behavior. tools-vitest 2032 passed / 2 skipped.
Plan docs/superpowers/plans/2026-05-31-discipline-guard-backlog.md (item G).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:37:44 +03:00
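A rough sketch of the verify half of this fix, under stated assumptions: the `decide()` argument object, the `coverage: skill:<name>` parsing regex, and the `input.skill` field of a Skill tool_use are all hypothetical here; only the `priorSkillNames` parameter name and the "a real Skill tool_use must exist" rule come from the message above. Verification of `direct`/`chain` lines is out of scope.

```javascript
// Collect skill names from real tool_use entries (the entry shape is an assumption).
function collectSkillNames(toolUses) {
  return toolUses
    .filter((use) => use.name === 'Skill' && use.input && typeof use.input.skill === 'string')
    .map((use) => use.input.skill);
}

// Accept `coverage: skill:X` if X ran this turn OR anywhere earlier in the session.
function decide({ coverageLine, currentTurnSkillNames = [], priorSkillNames = [] }) {
  const match = /^coverage:\s*skill:(\S+)\s*$/.exec(String(coverageLine).trim());
  if (!match) return { decision: 'allow', reason: 'no-skill-claim' };
  const claimed = match[1];
  if (currentTurnSkillNames.includes(claimed)) {
    return { decision: 'allow', reason: 'skill-invoked-this-turn' };
  }
  if (priorSkillNames.includes(claimed)) {
    return { decision: 'allow', reason: 'skill-invoked-earlier-in-session' };
  }
  // No matching Skill tool_use in the transcript: the claim cannot be credited.
  return { decision: 'block', reason: 'no-matching-skill-tool-use' };
}
```

The claim stays unforgeable in this sketch because both name lists are derived from tool_use records, never from free text written by the controller.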
Дмитрий 53407a77cd feat(router-gate): tdd-gate credits delegated (subagent) TDD + transcript write-deny
Closes the TDD-gate cross-actor gap: when a subagent (spawned by a Task in the
controller's current turn) writes the failing test and confirms RED, the
controller's subsequent production edit was falsely blocked because the gate only
scanned the controller's own turn. Net strengthening, no discipline weakened.

- Part 1 (enforce-runtime-write-deny): block the Write tool from any
  ~/.claude/projects/**/*.jsonl (session/subagent transcripts). Memory *.md there
  stays writable (never matches .jsonl$). Resolving normalizer defeats ./.. evasion.
  This makes the agent-<id>.jsonl that Part 2 trusts unforgeable (it was the last
  ungated write channel; Bash/PowerShell/Read gates already covered it).
- Part 2 (enforce-tdd-gate): decide() also credits a subagent's matching test edit
  + RED via a new subagentEntriesList. turnTaskAgentIds() reads the hex agentId from
  the harness-written Task tool_result (the controller cannot forge its own
  tool_result; hex-only match blocks "agentId: ../../x" path traversal).
  subagentTranscriptPaths() derives <dir>/<controller-session>/subagents/agent-<id>.jsonl.
  main() reads them best-effort (missing/unreadable -> no extra credit = stricter).

No new weakening: a delegated subagent doing real TDD is legitimate; the only
forgery vector (overwriting the agent jsonl) is closed by Part 1. Existing
controller-turn behaviour is preserved (empty subagent list == old logic).

OWNER (settings.json, Claude can't edit it): enforce-tdd-gate is already a
registered PreToolUse hook -> Part 2 goes live on merge. enforce-runtime-write-deny
must be registered on PreToolUse(Edit|Write|MultiEdit|NotebookEdit) for Part 1 to be live.

TDD: RED -> GREEN per behavior. tools-vitest 2027 passed / 2 skipped.
Backlog item C (=Z); plan docs/superpowers/plans/2026-05-31-discipline-guard-backlog.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:18:44 +03:00
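The hex-only agent id extraction and the transcript path derivation from Part 2 might look like the sketch below. The function names mirror the commit message, but the bodies, the exact `agentId: <hex>` text format of the Task tool_result, and the argument lists are assumptions made for illustration.

```javascript
// Extract agent ids from harness-written Task tool_result texts.
// Only a pure lowercase-hex token followed by whitespace or end of text is accepted,
// so inputs such as "agentId: ../../x" never yield an id.
function turnTaskAgentIds(taskToolResultTexts) {
  const ids = new Set();
  const pattern = /agentId:\s*([0-9a-f]+)(?=\s|$)/g;
  for (const text of taskToolResultTexts) {
    for (const match of String(text).matchAll(pattern)) ids.add(match[1]);
  }
  return [...ids];
}

// Derive <dir>/<controller-session>/subagents/agent-<id>.jsonl for each id.
function subagentTranscriptPaths(projectDir, controllerSessionId, agentIds) {
  return agentIds.map(
    (id) => `${projectDir}/${controllerSessionId}/subagents/agent-${id}.jsonl`,
  );
}
```

Because the id is restricted to hex characters before it is interpolated, the derived path cannot escape the `subagents/` directory in this sketch.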
Дмитрий 6577c04a1f fix(router-gate): session-lock hygiene — clearer block message + stale-lock prune
Closes the remaining parallel-session-lock remarks on top of the keying fix
(7a469dc9), with NO weakening of same-worktree serialization:

- D: the block message now identifies the holder by its STABLE session_id and
  marks the recorded pid as transient ("may change between attempts"). Chasing
  the pid is what led to closing the wrong session. Decision logic is unchanged
  (text only) — existing /pid N/ triage assertion still holds.
- B: pruneStaleLocks() best-effort deletes leaked lock files that are ALREADY
  stale by the shared isStale() definition (now exported from the pure module —
  single source of truth). Active within-TTL locks are never touched, so the
  serialization guarantee is not weakened. Wired into the PreToolUse branch of
  main(), wrapped so hygiene can never break the gate (fail-open).
- C (no code): release-on-SessionEnd needs only a settings.json registration
  (owner action) — the existing !tool_name branch already releases. Documented
  in the plan. Until then, leaked locks self-heal via B + the 5-min TTL takeover.

TDD: RED -> GREEN per behavior. tools-vitest 2014 passed / 2 skipped.
Backlog items B/C/D; plan docs/superpowers/plans/2026-05-31-discipline-guard-backlog.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 17:43:03 +03:00
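Item B can be illustrated with a small sketch. The lock record field `refreshedAt`, the injected `listLocks`/`deleteLock` helpers and the return value are hypothetical; the 5-minute TTL and the fail-open wrapping are taken from the message above.

```javascript
const LOCK_TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as mentioned in the commit message

// Shared staleness definition (assumed shape: a lock carries a numeric refreshedAt timestamp).
function isStale(lock, now) {
  return !lock || typeof lock.refreshedAt !== 'number' || now - lock.refreshedAt > LOCK_TTL_MS;
}

// Best-effort cleanup: deletes only locks that are already stale, never throws.
function pruneStaleLocks({ listLocks, deleteLock, now }) {
  let pruned = 0;
  try {
    for (const { path, lock } of listLocks()) {
      if (!isStale(lock, now)) continue; // active within-TTL locks are never touched
      try {
        deleteLock(path);
        pruned += 1;
      } catch {
        // a failed delete is ignored; the TTL takeover still applies later
      }
    }
  } catch {
    // hygiene must never break the gate (fail-open)
  }
  return pruned;
}
```

Reusing one `isStale()` for both pruning and takeover keeps the two paths from disagreeing about which locks are still live.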
Дмитрий 7a469dc913 fix(router-gate): key session-lock by session work-tree root, not hook cwd
enforce-parallel-session-lock keyed the lock on the hook's process.cwd(),
which collapses to the main repo dir after a session resume — so sessions in
DIFFERENT git worktrees shared one lock and false-blocked each other (observed:
a brainrepo-worktree session blocked launching agents by a discipline-guard
session). New resolveWorkspacePath() keys on the session's stable cwd
(event.cwd) resolved to the git work-tree root (git -C <cwd> rev-parse
--show-toplevel), with fallback to process.cwd() so behaviour never regresses
when event.cwd is absent. Same-worktree concurrency stays serialized
(unchanged) — discipline not weakened; only cross-worktree false-blocks fixed.

TDD: RED (5 resolveWorkspacePath cases) -> GREEN -> tools-vitest 2003 passed /
2 skipped. Backlog item F; plan
docs/superpowers/plans/2026-05-31-discipline-guard-backlog.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 17:02:32 +03:00
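A minimal sketch of the keying change, assuming the git lookup is injected as a function. `gitToplevel` is a hypothetical parameter standing in for running `git -C <cwd> rev-parse --show-toplevel`; the argument object and the choice to resolve the fallback path through git as well are also assumptions.

```javascript
// Prefer the session's stable cwd from the event, fall back to the hook's cwd,
// then resolve to the git work-tree root when git can answer.
function resolveWorkspacePath({ eventCwd, processCwd, gitToplevel }) {
  const base = typeof eventCwd === 'string' && eventCwd.length > 0 ? eventCwd : processCwd;
  try {
    const top = gitToplevel(base);
    if (typeof top === 'string' && top.trim().length > 0) return top.trim();
  } catch {
    // not a git repository, or git is unavailable: keep the unresolved base path
  }
  return base;
}
```

With this shape, two sessions in different worktrees resolve to different roots and therefore to different lock keys, while two sessions in the same worktree still collide on purpose.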
Дмитрий be4e1a6123 feat(router-gate): whitelist npm ci in SAFE_EXACT (worktree dep restore)
`npm ci` does a clean install strictly from the committed lockfile
(deterministic, no version drift) — needed to restore junction node_modules
in a fresh worktree. Distinct from `npm install`/`npm i`, which stay
hard-blacklisted because they can pull new/updated versions; the blacklist
runs before the whitelist, so they remain blocked. Word boundary after `ci`
prevents `npm cider`-style prefix matches; chain semantics still block
`npm ci && <mutating>`.

TDD: RED (3 allow-cases failed default-deny) -> GREEN (/^npm\s+ci\b/) ->
tools-vitest 1998 passed / 2 skipped (2000). Backlog item A; plan
docs/superpowers/plans/2026-05-31-discipline-guard-backlog.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:46:58 +03:00
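The ordering and word-boundary points can be checked with a small sketch. Only the `/^npm\s+ci\b/` pattern is quoted from the commit; the `HARD_BLACKLIST` entry, the `classify` function and the blanket rejection of chained commands are simplifying assumptions.

```javascript
// Hypothetical rule tables; only /^npm\s+ci\b/ is taken from the commit message.
const HARD_BLACKLIST = [/^npm\s+(install|i)\b/];
const SAFE_EXACT = [/^npm\s+ci\b/];
// Simplification: this sketch rejects any chained or substituted command outright.
const CHAINING = /&&|[;|`]|\$\(/;

function classify(command) {
  const cmd = command.trim();
  if (CHAINING.test(cmd)) return 'block';
  if (HARD_BLACKLIST.some((re) => re.test(cmd))) return 'block'; // blacklist runs first
  if (SAFE_EXACT.some((re) => re.test(cmd))) return 'allow';
  return 'block'; // default-deny
}
```

`npm cider` stays blocked because `\b` requires a non-word character (or the end of the string) right after `ci`.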
Дмитрий b0cd18d797 fix(router-gate): quote-aware redirect detector + drop dead override-phrase ads
Quirk 2: the new stripQuotedSpans makes the stdout/stderr redirect detector
quote-aware: a `>` / `2>` INSIDE a quoted argument (a commit message
containing <email>, "2>1") is no longer falsely blocked; real redirects
(operator outside quotes) are blocked as before. RED→GREEN, the existing
redirect/cd-app cases are intact.

1A: removed the advertising of dead override phrases (findOverride is a v4
stub, the phrases do not work): the enforce-prompt-injection banner (every
UserPromptSubmit) + the block messages of enforce-verify-before-push /
coverage-verify / memory-coverage / tdd-gate (×3). Each fix is locked in by a
negative test.

Deliberately NOT done: judge calibration 6 (reading chat context) and loosening
exact-match approve (quirk 3). These are lines of defense and must not be touched.

Regression, vitest tools-only: 1989 passed | 2 skipped (verified via
npx vitest run --root app --config vitest.config.tools.mjs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:05:52 +03:00
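One way a quote-aware redirect check could work is sketched below. The name `stripQuotedSpans` comes from the commit, but this body, the `hasRedirect` helper and the escape handling are assumptions, not the repository's implementation.

```javascript
// Replace quoted spans (and backslash-escaped characters) with spaces,
// so operators inside quotes can no longer match.
function stripQuotedSpans(command) {
  let out = '';
  let quote = null; // null, "'" or '"'
  for (let i = 0; i < command.length; i += 1) {
    const ch = command[i];
    // Backslash escapes the next character everywhere except inside single quotes.
    const escapes = ch === '\\' && quote !== "'" && i + 1 < command.length;
    if (escapes) {
      out += '  ';
      i += 1;
      continue;
    }
    if (quote === null) {
      if (ch === '"' || ch === "'") {
        quote = ch;
        out += ' ';
      } else {
        out += ch;
      }
    } else {
      if (ch === quote) quote = null;
      out += ' ';
    }
  }
  return out;
}

// A redirect is any `>` (optionally prefixed by a descriptor digit) outside quotes.
function hasRedirect(command) {
  return /\d?>/.test(stripQuotedSpans(command));
}
```

Blanking characters instead of deleting them keeps string offsets stable, which can help if a caller wants to report where the operator was found.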
Дмитрий 30b79c7228 fix(router-gate): narrow cd app whitelist (TDD, tools 1978 GREEN)
Add /^cd\s+app$/ to SAFE_EXACT so already-whitelisted commands (pest,
php artisan test) run from app/. Scope limited to the literal `app` dir:
cd into any other path (incl. protected .claude/runtime, memory/,
transcripts) stays default-deny, so the cwd-shift read-bypass is contained.
Mutations remain caught at the hard-blacklist + chain-mutating rule, and
each chain segment after `cd app &&` must still be independently whitelisted.

Owner-authorized, narrow scope = literal `app` only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:34:42 +03:00
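The "each chain segment must be independently whitelisted" rule can be sketched like this. Only `/^cd\s+app$/` is quoted from the commit; the other two patterns, `UNSAFE_IN_SEGMENT` and `classifyChain` are hypothetical stand-ins.

```javascript
const SAFE_EXACT = [
  /^cd\s+app$/,              // quoted from the commit message
  /^php\s+artisan\s+test\b/, // hypothetical stand-in for an existing whitelist entry
  /^(vendor\/bin\/)?pest\b/, // hypothetical stand-in for an existing whitelist entry
];
// Operators other than `&&` are not split here, so any segment containing one is rejected.
const UNSAFE_IN_SEGMENT = /[;&|<>`]|\$\(/;

function classifyChain(command) {
  const segments = command.split('&&').map((s) => s.trim());
  const ok = segments.every(
    (s) => s.length > 0 && !UNSAFE_IN_SEGMENT.test(s) && SAFE_EXACT.some((re) => re.test(s)),
  );
  return ok ? 'allow' : 'block';
}
```

The `$` anchor is what limits the entry to the literal `app` directory: `cd app/../.claude` and `cd application` both fail the match.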
Дмитрий 63100decce chore(mcp): disable marketing MCP servers (metrika/wordstat/telegram)
Collapsed into a _disabled note (restorable via git + a restore recipe in the file).
Marketing servers from github: sources with auto-generated schemas
(wordstat: 128 tools from Yandex.Direct) are the prime suspect for the API 400
tools.110/113 that crashed subagents during bulk-load of all tools
(subagent-driven-development). They are off-phase and never started without
OAuth tokens, so nothing is lost for current work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:26:55 +03:00
Дмитрий f6421fd61c docs(router-gate-v4): calibration 5 plan - cosmetic-detector git-approval exemption 2026-05-31 11:39:20 +03:00
Дмитрий d647bf1858 fix(router-gate-v4): calibration 5 - cosmetic-detector exempts git-approval AskUser (scope fix, regression-tested) 2026-05-31 11:19:14 +03:00
Дмитрий 1f9b51bc39 feat(router-gate-v4): parallel-session-lock live main() — acquire on PreToolUse + release on Stop (point 2)
The Stream H wrapper shipped a deliberate no-op main() — the lock did nothing.
This wires it live: PreToolUse on a mutating tool acquires/refreshes the
workspace lock (blocks only when a DIFFERENT session holds a fresh, non-stale
lock); the Stop event releases it. Fail-open on any error so a lock bug can
never wedge the user out of their own session.

- runAcquireDecision({event,now,pid,cwd,readLock,writeLock}) — compose
  acquire() + decide().
- runReleaseAction({event,cwd,readLock,deleteLock}) — release() if this
  session owns the lock, no-op otherwise.
- live main(): branches on tool_name (present → acquire/refresh; absent/Stop
  → release); real fs binding via runtimeDir()/session-lock-<workspaceHash>.json.

Activation registers BOTH the PreToolUse (acquire) AND the Stop (release)
entries — the Stop wiring is mandatory; without it the lock is never released
and the next abnormal exit would lock the user out. Script:
.scratch/activate-point2-hooks.ps1 (also registers safe-baseline-metering +
runtime-write-deny per the point-2 plan).

Plan: docs/superpowers/plans/2026-05-30-router-gate-v4-stream-H.md Task 7.
Regression: parallel-session-lock 12/12 GREEN; full tools suite 1958 passed | 2 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 11:06:52 +03:00
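The acquire/release flow could be sketched as below. The two function names and their main parameters follow the commit message; the lock record fields, the `ttlMs` parameter with its 5-minute default, and the return shapes are assumptions.

```javascript
const DEFAULT_TTL_MS = 5 * 60 * 1000; // assumed default TTL

// Acquire or refresh the workspace lock; block only on a fresh lock held by another session.
function runAcquireDecision({ event, now, pid, cwd, readLock, writeLock, ttlMs = DEFAULT_TTL_MS }) {
  try {
    const current = readLock();
    const fresh = Boolean(current) && now - current.refreshedAt <= ttlMs;
    if (fresh && current.session_id !== event.session_id) {
      return { decision: 'block', holder: current.session_id };
    }
    writeLock({ session_id: event.session_id, pid, cwd, refreshedAt: now });
    return { decision: 'allow' };
  } catch {
    return { decision: 'allow' }; // fail-open: a lock bug must never wedge the user out
  }
}

// Release only when this session owns the lock; otherwise do nothing.
function runReleaseAction({ event, readLock, deleteLock }) {
  try {
    const current = readLock();
    if (current && current.session_id === event.session_id) {
      deleteLock();
      return true;
    }
  } catch {
    // ignore: release is best-effort
  }
  return false;
}
```

Injecting `readLock`/`writeLock`/`deleteLock` keeps the decision logic testable without touching the file system, in line with the pure-module split the commits describe.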
Дмитрий 8a7144892c fix(router-gate-v4): calibrate per-tool LLM-judge — calibration 4 soft user-prompt fallback
The per-tool judge compares each mutating tool call against the classifier's
distilled task summary read from router-state. That summary is lossy and
frequently "(unknown)" even for a perfectly explicit user request — and with an
unknown task the judge has nothing to compare against, so "Сомнения → NO"
("doubt → NO") blocked every real edit. Reproduced repeatedly this session: an
explicit "реализуй ... main() ..." ("implement ... main() ...") prompt still
classified unknown → all edits blocked,
including the judge's own fix. Calibration 2 (allow on unknown) was rejected by
the owner as a discipline hole.

Calibration 4 (soft, scope-preserving): when — and only when — the classifier
summary is "(unknown)"/empty, fall back to judging against the user's actual
last prompt (the ground-truth request) instead of nothing. The judge still runs
and still blocks on doubt; it just uses better evidence. When the summary is
meaningful, behaviour is unchanged (the user-prompt reader is not consulted).
When both summary and prompt are unavailable, the task stays "(unknown)" and
doubt→block is preserved.

NOT calibration 2: this does not blindly allow on unknown — it re-grounds the
judge in the literal user request, which the controller cannot fabricate (the
user writes it; it is read locally from the session transcript).

- tools/llm-judge-per-tool.mjs: resolveEffectiveTask(declaredTask, lastUserPrompt).
- tools/enforce-llm-judge-per-tool.mjs: runPerTool reads the last user prompt
  (helpers.lastUserPromptText + readTranscript) only on an unknown summary;
  main() binds it.

Regression: judge tests 57/57 GREEN; full tools suite 1951 passed | 2 skipped.
The 6 remaining failures are uncommitted point-2 WIP in
enforce-parallel-session-lock.test.mjs — not part of this change, not committed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:34:27 +03:00
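The fallback rule of calibration 4 is small enough to sketch in full. The signature `resolveEffectiveTask(declaredTask, lastUserPrompt)` is the one named in the commit; the body and the `isUnknownTask` helper are assumptions about how it might be written.

```javascript
// A summary counts as unknown when it is missing, empty, or the literal "(unknown)".
function isUnknownTask(task) {
  return typeof task !== 'string' || task.trim() === '' || task.trim() === '(unknown)';
}

// Use the classifier summary when meaningful; otherwise fall back to the last user prompt.
function resolveEffectiveTask(declaredTask, lastUserPrompt) {
  if (!isUnknownTask(declaredTask)) return declaredTask;
  if (typeof lastUserPrompt === 'string' && lastUserPrompt.trim() !== '') {
    return lastUserPrompt.trim();
  }
  return '(unknown)'; // both unavailable: doubt→block is preserved downstream
}
```

Note that the function only picks what the judge compares against; it never returns an allow verdict by itself, which is the difference from the rejected calibration 2.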
Дмитрий 722f4bb189 fix(router-gate-v4): calibrate per-tool LLM-judge — exempt Skill (calib 1) + test-runners (calib 3)
The Layer-4 per-tool judge over-blocked: it judged every Skill/Edit/Write/
Bash/Task against the declared task and blocked on doubt. A vague prompt
classifies as unknown/ambiguous, so the judge then blocked essentially all
artifact-producing tools — including the prescribed §17 skill entry and the
mandatory TDD test run — making legitimate, owner-mandated work impossible
and blocking its own fix (3 reproduced blocks this session).

Calibration 1 (scope fix, NOT a discipline drop): remove `Skill` from
MUTATING_TOOLS in tools/llm-judge-per-tool.mjs. Invoking a skill mutates no
state and is the §17-mandated entry into work; the real mutations it leads to
(Edit/Write/MultiEdit/Bash/PowerShell/Task/commit/push) stay fully judged.

Calibration 3 (scope fix, NOT a discipline drop): add isTestRunnerBashEvent to
tools/enforce-llm-judge-per-tool.mjs and skip it in runPerTool, mirroring the
existing readonly-Bash exemption. A test run (vitest/pest/phpunit/php artisan
test/composer test/npm test) only inspects + reports and is a mandatory TDD
step; commands chaining to a mutation (&& ; | backtick $()) are NOT exempt.

doubt→block on real mutations against a known task is unchanged (covered by the
"mutating Bash (git commit) STILL judged" test). Calibration 2 (allow on
unknown task) was rejected by the owner as a discipline hole and not added.

Regression: vitest tools-only 1945 passed | 2 skipped (+18 calibration tests).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 10:04:43 +03:00
Дмитрий 417cfcbc37 docs(router-gate-v4): CLAUDE.md v2.44 — item 2b judge live + activated + readonly calibration 2026-05-31 09:04:09 +03:00
Дмитрий c9b9efd6e4 fix(router-gate-v4): exclude readonly Bash from per-tool judge — scope fix, discipline unchanged 2026-05-31 08:59:18 +03:00
Дмитрий dfae9f760b feat(router-gate-v4): live main() for LLM-judge wrappers — flag-gated spend (item 2b) 2026-05-31 08:06:26 +03:00
Дмитрий a8996896a8 test(router-gate-v4): Read-deny boundary cases (.env.production blocked, Tooling doc readable) 2026-05-31 07:38:18 +03:00
Дмитрий f82c878c60 docs(router-gate-v4): CLAUDE.md v2.43 — safe-baseline 1b + C3 + judge wrappers + Read-deny over-block fix 2026-05-31 07:29:58 +03:00
Дмитрий 3c5266c022 fix(router-gate-v4): narrow Read-deny so CLAUDE.md and memory are Read-allowed, transcripts/runtime still blocked (over-block fix) 2026-05-31 07:26:30 +03:00
Дмитрий 9280c48025 docs(router-gate-v4): remaining-holes checklist update + CLAUDE.md insertion draft (item 1b tails) 2026-05-31 07:04:27 +03:00
Дмитрий 84dcf4aab3 docs(router-gate-v4): safe-baseline spec v4 + plan + handoff (item 1b) 2026-05-31 05:58:13 +03:00
Дмитрий 80e514f5bb feat(router-gate-v4): enforce-runtime-write-deny protect runtime side-channels (C3) 2026-05-31 05:57:59 +03:00
Дмитрий f740f6124a feat(safe-baseline): live main() metering + hard-block + Skill/EnterPlanMode escape (item 1b) 2026-05-31 05:57:47 +03:00
Дмитрий c86fdfc9eb docs(router-gate-v4): safe-baseline spec v3 — fold 2nd adversarial review (V2-1/V2-2/V2-4) (item 1b) 2026-05-30 20:44:26 +03:00
Дмитрий 9f84d9ef09 docs(router-gate-v4): safe-baseline spec v2 — close C1/C2/C3/H1 from adversarial review (item 1b) 2026-05-30 20:31:23 +03:00
Дмитрий 6d512f5cf3 docs(router-gate-v4): safe-baseline live-wiring design spec (item 1b) 2026-05-30 20:12:39 +03:00
Дмитрий ca52d354f9 feat(router-gate-v4): LLM-judge per-tool + response-scan hook wrappers (Stream H tail) 2026-05-30 19:59:42 +03:00
Дмитрий c805988085 docs(observer): router-gate v4 remaining-holes checklist (Stream H follow-up) 2026-05-30 19:38:51 +03:00
Дмитрий 6ac4b1c1b1 feat(router-gate-v4): safe-baseline-metering wrapper + llm-judge-config gate (Stream H tail) 2026-05-30 19:29:58 +03:00
Дмитрий f172e2a580 feat(router-gate): SAFE_EXACT +Laravel dev workflow
Closes design gap in v4 whitelist: dev commands (pest, composer test/pint/stan/insights/rector,
php artisan test/migrate variants/db:seed/cache:clear etc., vendor/bin/pest) were falling into
default-deny. That blocked sessions working on app/ code and pushed controllers toward override
phrases or requests to disable the defense.

Changes are surgical and do not weaken discipline defense:
- 4 new SAFE_EXACT regex entries for specific dev commands
- tinker EXCLUDED on purpose (REPL = arbitrary PHP exec risk)
- migrate:install and other unknown migrate subcommands stay blocked via
  lookahead instead of word-boundary (precision fix)
- Hard-blacklist for mutating package operations, chain-semantics C13,
  file-watcher, TDD-gate, path-deny, coverage requirement and the other 15
  defense hooks are NOT touched.

TDD: 22 RED allow-tests + 7 still-block tests + 3 regression tests.
Full tools-only regression 1821/1821 GREEN.

Live smoke verified: composer test allowed; migrate:install blocked.

Whitelist v3.8 was sized around vitest tools-only; Laravel app/ dev workflow
slipped through. This commit corrects that without touching the architecture.
2026-05-30 16:11:34 +03:00
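The "lookahead instead of word-boundary" precision fix can be demonstrated with two patterns. Both regexes and the allowed subcommand list (`status`, `rollback`) are hypothetical examples, not the actual SAFE_EXACT entries.

```javascript
// With \b, the colon after "migrate" counts as a boundary, so any subcommand slips through.
const wordBoundary = /^php\s+artisan\s+migrate\b/;

// With a lookahead, the command must end or continue with whitespace after an
// optional, explicitly listed subcommand (the list here is hypothetical).
const withLookahead = /^php\s+artisan\s+migrate(?::(?:status|rollback))?(?=\s|$)/;

function isAllowedMigrate(command) {
  return withLookahead.test(command.trim());
}
```

The difference shows up on `php artisan migrate:install`: the word-boundary pattern matches it, while the lookahead pattern does not, so unknown subcommands fall back to default-deny.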
Дмитрий 4686b36571 docs(region): lead-region-resolution spec v0.5 + 6-session plan 2026-05-30 15:38:54 +03:00
48 changed files with 6024 additions and 105 deletions
+2 -2
@@ -21,10 +21,10 @@ jobs:
extensions: pdo, pdo_pgsql, redis, mbstring, intl, bcmath
coverage: none
- name: Setup Node 20
- name: Setup Node 22
uses: actions/setup-node@v4
with:
node-version: '20'
node-version: '22'
cache: 'npm'
- name: Install root JS deps
+1 -26
@@ -54,32 +54,7 @@
},
"comment": "A3 integration-tooling #47 - OpenAPI MCP (ivo-toby/mcp-openapi-server, @ivotoby/openapi-mcp-server v1.14.0, MIT). Exposes Лидерра REST API endpoints (docs/api/openapi.yaml) as MCP tools. Config via env-vars API_BASE_URL + OPENAPI_SPEC_PATH (stdio transport default). READ scope: API discovery/introspection for Claude Code. Formalized in Tooling §4.22, PSR_v1 R10.1 block 3, Pravila §13.2."
},
"marketing-metrika": {
"command": "npx",
"args": ["-y", "github:atomkraft/yandex-metrika-mcp"],
"env": {
"YANDEX_OAUTH_TOKEN": "${YANDEX_OAUTH_TOKEN}"
},
"comment": "C1 marketing-tooling #78 - Yandex Metrika MCP (vetted source: github:atomkraft/yandex-metrika-mcp, MIT; chosen through the IS9 vet from 3 candidates, see docs/security/marketing-vet.md). READ-ONLY analytics: traffic volume, traffic sources, conversions. Env: YANDEX_OAUTH_TOKEN, an OAuth token with read-only rights. IS9 posture: READ-ONLY, Metrika API mutations are not used. Tooling §4.53. docs/marketing/README.md."
},
"marketing-wordstat": {
"command": "npx",
"args": ["-y", "github:SvechaPVL/yandex-mcp"],
"env": {
"YANDEX_OAUTH_TOKEN": "${YANDEX_OAUTH_TOKEN}"
},
"comment": "C1 marketing-tooling #79 - Yandex Direct+Wordstat MCP (vetted source: github:SvechaPVL/yandex-mcp, MIT; chosen through the IS9 vet, see docs/security/marketing-vet.md). The repository exposes 128 tools (Direct + Wordstat + Metrika); under the IS9 condition ONLY the Wordstat tools are used, for keyword research and demand estimation. Direct mutations (creating/editing campaigns, changing bids) are behaviorally forbidden via marketing-ru #77 and MKT8 (no automatic spending of the advertising budget). Env: YANDEX_OAUTH_TOKEN with minimal scope. Tooling §4.54. docs/marketing/README.md."
},
"marketing-telegram": {
"command": "npx",
"args": ["-y", "github:chigwell/telegram-mcp"],
"env": {
"TELEGRAM_API_ID": "${TELEGRAM_API_ID}",
"TELEGRAM_API_HASH": "${TELEGRAM_API_HASH}",
"TELEGRAM_SESSION_STRING": "${TELEGRAM_SESSION_STRING}"
},
"comment": "C1 marketing-tooling #80 - Telegram MCP (chigwell/telegram-mcp, Apache-2.0, GitHub-only, not npm). Works with Лидерра's Telegram channels and chats: publishing, scheduling, analytics. Env: TELEGRAM_API_ID + TELEGRAM_API_HASH (obtain at https://my.telegram.org/apps) + TELEGRAM_SESSION_STRING (generated once via GramJS/Telethon, kept in the gitignored .env.local). MANDATORY: a dedicated Telegram account for Лидерра, not a personal one (IS9 posture MKT8). Tooling §4.51. docs/marketing/README.md."
},
"_disabled_marketing_servers_note": "DISABLED 2026-05-31 (owner: 'cut the marketing'). Reason: their auto-generated schemas (especially wordstat, 128 tools from Yandex.Direct) are the prime suspect for the API 400 tools.110/113 that crashed subagents during bulk-load of all tools (subagent-driven-development). The servers are off-phase and never started without OAuth tokens anyway. The full config is in git before this commit. To bring them back, restore three mcpServers blocks: marketing-metrika (npx -y github:atomkraft/yandex-metrika-mcp; env YANDEX_OAUTH_TOKEN; READ-ONLY; Tooling §4.53), marketing-wordstat (npx -y github:SvechaPVL/yandex-mcp; env YANDEX_OAUTH_TOKEN; Wordstat ONLY per IS9/MKT8; Tooling §4.54), marketing-telegram (npx -y github:chigwell/telegram-mcp; env TELEGRAM_API_ID/API_HASH/SESSION_STRING; dedicated account per IS9; Tooling §4.51). See docs/security/marketing-vet.md and docs/marketing/README.md.",
"_comment_postiz_skeleton": "TODO: C1 marketing-tooling #81 - Postiz MCP (gitroomhq/postiz-app self-host + antoniolg/postiz-mcp). Activate AFTER: 1) deploying Postiz self-hosted (git clone https://github.com/gitroomhq/postiz-app + docker-compose, AGPL-3.0: internal-only, no modifications); 2) vetting the license of antoniolg/postiz-mcp (NOT YET VERIFIED, see docs/marketing/README.md Open vet notes); 3) connecting the social networks in the Postiz UI. Future entry: \"marketing-postiz\": { \"command\": \"npx\", \"args\": [\"-y\", \"postiz-mcp\"], \"env\": { \"POSTIZ_API_URL\": \"${POSTIZ_API_URL}\", \"POSTIZ_API_KEY\": \"${POSTIZ_API_KEY}\" }, \"comment\": \"C1 #81 post-activation\" }. Tooling §4.52. docs/marketing/README.md."
}
}
+9 -1
File diff suppressed because one or more lines are too long
+32 -26
@@ -1,6 +1,6 @@
# Brain Status (auto-generated)
Last updated: 2026-05-30T03:11:28.244Z
Last updated: 2026-05-30T13:11:39.164Z
| Controller | State | Details |
|---|---|---|
@@ -8,14 +8,14 @@ Last updated: 2026-05-30T03:11:28.244Z
| C2 Cross-ref consistency | ✅ | [cross-ref-checker] OK — 0 drift in 4 files |
| C3 Observer-of-observer | ✅ | [observer-of-observer] OK — last read 0 week(s) ago |
| C4 Signal status | ✅ | This file (self-reference) |
| C5 Observer-coverage | ⚠️ | 639 episode(s) this month · Stop-hook + post-commit OK · 20 missed activation(s) — see /brain-retro |
| C5 Observer-coverage | ⚠️ | 752 episode(s) this month · Stop-hook + post-commit OK · 20 missed activation(s) — see /brain-retro |
| C6 Chain map sync | ✅ | [chain-map-checker] OK — 16 chains in sync |
## Metrics (informational, not alerts)
- Observer evidence: 639 episodes this month, 0 observer_error markers, 129 PII matches before filter
- Legacy v1 episodes (not in factor analysis): 500
- Last /brain-retro: 3 day(s) ago
- Observer evidence: 752 episodes this month, 0 observer_error markers, 186 PII matches before filter
- Legacy v1 episodes (not in factor analysis): 613
- Last /brain-retro: 0 day(s) ago
- Node usage: see `/brain-retro` (once per sprint). missed_activations: 20. **Unused nodes are not an alert if there was no matching task** (Pravila §16.4 v1.36; capability-readiness; see memory `feedback_brain_unused_tools_not_problem`, an outside-repo memory store).
## Discipline metrics
@@ -24,16 +24,16 @@ Router discipline baseline (stage 2 router discipline overh
| Task type | Episodes | % with trigger match | % via skill |
|---|---|---|---|
| analysis | 26 | 30.8% | 15.4% |
| bugfix | 19 | 26.3% | 26.3% |
| planning | 16 | 18.8% | 18.8% |
| feature | 15 | 13.3% | 0.0% |
| analysis | 34 | 23.5% | 14.7% |
| planning | 25 | 12.0% | 16.0% |
| bugfix | 25 | 24.0% | 20.0% |
| feature | 19 | 10.5% | 0.0% |
| cleanup | 6 | 0.0% | 0.0% |
| refactor | 1 | 0.0% | 0.0% |
Router step distribution: 1: 281, 2: 227, 3: 63, 5: 61
Router step distribution: 1: 330, 2: 279, 3: 67, 5: 67
Boundaries applied (ADR / boundaries): 72 of 632 episodes (11.4%).
Boundaries applied (ADR / boundaries): 76 of 743 episodes (10.2%).
## Active multi-stage projects
@@ -45,16 +45,22 @@ Boundaries applied (ADR / boundaries): 72 of 632 episodes (11.4%).
## Long sessions
No sessions with >50 turns today (UTC). ✅
⚠️ Today (2026-05-30 UTC) there are sessions with >50 turns, which correlates with a drop in routing discipline (retro #5 candidate B).
| session_id | max turn | % regulated | last episode |
|---|---|---|---|
| `52b2b52d` | 75 | 3% | 2026-05-30T11:45:39.213Z |
Long sessions correlate with discipline drift. If % regulated has dropped in the current session, consider a restart.
## Monthly cost
| Component | Tokens (in/out) | USD |
|---|---|---|
| Classifier (Sonnet 4.6) | 3237/42293 | $0.64 |
| Classifier (Sonnet 4.6) | 12550/86494 | $1.34 |
| Self-assessment (Sonnet 4.6) | 0/0 | $0.00 |
| Reviewer (Opus 4.7 + fallback) | 0/0 | $0.00 |
| **Total** | | **$0.64** |
| **Total** | | **$1.34** |
## Classifier anomalies
@@ -67,40 +73,40 @@ Episodes since last run: 542 / threshold: 10
## Reviewer: subagent vs fallback
0 of 639 episodes reviewed.
0 of 752 episodes reviewed.
## Reviewer findings
Reviewed: 339 episodes. **51 actionable** (wrong_skill + wrong_chain_order).
Reviewed: 372 episodes. **69 actionable** (wrong_skill + wrong_chain_order).
### error_root_cause
| cause | count |
|---|---:|
| n/a | 261 |
| wrong_skill | 41 |
| external_failure | 23 |
| wrong_chain_order | 10 |
| n/a | 271 |
| wrong_skill | 55 |
| external_failure | 28 |
| wrong_chain_order | 14 |
| wrong_tool | 4 |
### Top alternative_better
| recommended | count |
|---|---:|
| #19 | 16 |
| #19 | 18 |
| #25 | 15 |
| #34 | 8 |
| #18 | 6 |
| #18 | 8 |
| #33 | 3 |
### node_quality
| judgment | count |
|---|---:|
| disputable | 191 |
| correct | 113 |
| wrong_node | 31 |
| underkill | 2 |
| disputable | 207 |
| correct | 120 |
| wrong_node | 40 |
| underkill | 3 |
| overkill | 2 |
## Override-phrase usage
@@ -0,0 +1,94 @@
# Router-gate v4: remaining holes (a "for later" checklist)
**Date:** 2026-05-30
**Context:** after closing mismatch No. 1 (2 redundant judge entries removed from `.claude/settings.json`).
**System status:** Layers 1-3 work; Layer 4 (the judge) is built as an engine plus a config switch (DEFAULT OFF); it is registered nowhere and has no key, so it is effectively off. On 30.05 the owner chose the "enable" course, but activation (key + flag + hooks) is a separate step on the owner's side.
> Do this in a **clean session**: no parallel Claude sessions and NOT in an isolated copy (worktree).
> Much of it depends on the `.claude/settings.json` file: Claude's Read/Edit of it are blocked by its own defense, so a manual edit by the owner is required.
---
## Priority 1: wrapper written (TDD), wiring deferred
### [x] 1a. Wrapper `enforce-safe-baseline-metering.mjs`: DONE (30.05, worktree h-close)
- **What was done:** a wrapper with a pure `decide()` function (increments the per-task counter and evaluates thresholds via `incrementCounter`/`evaluateThresholds`) + a task-boundary function `processEvent()` (see 1b) + 14 tests. TDD: test first, RED confirmed in the same turn, GREEN 14/14.
- **Template:** same as the neighboring Stream H wrappers (`enforce-decomposition-detector.mjs`): `main()` is deliberately a no-op (exit 0), with no live wiring and no self-lockout.
- **NB on the environment:** the TDD guard checks edits against the main folder and does not see edits made in the worktree, so it blocks falsely; exception phrases are disabled in v4 (universal vocab removal, `findOverride`→null), and the "Override: …" text in the hook message is outdated. The RED→GREEN cycle must be done in ONE turn (test edit + red run + writing the implementation); only then does the guard credit it.
### [x] 1b. Live wiring of `safe-baseline`: DONE (31.05, commits `f740f612` + `80e514f5` + `84dcf4aa`, pushed)
- **Designed** through brainstorming (3 adversarial reviews + ghost-pass): spec `docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md` v4. Closed: C1 (the Skill/EnterPlanMode escape is never blocked) / C2 (skill-match only on a real tool_use, with no self-writable text-path) / C3 (write-deny on runtime, decoupled) / H1 (deterministic tokenization) / V2-1 (stickiness contract, no loss/leak between tasks) / V2-2 (`.`-segment-proof via `pathNormalize`). The G3 override subsystem was cut as ghost-protection (the escape is always available).
- **Implemented (TDD):** `extractKeywords` + `detectSkillMatch` + `runLiveDecision` + a live `runMain`/`main` in `tools/enforce-safe-baseline-metering.mjs` (+14 tests); a new `tools/enforce-runtime-write-deny.mjs` (+7 tests). Regression **1880 GREEN**.
- **Mode:** hard-block (owner's decision: "remove g3, nothing else"). No observe flag was added.
- **Remaining (owner):** registering both hooks in `.claude/settings.json` (the exact block is in the handoff note `2026-05-30-safe-baseline-overnight-handoff.md`); settings.json is blocked for Claude. Until they are registered the hooks are inert.
---
## Priority 2: Layer 4 (the judge): switch ready, activation is up to the owner
### [~] 2. Judge "brain" (Layer 4 plumbing): config switch DONE (30.05)
- **Finding:** the engine `tools/llm-judge.mjs` is ALREADY complete (consensus + anti-injection + cache/budget); with no key, `llmJudgeCall` returns `null`/degraded → fail-safe.
- **2a config switch: DONE:** `tools/llm-judge-config.mjs` `resolveJudgeConfig()`: DEFAULT OFF, `enabled=true` only if BOTH the `ROUTER_LLM_JUDGE_ENABLED` flag is truthy AND a key resolves (keychain→env); keychain errors degrade to "no key, disabled" and do not throw. +10 tests GREEN; the judge+safe-baseline combination is 93/93 with no regressions. The file is written, the judge STAYS OFF (no flag, no key, hooks not registered).
- **2b activation (NOT done, requires the owner, this is where the money is spent):** (1) a key in the keychain (service `router-gate-llm-judge`/`default`) OR `ROUTER_LLM_KEY`; (2) `ROUTER_LLM_JUDGE_ENABLED=1`; (3) the `enforce-llm-judge-*` hooks in settings.json. Until all three are in place: $0.
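The DEFAULT OFF rule in 2a can be sketched as follows. This is an illustration under assumptions: the argument object, the injected `readKeychain(service, account)` helper, the set of values treated as truthy, and the returned shape are hypothetical; the flag name, the env key name, the keychain service/account and the keychain→env order come from the checklist.

```javascript
// Values treated as "truthy" for the flag (an assumption of this sketch).
const TRUTHY = ['1', 'true', 'yes', 'on'];

function resolveJudgeConfig({ env, readKeychain }) {
  const flag = String(env.ROUTER_LLM_JUDGE_ENABLED || '').trim().toLowerCase();
  const flagOn = TRUTHY.includes(flag);

  let key = null;
  try {
    key = readKeychain('router-gate-llm-judge', 'default') || null;
  } catch {
    key = null; // keychain errors degrade to "no key", they never throw
  }
  if (!key) key = env.ROUTER_LLM_KEY || null; // env fallback after the keychain

  // Enabled only when BOTH the flag is on AND a key resolved.
  const enabled = flagOn && Boolean(key);
  return { enabled, key: enabled ? key : null };
}
```

Requiring both conditions means a stray key in the environment, or a flag set without a key, leaves the judge off and spending at $0.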
### [x] 3. Judge hook wrappers: DONE (31.05, commit `ca52d354`, pushed)
- **What:** `tools/enforce-llm-judge-per-tool.mjs` + `tools/enforce-llm-judge-response-scan.mjs` were written with TDD as sibling wrappers: a pure `decide()` (honors the config gate, disabled→allow at $0) + a namespaced **no-op `main()`** (NOT registered in settings.json). 14 tests GREEN, full run with no regressions.
- **Why:** the missing link between the judge engine and settings.json; ready for step 2b.3.
- **Remaining (owner, 2b):** key + flag `ROUTER_LLM_JUDGE_ENABLED=1` + hook registration in settings.json. Until all three are in place: $0.
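
A pure `decide()` that honors the config gate might look roughly like this. The sketch is hypothetical: the injected synchronous `judge` function, the verdict shape (`block`, `reason`), and the choice to allow on a `null` (degraded) verdict are assumptions, not the wrappers' actual contract.

```javascript
// Pure decision function: no I/O; the judge call is injected by the caller.
function decide(event, { config, judge }) {
  // Config gate first: a disabled judge is never called, so it costs nothing.
  if (!config.enabled) return { decision: 'allow', reason: 'judge-disabled' };

  const verdict = judge(event);
  // Assumption: a null verdict (degraded engine) is treated as allow here.
  if (verdict === null) return { decision: 'allow', reason: 'judge-degraded' };

  return verdict.block
    ? { decision: 'block', reason: verdict.reason }
    : { decision: 'allow', reason: verdict.reason };
}
```

Keeping `decide()` free of I/O is what lets the 14 tests run without a key or network access; `main()` would only read the hook event and print the result.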
---
## Priority 3. Housekeeping and documentation
### [~] 4. "Brain" sync (normative docs): CONTENT READY, APPLICATION BLOCKED (31.05)
- **Ready:** a ready-to-paste §6 paragraph + §9 entry + header version bump for 1b: `docs/observer/notes/2026-05-31-claude-md-1b-insertion-draft.md`. The §0 cross-ref counters do NOT change (`tools/` infrastructure, not tooling canon #1-#86 / not an ADR / not off-phase).
- **⚠️ NEW BLOCKER (31.05):** `enforce-read-path-deny` (Smoke 5, 30.05) added `CLAUDE.md` to the Read-protected paths → the harness Edit requires a prior Read → **Claude cannot Edit CLAUDE.md**, and a Write-overwrite of the canonical file is too risky. This is an **over-block** of the legit `claude-md-management` workflow (Smoke 5 targeted transcript/runtime exfil; a Read-deny on a CLAUDE.md that is public in the repo has no security value). Owner: either narrow `DEFAULT_PROTECTED_PATTERNS` (drop `CLAUDE.md` from the Read-deny, keep the Bash/PowerShell/Write protections) or paste manually from the draft. The learning is already recorded in this note + the handoff, so nothing is lost.
### [ ] 5. Leave the isolated copy (worktree): PREPARED FOR IMPLEMENTATION (31.05)
- **Verification done (31.05):** the worktree `.claude/worktrees/router-gate-v4-stream-h-close` was checked: all 4 working files (`enforce-safe-baseline-metering.mjs`+`.test.mjs`, `llm-judge-config.mjs`+`.test.mjs`) are **byte-for-byte identical to main** (4× empty `git diff --no-index`); `git log main..worktree-router-gate-v4-stream-h-close` is **empty** (no unique commits). There is NO unsaved work worth keeping, so nothing can be lost.
- **Ready command (run by the OWNER: `git worktree` is default-deny at the gate for Claude with no approval path, and going through PowerShell would be a forbidden bypass):**
```bash
git worktree remove --force ".claude/worktrees/router-gate-v4-stream-h-close"
git branch -D worktree-router-gate-v4-stream-h-close # optional: base branch, no unique commits
```
`--force` is needed: the worktree's working directory contains the same 4 files that are already in main (relative to its old branch they are "uncommitted"), plus auto-regenerated STATUS.md drift.
- **Decision status:** on 30.05 the owner chose "keep the worktree". The steps above are for when the owner decides to remove it; they block nothing (the worktree is harmless, it only takes disk space).
---
## Priority 4. Large items that need hardware and manual owner steps
### [ ] 6. Layer 5 (v4.2): VM / biometrics / YubiKey
- **What:** Phase 1 is VirtualBox ($0); Phases 2+3 are a YubiKey ($50–150 one-time, one key covers biometrics + HSM).
- **Catch:** Claude can only write configs/instructions; installation and hardware are on the owner.
- **Do:** in a separate pass, when time allows and a YubiKey is available.
---
## Move to git: DONE (31.05)
Everything is committed and pushed to `origin/main` (`c8059880..84dcf4aa`, fast-forward, gitleaks-full-history GREEN / lychee 0 errors). Session commits:
- `ca52d354`: judge wrappers (item 3).
- `6d512f5c`/`9f84d9ef`/`c86fdfc9`/`84dcf4aa`: safe-baseline spec v1→v4 + plan + handoff (item 1b doc).
- `f740f612`: live safe-baseline `main()` (item 1b code).
- `80e514f5`: `enforce-runtime-write-deny` (C3).
Items 1a/2a (the `enforce-safe-baseline-metering` wrapper + `llm-judge-config`) were moved out of the worktree earlier (commits `6ac4b1c1`+`c8059880`).
## What needs NO action (already done by parallel sessions)
- recovery-procedures.md: present.
- brain-retro tables 16–17: present (in the analyzer).
- Fixes to `extractPathArgs` / `pathDenyOverlay`: present.
- Transcript read protection (Smoke 5): working.
- Smoke tests 1–9: run.
@@ -0,0 +1,75 @@
# Safe-baseline live wiring (1b) — overnight handoff
**Date:** 2026-05-30 (night)
**Status:** Implemented + tested on disk. **NOT committed** (git commits need your AskUserQuestion approval at the gate; you were asleep). Morning = review → approve commits → register in settings.json.
---
## What was done autonomously
1. **Spec → v4** (`docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md`): removed the G3 override subsystem ("убери g3, больше ничего"); escape is now solely Skill/EnterPlanMode (always available). Runtime write-deny kept but **decoupled** into a standalone git-approval-anchor hardening. *(spec edits are on disk, uncommitted — the last committed spec is v3 `c86fdfc9`.)*
2. **Plan** (`docs/superpowers/plans/2026-05-30-safe-baseline-live-wiring.md`): 6 TDD tasks.
3. **Implementation (TDD, RED→GREEN):**
- `tools/enforce-safe-baseline-metering.mjs` — added `extractKeywords` (H1), `detectSkillMatch` (C2/V2-5), `runLiveDecision` (V2-1 stickiness contract), live `runMain`/`main` (replaces the no-op).
- `tools/enforce-runtime-write-deny.mjs` (new) — standalone write-deny on `~/.claude/runtime/**`, resolving `pathNormalize` (V2-2 `.`-segment-proof).
- Tests: `enforce-safe-baseline-metering.test.mjs` (+14), `enforce-runtime-write-deny.test.mjs` (+7).
4. **Regression:** `npm run test:tools` → **1880 passed | 2 skipped** (was 1859). Narrow runs all GREEN.
## Decisions I made on my own (correct in the morning if wrong)
- **G3 override removed** — per your explicit instruction.
- **Hard-block kept (not observe-mode).** My honest recommendation was observe-first behind a mode flag, but you said "убери g3, больше ничего" → I did NOT add an observe mode. If you want observe-first, say so and I'll add a `mode` flag (default observe) cheaply.
- **`enforce-runtime-write-deny` fails-OPEN on a normalizer exception** (blocks only on a *confirmed* runtime match). Rationale: a fail-CLOSE Write hook that errors would self-lock the controller out of ALL edits during an unattended run. Residual: a malformed path that throws is not blocked. Flip to fail-CLOSE if you prefer strict security.
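
The fail-open trade-off described above can be illustrated with a short sketch. It is not the code in `tools/enforce-runtime-write-deny.mjs`: `normalizeSegments` is a simplified stand-in for the real resolving `pathNormalize`, it handles only absolute POSIX-style paths, and the function names and `reason` strings are hypothetical.

```javascript
// Stand-in normalizer: collapses "." and ".." segments of an absolute
// POSIX-style path and throws on anything it cannot interpret.
function normalizeSegments(p) {
  if (typeof p !== 'string' || !p.startsWith('/')) throw new TypeError('unsupported path');
  const out = [];
  for (const seg of p.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop();
    else out.push(seg);
  }
  return '/' + out.join('/');
}

function decideRuntimeWrite(filePath, { runtimeRoot, normalize = normalizeSegments }) {
  let resolved;
  try {
    resolved = normalize(filePath);
  } catch {
    // Fail-open: a normalizer error must not lock the controller out of all edits.
    return { decision: 'allow', reason: 'normalizer-error-fail-open' };
  }
  // Block only on a confirmed match: the root itself or anything beneath it.
  const inside = resolved === runtimeRoot || resolved.startsWith(runtimeRoot + '/');
  return inside
    ? { decision: 'block', reason: 'runtime-write-deny' }
    : { decision: 'allow', reason: 'outside-runtime' };
}
```

Flipping to fail-closed would mean returning `block` from the `catch` branch instead; that is the single line the residual-risk note refers to.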
## Queued commits (morning — approve each exact git command at the gate)
```bash
git add docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md
git commit docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md -m "docs(router-gate-v4): safe-baseline spec v4 — cut G3 override, decouple write-deny (item 1b)"
git add docs/superpowers/plans/2026-05-30-safe-baseline-live-wiring.md
git commit docs/superpowers/plans/2026-05-30-safe-baseline-live-wiring.md -m "docs(router-gate-v4): safe-baseline live-wiring implementation plan (item 1b)"
git add tools/enforce-safe-baseline-metering.mjs tools/enforce-safe-baseline-metering.test.mjs
git commit tools/enforce-safe-baseline-metering.mjs tools/enforce-safe-baseline-metering.test.mjs -m "feat(safe-baseline): live main() — metering + hard-block + Skill/EnterPlanMode escape (item 1b)"
git add tools/enforce-runtime-write-deny.mjs tools/enforce-runtime-write-deny.test.mjs
git commit tools/enforce-runtime-write-deny.mjs tools/enforce-runtime-write-deny.test.mjs -m "feat(router-gate-v4): enforce-runtime-write-deny — protect ~/.claude/runtime side-channels (C3)"
git add docs/observer/notes/2026-05-30-safe-baseline-overnight-handoff.md
git commit docs/observer/notes/2026-05-30-safe-baseline-overnight-handoff.md -m "docs(observer): safe-baseline overnight handoff note"
```
(A fresh `npm run test:tools` GREEN gives the verify-before-push sentinel for the code commits; docs-only commits short-circuit.)
## Registration (you apply — Claude cannot edit settings.json)
Add to `.claude/settings.json` `hooks.PreToolUse`:
```json
{ "matcher": "Read|Grep|Glob|LS|TodoWrite|AskUserQuestion|Edit|Write|MultiEdit|NotebookEdit|Bash|Skill|Task|EnterPlanMode",
"hooks": [{ "type": "command", "command": "node tools/enforce-safe-baseline-metering.mjs", "timeout": 10 }] }
```
```json
{ "matcher": "Edit|Write|MultiEdit|NotebookEdit",
"hooks": [{ "type": "command", "command": "node tools/enforce-runtime-write-deny.mjs", "timeout": 5 }] }
```
Until registered, both hooks are inert.
**Before registering — owner check:** does `.claude/settings.json` already have a `permissions.deny` covering Write to `~/.claude/**`? If yes, `enforce-runtime-write-deny` is redundant (still harmless). I couldn't read settings.json (gate-blocked).
## Open questions for the morning
1. **"раздел 5 основного плана подготовь к реализации"** — which document and which section 5? Candidates: the remaining-holes checklist (`docs/observer/notes/2026-05-30-router-gate-v4-remaining-holes.md` — its item 5 = close the worktree, already decided "keep") OR the master coordination plan OR the v4 design §5. I did NOT guess to avoid wasted/wrong work. Tell me which and I'll prepare it.
2. **Normative sync ("корректируй всю документацию"):** CLAUDE.md / Pravila / PSR / Tooling — these are gate-protected AND were being edited by a parallel session (§15.2). The safe-baseline live-wiring is infrastructure (`tools/enforce-*.mjs`), not a new tooling-canon node / ADR / off-phase subcategory, so the §0 cross-ref counters likely do NOT change; CLAUDE.md §6 would get one paragraph + §9 one entry. To do via `claude-md-management` once the parallel session is done. Flagged, not done.
3. **observe vs enforce** (see Decisions).
4. **Judge activation (2b)** still owner-gated ($) — untouched.
## Not done (blocked, not skipped)
- Live registration / "run the agent" — needs settings.json (owner-only).
- Mandatory pre-registration smoke (owner-run after registering): the integration tests already exercise block/allow/escape; the registration smoke is a final live check.
- CLAUDE.md normative sync (blocked, see Q2).
- The commits themselves (gate needs your approval awake).
@@ -0,0 +1,26 @@
# CLAUDE.md insertion draft — safe-baseline 1b (ready to paste)
**Why a draft, not a direct edit:** `enforce-read-path-deny` (Smoke 5, 2026-05-30) added `CLAUDE.md` to the Read-protected paths (`DEFAULT_PROTECTED_PATTERNS` `/(^|\/)CLAUDE\.md$/i`). The harness Edit tool requires a prior Read of the target; with Read gate-blocked, **Edit of CLAUDE.md is impossible** for Claude, and a full Write-overwrite of the canonical file is too risky. This is an over-block of the legit `claude-md-management` workflow (the Smoke 5 fix targeted transcript/runtime exfil; normative-doc Read-deny is collateral).
**Owner options:**
1. Temporarily narrow `DEFAULT_PROTECTED_PATTERNS` so `enforce-read-path-deny` does NOT block `CLAUDE.md` Read (keep the Bash/PowerShell + Write protections); then a normal `claude-md-management` session applies the inserts. **Recommended** — the Read-deny on CLAUDE.md has no security value (CLAUDE.md is public-in-repo; the real exfil targets are `~/.claude/projects` transcripts + `~/.claude/runtime`).
2. Paste the blocks below manually.
The substantive learning is already committed in `docs/observer/notes/2026-05-30-router-gate-v4-remaining-holes.md` + the handoff note, so nothing is lost meanwhile.
---
## Header version line — bump
Change the opening of `**Версия:** 2.42 …` to v2.43, prepending:
> **Версия:** 2.43 от 31.05.2026 — **router-gate v4 safe-baseline live wiring (item 1b) + enforce-runtime-write-deny (C3) + LLM-judge hook-обёртки реализованы, протестированы (1880 GREEN), запушены** (commits `ca52d354`+`6d512f5c..84dcf4aa`+`f740f612`+`80e514f5` на main). Spec v4 закрыл C1/C2/C3/H1/V2-1/V2-2 через 3 adversarial-ревью + ghost-pass; G3 override вырезан как защита-призрак. §0 cross-refs НЕ меняются (инфраструктура `tools/`, не tooling-канон #1-#86 / не ADR / не off-phase). **v2.42 наследие:** …(оставить прежний текст)…
## §6 — prepend this paragraph (above the 2026-05-29 entry)
**2026-05-31 router-gate v4 — safe-baseline live wiring (item 1b) + enforce-runtime-write-deny (C3) + LLM-judge hook-обёртки реализованы и запушены:** `tools/enforce-safe-baseline-metering.mjs` получил живой `main()` (метеринг safe-baseline tools per-task + hard-block mutating-инструмента за hard-порогом без skill-match; escape = вызов любого Skill/EnterPlanMode, который этим слоем никогда не блокируется); новые чистые функции `extractKeywords` (детерминированная токенизация со стоп-словами против ложного overlap), `detectSkillMatch` (только реальный assistant tool_use Skill/EnterPlanMode — не self-writable text-path), `runLiveDecision` (контракт stickiness: skill-match привязан к задаче и явно сохраняется, без потери и без утечки между задачами). Новый standalone-хук `tools/enforce-runtime-write-deny.mjs` закрывает уже-существующую дыру: Write/Edit-инструмент мог писать в `~/.claude/runtime/**` напрямую (git-approval anchor был открыт для Write-инструмента — Bash/PowerShell-гейты его прикрывали, Write-канал нет); нормализация через resolving `pathNormalize` (`path.resolve`+`realpath`) делает обход через `.`/`..`-сегменты невозможным. Спроектировано через `superpowers:brainstorming` (3 раунда adversarial-саморевью + ghost-pass), spec v4 `docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md` закрыл C1/C2/C3/H1/V2-1/V2-2; G3 override-подсистема вырезана как защита-призрак. Реализация через `superpowers:writing-plans` → TDD. Также `tools/enforce-llm-judge-per-tool.mjs` + `tools/enforce-llm-judge-response-scan.mjs` (Layer 4 hook-обёртки, no-op `main()`, $0 до активации 2b). Регрессия vitest tools-only **1880 GREEN**. Коммиты `ca52d354`+`6d512f5c..84dcf4aa`+`f740f612`+`80e514f5` (push `c8059880..84dcf4aa main`, gitleaks-full-history GREEN / lychee 0 errors). Режим **hard-block** (решение владельца). Регистрация обоих хуков в `.claude/settings.json` — шаг владельца (Claude'у settings.json заблокирован); до регистрации хуки инертны. 
**§0 cross-refs НЕ меняются** — инфраструктура `tools/enforce-*.mjs`, не tooling-канон #1-#86 / не ADR / не off-phase. Через `claude-md-management:revise-claude-md`.
## §9 — prepend this entry (above the v2.42 entry)
- **v2.43 от 31.05.2026 — safe-baseline live wiring (item 1b) + enforce-runtime-write-deny (C3) + LLM-judge hook-обёртки** — `tools/enforce-safe-baseline-metering.mjs` живой `main()` (метеринг + hard-block + Skill/EnterPlanMode escape) с чистыми `extractKeywords`/`detectSkillMatch`/`runLiveDecision` (stickiness-контракт V2-1); новый `tools/enforce-runtime-write-deny.mjs` (C3 — защита `~/.claude/runtime` от Write-инструмента, `.`-segment-proof через `pathNormalize`); judge-обёртки `enforce-llm-judge-{per-tool,response-scan}.mjs` (no-op main, $0). Спек v4 через brainstorming (3 adversarial-ревью + ghost-pass) закрыл C1/C2/C3/H1/V2-1/V2-2; G3 override вырезан как защита-призрак. TDD, регрессия 1880 GREEN. Commits `ca52d354`+`6d512f5c..84dcf4aa`+`f740f612`+`80e514f5`, push `c8059880..84dcf4aa`. **§0 cross-refs не меняются** (инфраструктура `tools/`, не tooling-канон / не ADR / не off-phase). §6 +абзац / §9 +этот entry. Через `claude-md-management:revise-claude-md`.
@@ -0,0 +1,641 @@
# Lead Region Resolution — Master Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
>
> **This is a MASTER plan split into 6 sessions.** Each session is a self-contained, testable deliverable. Execute sessions **in order** (later sessions depend on earlier ones). Each session = one subagent-driven-development run with its own review checkpoints. Before starting a session, re-read this header + the session's "Preconditions".
**Goal:** Resolve the lead's real region from the phone number (DaData → Rossvyaz → tag fallback) and switch `LeadRouter` to cascading region-based routing, so that clients sharing one source with different regions receive only leads from their own region.
**Architecture:** A new `LeadRegionResolver` service is called in `RouteSupplierLeadJob::handle()` BEFORE the transactional loop; it resolves `subject_code` + the operator from the phone number and persists them to `supplier_leads` + `lead_region_resolution_log`. `LeadRouter::matchEligibleProjects` gets a new `?int $resolvedSubjectCode` parameter and filters candidates in 3 phases (exact region match → "all of Russia" → fallback channel with substitution). The local Rossvyaz registry (`phone_ranges`) is the fallback when DaData is unavailable or not confident.
**Tech Stack:** PHP 8.3, Laravel 13, PostgreSQL 16 (partitions, RLS, `INT[]`), Pest 4, Redis (cache + token bucket), DaData REST API (`cleaner.dadata.ru/api/v1/clean/phone`).
**Source spec:** [docs/superpowers/specs/2026-05-29-lead-region-resolution-design.md](../specs/2026-05-29-lead-region-resolution-design.md) v0.5. Read it in full before starting: this plan does not duplicate spec §3-§12, it turns them into executable steps.
---
## ⚠️ CRITICAL CORRECTIONS TO THE SPEC (read BEFORE writing any code)
These discrepancies between the spec and the actual code were found by direct code walking on 30.05.2026. The implementer MUST follow the facts, not the numbers/names in the spec.
1. **Subject codes are NOT vehicle plate codes.** Spec §3.4.1 says "77 Moscow, 50 Moscow Oblast, 78 St. Petersburg, 47 Leningrad Oblast", which is WRONG. The source of truth is [`app/app/Support/RussianRegions.php`](../../../app/app/Support/RussianRegions.php) `CODE_TO_NAME` (constitutional order per Art. 65, 1..89):
   - **Moscow = 82**, **St. Petersburg = 83**, **Moscow Oblast = 56**, **Leningrad Oblast = 53**.
   - Sevastopol = 84, Republic of Crimea = 13.
   - Use THESE codes everywhere in code/tests/mappings.
2. **`RussianRegions` has NO `codeToName()` method.** There is only `public const CODE_TO_NAME` (an array) and `public static function nameToCode(): array` (via `array_flip`). If you need code→name, read the constant: `RussianRegions::CODE_TO_NAME[$code]`.
3. **`LeadRouter::matchEligibleProjects` has TWO SQL paths**: `DIRECT` (by `signal_type` + `unique_key`) and `B1/B2/B3` (via the `project_supplier_links` pivot). The spec shows the cascade (§3.9) only for the pivot path; **implement the cascade for BOTH paths**.
4. **`project_routing_snapshots` ALREADY contains `regions INT[] NOT NULL DEFAULT '{}'`** (migration `2026_05_27_120000`). No column needs to be added: the cascade WHERE sits on the existing column via `?::int = ANY(snap.regions)` and `snap.regions = '{}'::int[]`.
5. **`LeadDistributor::selectRecipients` currently picks cap=3 at RANDOM.** The spec's cascade requires ordered selection (exact → all of Russia → fallback, sorted by remaining limit DESC) inside the router. Reconciliation: the router itself trims to 3 in order → `LeadDistributor` returns the collection as is when `count ≤ CAP` (no shuffle, lines 36-38). This is a **behavior change** (random → deterministic by remaining limit). It is recorded as a deliberate decision; see "Open question D1" below. Do NOT change `LeadDistributor`; the router simply returns ≤3.
6. **`subject_code` is already written to `deals` today** (Job lines 405-406, via `?int $subjectCode` from `RegionTagResolver`). The integration replaces the source; it does not add a column. `deals.subject_code` already exists (migration `2026_05_20_102000`).
7. **Test command:** run from the `app/` directory. Single file: `cd app && ./vendor/bin/pest tests/Unit/Services/LeadRegionResolverTest.php`. Filter by name: `cd app && ./vendor/bin/pest --filter="dadata qc 0"`. Run the full service suite before each session commit. **NB: the Bash cwd persists**, so always prefix with `cd app &&` or use a subshell.
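
The ordered selection from correction 5 can be sketched in a few lines. This is an in-memory illustration in JavaScript, not the PHP/SQL the plan calls for, and it rests on assumptions: the candidate shape (`id`, `regions`, `remaining`) is hypothetical, phases are assumed to be concatenated in order before the cap is applied, and phase 3 (fallback with substitution) is left out because its rules live in the spec and are not restated here.

```javascript
const CAP = 3;

// Largest remaining daily limit first; id breaks ties so the order is deterministic.
function byRemainingDesc(a, b) {
  return b.remaining - a.remaining || a.id - b.id;
}

function selectByCascade(candidates, resolvedSubjectCode, cap = CAP) {
  // Phase 1: the resolved code is in the project's region list (code = ANY(regions)).
  const exact = candidates.filter(
    (c) => resolvedSubjectCode !== null && c.regions.includes(resolvedSubjectCode),
  );
  // Phase 2: an empty region list means "all of Russia" (regions = '{}').
  const allRussia = candidates.filter((c) => c.regions.length === 0);

  const ordered = [
    ...exact.sort(byRemainingDesc).map((c) => ({ id: c.id, phase: 1 })),
    ...allRussia.sort(byRemainingDesc).map((c) => ({ id: c.id, phase: 2 })),
  ];
  return ordered.slice(0, cap); // the router itself trims to the cap
}
```

Because the result never exceeds the cap, a downstream distributor that shuffles only when it receives more than `CAP` items would pass this list through unchanged, which is the reconciliation the correction relies on.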
---
## Open questions for the client (resolve BEFORE Sessions 5-6)
- **D1 (distribution behavior):** Today, with >3 candidates, a lead goes to 3 RANDOM clients. The new cascade gives it to the 3 clients with the LARGEST remaining daily limit (deterministically). This means a client with a large remaining limit systematically receives more leads than a client with a small one. Spec §3.9 explicitly chose "sort by remaining DESC". **Confirm that random distribution can be removed.** (If the client wants to keep randomness within a region, that is +1 task: a random shuffle inside each phase before the cap.)
- **D2 (ambiguous-list staging):** The list of DaData "merged" regions (`'Санкт-Петербург и область'`, `'Москва и область'`) is extended only from real observations on staging (spec §3.4.1). At the start it is exactly these 2 strings. Confirmed by a smoke run (Session 6).
---
## Common conventions (apply in ALL sessions)
### Test setup (Pest 4)
- **Unit tests** (`app/tests/Unit/...`): pure, no DB where possible; `Http::fake()` for DaData; `Cache::fake()`/`Cache::store('array')` for the cache.
- **Feature tests** (`app/tests/Feature/...`): `uses(DatabaseTransactions::class)` + `uses(Tests\Concerns\SharesSupplierPdo::class)`. Tenant context: `DB::statement("SELECT set_config('app.current_tenant_id', '0', true)")` in `beforeEach` (as in [`LeadRouterTest.php`](../../../app/tests/Feature/Services/LeadRouterTest.php)).
- Factories: `Tenant::factory()`, `Project::factory()`, `SupplierProject::factory()`/`::query()->create([...])`, `SupplierLead::factory()`.
- Helpers (in [`app/tests/Pest.php`](../../../app/tests/Pest.php)): `linkProjectToSupplier($project, $supplier)`, `createRoutingSnapshotFromProject($project, ...)`; **the latter is extended in Session 5** (adds a `string $regions = '{}'` parameter).
- Pest style: `it('...', function () { ... })`, `expect($x)->toBe(...)`. No PHPUnit class style in new tests.
### Migration pattern (raw SQL; reference: `2026_05_27_120000_create_project_routing_snapshots_table.php`)
```php
<?php
declare(strict_types=1);
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;
return new class extends Migration {
    public function up(): void
    {
        // SET ROLE crm_migrator in production; on dev/testing fall back to the postgres superuser.
        try {
            DB::statement('SET ROLE crm_migrator');
            $canCreate = DB::selectOne("SELECT has_schema_privilege('crm_migrator', 'public', 'CREATE') AS ok");
            if (!$canCreate || !$canCreate->ok) { DB::statement('RESET ROLE'); }
        } catch (\Throwable) { /* environment without the role: continue as superuser */ }
        DB::unprepared(<<<'SQL'
            -- DDL goes here
            SQL);
    }
    public function down(): void
    {
        // Same role handling as in up().
        try {
            DB::statement('SET ROLE crm_migrator');
            $canCreate = DB::selectOne("SELECT has_schema_privilege('crm_migrator', 'public', 'CREATE') AS ok");
            if (!$canCreate || !$canCreate->ok) { DB::statement('RESET ROLE'); }
        } catch (\Throwable) {}
        DB::statement('DROP TABLE IF EXISTS <table> CASCADE');
    }
};
```
- GRANTs: SaaS-level read tables → `crm_readonly` + `crm_supplier_worker` SELECT; writes go through `crm_migrator`. Tenant tables → RLS policy + GRANT `crm_app_user`/`crm_supplier_worker` (reference: snapshot migration lines 49-55).
- Partitioned tables: an explicit `CREATE TABLE ..._y2026_m05 PARTITION OF ...` for the current + next month, plus retention registration in `system_settings` (reference: lines 57-78).
- **`db/schema.sql` + `db/CHANGELOG_schema.md`** must be updated on every schema change (rule §4.2 / §5 item 8 of CLAUDE.md). Bump the schema version in the header.
### Git / commits
- Branch: `feat/lead-region-resolution` (created in Session 1, see Preconditions).
- Frequent atomic commits (per task). Conventional commits: `feat(region):`, `test(region):`, `chore(region):`.
- Every session ends with a green regression run of the affected layer + push.
---
## SESSION 1: DB schema + partition registration
**Deliverable:** All tables and columns for the feature exist, the migration's up/down works, and partitions are registered. No business logic.
**Preconditions:** A clean `main` (or an agreed base). Create the branch: `git switch -c feat/lead-region-resolution`. Commit the spec (untracked) as the first commit.
**Files:**
- Create: `app/database/migrations/2026_05_31_100000_create_phone_ranges_and_resolution_log.php`
- Modify: `app/app/Services/MonthlyPartitionManager.php:48-62` (PARTITIONED_TABLES map)
- Modify: `db/schema.sql` (new tables + ALTER, version bump) + `db/CHANGELOG_schema.md`
- Test: `app/tests/Feature/Migrations/PhoneRangesMigrationTest.php`
### Task 1.1: failing test, the migration creates the tables and columns
- [ ] **Step 1: Write the failing test**
`app/tests/Feature/Migrations/PhoneRangesMigrationTest.php`:
```php
<?php
declare(strict_types=1);
use Illuminate\Support\Facades\DB;
use Tests\Concerns\SharesSupplierPdo;
uses(SharesSupplierPdo::class);
it('creates phone_ranges with lookup index', function (): void {
expect(DB::selectOne("SELECT to_regclass('public.phone_ranges') AS t")->t)->not->toBeNull();
$cols = collect(DB::select("SELECT column_name FROM information_schema.columns WHERE table_name='phone_ranges'"))
->pluck('column_name')->all();
expect($cols)->toContain('def_code', 'from_num', 'to_num', 'operator', 'region', 'subject_code', 'import_id');
});
it('creates lead_region_resolution_log as partitioned table', function (): void {
$p = DB::selectOne("SELECT partattrs FROM pg_partitioned_table pt JOIN pg_class c ON c.oid=pt.partrelid WHERE c.relname='lead_region_resolution_log'");
expect($p)->not->toBeNull();
});
it('adds resolution columns to supplier_leads and deals', function (): void {
$sl = collect(DB::select("SELECT column_name FROM information_schema.columns WHERE table_name='supplier_leads'"))->pluck('column_name')->all();
expect($sl)->toContain('resolved_subject_code', 'region_source', 'dadata_qc', 'phone_operator');
$d = collect(DB::select("SELECT column_name FROM information_schema.columns WHERE table_name='deals'"))->pluck('column_name')->all();
expect($d)->toContain('phone_operator', 'region_substituted');
});
```
- [ ] **Step 2: Run it and confirm it fails** (`cd app && ./vendor/bin/pest tests/Feature/Migrations/PhoneRangesMigrationTest.php` → FAIL: relation does not exist)
- [ ] **Step 3: Write the migration.** DDL per spec §4.1-§4.6, with corrections. Full DDL (paste into `DB::unprepared`):
```sql
-- 1. phone_ranges_imports (import journal; created FIRST because phone_ranges has an FK to it)
CREATE TABLE phone_ranges_imports (
id BIGSERIAL PRIMARY KEY,
imported_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
source_url TEXT NOT NULL,
rows_inserted INTEGER NOT NULL DEFAULT 0,
rows_updated INTEGER NOT NULL DEFAULT 0,
checksum_sha256 TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'in_progress'
CHECK (status IN ('in_progress','completed','failed','rolled_back')),
error TEXT,
completed_at TIMESTAMPTZ
);
-- 2. phone_ranges (Rossvyaz registry, SaaS-level, no RLS)
CREATE TABLE phone_ranges (
id BIGSERIAL PRIMARY KEY,
def_code SMALLINT NOT NULL,
from_num BIGINT NOT NULL,
to_num BIGINT NOT NULL,
operator TEXT NOT NULL,
region TEXT NOT NULL,
region_normalized TEXT,
subject_code SMALLINT,
imported_at TIMESTAMPTZ NOT NULL,
import_id BIGINT NOT NULL REFERENCES phone_ranges_imports(id),
CONSTRAINT chk_phone_ranges_def_code CHECK (def_code BETWEEN 300 AND 999),
CONSTRAINT chk_phone_ranges_subject_code CHECK (subject_code IS NULL OR subject_code BETWEEN 1 AND 89),
CONSTRAINT chk_phone_ranges_range_valid CHECK (from_num <= to_num)
);
CREATE INDEX idx_phone_ranges_lookup ON phone_ranges (def_code, from_num, to_num);
GRANT SELECT ON phone_ranges, phone_ranges_imports TO crm_readonly, crm_supplier_worker;
-- 3. lead_region_resolution_log (SaaS-level, partitioned by received_at)
CREATE TABLE lead_region_resolution_log (
id BIGSERIAL,
supplier_lead_id BIGINT NOT NULL,
received_at TIMESTAMPTZ NOT NULL,
phone_masked TEXT NOT NULL,
subject_code_resolved SMALLINT,
subject_code_from_tag SMALLINT,
region_source TEXT NOT NULL CHECK (region_source IN ('dadata','rossvyaz','tag','unknown')),
dadata_qc SMALLINT,
dadata_provider TEXT,
dadata_type TEXT,
dadata_response_masked JSONB,
rossvyaz_matched BOOLEAN NOT NULL DEFAULT FALSE,
actual_subject_code SMALLINT CHECK (actual_subject_code IS NULL OR actual_subject_code BETWEEN 1 AND 89),
substituted_subject_code SMALLINT CHECK (substituted_subject_code IS NULL OR substituted_subject_code BETWEEN 1 AND 89),
routing_step SMALLINT CHECK (routing_step IS NULL OR routing_step BETWEEN 1 AND 3),
phone_operator TEXT,
cache_hit BOOLEAN NOT NULL DEFAULT FALSE,
duration_ms INTEGER,
resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, received_at)
) PARTITION BY RANGE (received_at);
CREATE INDEX idx_lrrl_lead_id ON lead_region_resolution_log (supplier_lead_id);
CREATE INDEX idx_lrrl_source ON lead_region_resolution_log (region_source, received_at);
GRANT SELECT, INSERT ON lead_region_resolution_log TO crm_supplier_worker;
GRANT SELECT ON lead_region_resolution_log TO crm_readonly;
CREATE TABLE lead_region_resolution_log_y2026_m05 PARTITION OF lead_region_resolution_log
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
CREATE TABLE lead_region_resolution_log_y2026_m06 PARTITION OF lead_region_resolution_log
FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
-- 4. supplier_leads +4 columns (persistent idempotency + denormalized display)
ALTER TABLE supplier_leads
ADD COLUMN resolved_subject_code SMALLINT CHECK (resolved_subject_code IS NULL OR resolved_subject_code BETWEEN 1 AND 89),
ADD COLUMN region_source TEXT CHECK (region_source IN ('dadata','rossvyaz','tag','unknown')),
ADD COLUMN dadata_qc SMALLINT,
ADD COLUMN phone_operator TEXT;
-- 5. deals +2 columns
ALTER TABLE deals
ADD COLUMN phone_operator TEXT,
ADD COLUMN region_substituted BOOLEAN NOT NULL DEFAULT FALSE;
```
In the same `up()`, after `DB::unprepared`: register retention for `lead_region_resolution_log` in `system_settings` (pattern: snapshot migration lines 67-78, `value => '12'`, 365 days). `down()`: `DROP TABLE IF EXISTS lead_region_resolution_log, phone_ranges, phone_ranges_imports CASCADE` + `ALTER TABLE ... DROP COLUMN IF EXISTS ...` for supplier_leads/deals + delete the system_settings key.
> **Partition guide:** the new partitioned `lead_region_resolution_log` is keyed on `received_at` (like `deals`). `deals` partitions are created monthly; ours start with only m05/m06, and later ones are picked up by `partitions:create-months` AFTER registration in Task 1.2.
- [ ] **Step 4: Run the test: PASS** (`cd app && ./vendor/bin/pest tests/Feature/Migrations/PhoneRangesMigrationTest.php`)
- [ ] **Step 5: Commit** `git add -A && git commit -m "feat(region): schema — phone_ranges, resolution_log, supplier_leads/deals columns"`
### Task 1.2: register the new partitioned table in MonthlyPartitionManager
- [ ] **Step 1: Failing test** `app/tests/Unit/Services/MonthlyPartitionManagerRegionLogTest.php`:
```php
<?php
declare(strict_types=1);
use App\Services\MonthlyPartitionManager;
it('knows lead_region_resolution_log partition key', function (): void {
expect(MonthlyPartitionManager::PARTITIONED_TABLES)->toHaveKey('lead_region_resolution_log');
expect(MonthlyPartitionManager::PARTITIONED_TABLES['lead_region_resolution_log'])->toBe('received_at');
});
```
- [ ] **Step 2: Run: FAIL.**
- [ ] **Step 3: Add** `'lead_region_resolution_log' => 'received_at',` to `MonthlyPartitionManager::PARTITIONED_TABLES` (after line 61).
- [ ] **Step 4: Run: PASS.**
- [ ] **Step 5: Commit** `chore(region): register lead_region_resolution_log in MonthlyPartitionManager`.
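
For orientation, the monthly partition DDL that a registered table needs each month can be generated as in the sketch below. It mirrors the `_yYYYY_mMM` naming and the half-open `FROM`/`TO` bounds used by the migration, but it is a hypothetical helper, not `MonthlyPartitionManager` or the `partitions:create-months` command; the `IF NOT EXISTS` clause is an assumption added to make reruns harmless.

```javascript
// Builds the DDL for one monthly partition. `month` is 1-based.
function monthlyPartitionDdl(table, year, month) {
  const pad = (n) => String(n).padStart(2, '0');
  // The upper bound is the first day of the next month; December rolls into January.
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;

  const name = `${table}_y${year}_m${pad(month)}`;
  const from = `${year}-${pad(month)}-01`;
  const to = `${nextYear}-${pad(nextMonth)}-01`;
  return `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF ${table}\n  FOR VALUES FROM ('${from}') TO ('${to}');`;
}
```

For `('lead_region_resolution_log', 2026, 5)` this yields the same partition name and bounds as the hand-written m05 statement in the migration.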
### Task 1.3: sync db/schema.sql + CHANGELOG
- [ ] **Step 1:** Add the new `CREATE TABLE`/`ALTER` statements to `db/schema.sql` (mirroring the migration) and bump the version in the header.
- [ ] **Step 2:** Add an entry to `db/CHANGELOG_schema.md` (new version, list of changes).
- [ ] **Step 3:** Commit `chore(region): sync db/schema.sql + CHANGELOG for region resolution`.
**Session 1 wrap-up:** run `cd app && ./vendor/bin/pest tests/Feature/Migrations tests/Unit/Services/MonthlyPartitionManagerRegionLogTest.php` → GREEN. Push.
---
## SESSION 2: Rossvyaz registry + lookup
**Deliverable:** `RossvyazPrefixLookup` finds the region + operator for a phone number via `phone_ranges`; the `phone-ranges:import` command imports the registry.
**Preconditions:** Session 1 is merged or on the branch. The `phone_ranges*` tables exist.
**Files:**
- Create: `app/app/Services/RossvyazPrefixLookup.php`, `app/app/Services/Dto/RossvyazRecord.php`
- Create: `app/app/Console/Commands/PhoneRangesImportCommand.php`
- Test: `app/tests/Unit/Services/RossvyazPrefixLookupTest.php`, `app/tests/Feature/Console/PhoneRangesImportCommandTest.php`
### Task 2.1: RossvyazRecord DTO + Lookup (TDD)
- [ ] **Step 1: Failing tests** `RossvyazPrefixLookupTest.php` (Feature-style, since it needs the database: `uses(DatabaseTransactions::class, SharesSupplierPdo::class)`; `phone_ranges` is seeded directly via `DB::table`):
```php
it('mobile prefix returns correct region and operator', function (): void {
DB::table('phone_ranges')->insert([
'def_code'=>921,'from_num'=>5550000,'to_num'=>5559999,'operator'=>'МегаФон',
'region'=>'Санкт-Петербург','subject_code'=>83,'imported_at'=>now(),'import_id'=>seedImport(),
]);
$rec = app(App\Services\RossvyazPrefixLookup::class)->find('7921555XXXX');
expect($rec)->not->toBeNull()->and($rec->subjectCode)->toBe(83)->and($rec->region)->toBe('Санкт-Петербург');
});
it('prefers narrower range when two ranges overlap', function (): void { /* two ranges, the narrower one wins (ORDER BY to_num-from_num ASC) */ });
it('returns null for unknown prefix', function (): void {
expect(app(App\Services\RossvyazPrefixLookup::class)->find('7999XXXXXXX'))->toBeNull();
});
```
(`seedImport()` is a local test helper: it inserts a row into `phone_ranges_imports` and returns its id.)
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation.** `RossvyazRecord` is a readonly DTO (`subjectCode: ?int`, `region: string`, `operator: string`). `RossvyazPrefixLookup::find(string $phone): ?RossvyazRecord` follows the algorithm in spec §3.7: `def_code = (int) substr($phone,1,3)`, `subscriber = (int) substr($phone,4)`, SQL `SELECT region, operator, subject_code FROM phone_ranges WHERE def_code=? AND from_num<=? AND to_num>=? ORDER BY (to_num-from_num) ASC LIMIT 1`. The query runs through `DB::connection('pgsql_supplier')` (BYPASSRLS, same as LeadRouter).
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): RossvyazPrefixLookup + RossvyazRecord DTO`.
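The lookup rule in Step 3 can be illustrated without a database. The following is a minimal sketch in JavaScript (the language of the repo's tooling, not the PHP service itself); the in-memory `phoneRanges` array, the second range, and the function names `splitPhone` and `findRange` are hypothetical stand-ins for the `phone_ranges` table and the SQL query.

```javascript
// Hypothetical in-memory stand-in for rows of the phone_ranges table.
const phoneRanges = [
  { defCode: 921, fromNum: 5000000, toNum: 5999999, operator: 'Operator B', region: 'Ленинградская область', subjectCode: 53 },
  { defCode: 921, fromNum: 5550000, toNum: 5559999, operator: 'МегаФон', region: 'Санкт-Петербург', subjectCode: 83 },
];

// Split a normalized phone: "7" + 3-digit DEF code + 7-digit subscriber number.
function splitPhone(phone) {
  if (!/^7\d{10}$/.test(phone)) return null;
  return { defCode: Number(phone.slice(1, 4)), subscriber: Number(phone.slice(4)) };
}

// Mirrors: WHERE def_code=? AND from_num<=? AND to_num>=?
//          ORDER BY (to_num-from_num) ASC LIMIT 1
function findRange(phone, ranges) {
  const parts = splitPhone(phone);
  if (parts === null) return null;
  const hits = ranges
    .filter((r) => r.defCode === parts.defCode && r.fromNum <= parts.subscriber && r.toNum >= parts.subscriber)
    .sort((a, b) => (a.toNum - a.fromNum) - (b.toNum - b.fromNum)); // narrowest range first
  return hits.length > 0 ? hits[0] : null;
}
```

Sorting by range width is what makes the narrower of two overlapping ranges win, which is the behaviour the `prefers narrower range when two ranges overlap` test is meant to pin down.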
### Task 2.2: PhoneRangesImportCommand (TDD)
- [ ] **Step 1: Failing Feature test:** `phone-ranges:import --dry-run` parses a fixture XLSX/CSV into `phone_ranges_staging`, maps region→subject_code via `RussianRegions::nameToCode()`, and does not swap when `--dry-run` is set. (Fixture: a small CSV at `app/tests/Fixtures/rossvyaz/sample.csv`.)
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** per spec §6.2: staging table → COPY → checksum idempotency → atomic `RENAME` swap → `phone_ranges_imports.status`. Unmatched regions are logged to `phone_ranges_imports.error`. `--dry-run` stops before the swap. **NB:** the real source is a batch of ~500-600 XLSX files (§6.1); the test parses a single CSV fixture. The XLSX parser is a separate private method; in the test it is swapped for the CSV branch via a format flag.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): phone-ranges:import command with atomic swap + idempotency`.
**Session 2 wrap-up:** the Rossvyaz service layer is GREEN. Push. (The first real registry import is done by the operator during the Session 6 rollout, not in a test.)
---
## SESSION 3: DaData client + budget + rate limit + region map
**Deliverable:** `DaDataPhoneClient` calls the REST API, `DaDataRegionMap` maps name→code, `DaDataBudgetGuard` cuts off calls at the daily limit, and a token bucket protects against 429 responses. No orchestration (that lives in Session 4).
**Preconditions:** Sessions 1-2 are done.
**Files:**
- Create: `app/app/Services/DaData/DaDataPhoneClient.php`, `DaDataPhoneResponse.php`, `DaDataQualityCode.php`, `DaDataException.php`, `DaDataTimeoutException.php`
- Create: `app/app/Services/DaData/DaDataBudgetGuard.php`
- Create: `app/app/Support/DaDataRegionMap.php`
- Modify: `app/config/services.php` (add a `dadata` block)
- Test: `app/tests/Unit/Services/DaData/DaDataPhoneClientTest.php`, `DaDataBudgetGuardTest.php`, `app/tests/Unit/Support/DaDataRegionMapTest.php`
### Task 3.1: config/services.php + DaDataQualityCode enum
- [ ] **Step 1:** Add to `config/services.php`:
```php
'dadata' => [
'api_key' => env('DADATA_API_KEY'),
'secret' => env('DADATA_SECRET'),
'timeout_ms' => (int) env('DADATA_TIMEOUT_MS', 2000),
'retries' => (int) env('DADATA_RETRIES', 1),
'daily_cap_rub' => (int) env('DADATA_DAILY_CAP_RUB', 10000),
'enabled' => filter_var(env('LEAD_REGION_RESOLVER_ENABLED', false), FILTER_VALIDATE_BOOL),
'cache_ttl_days' => (int) env('PHONE_REGION_CACHE_TTL_DAYS', 30),
],
```
- [ ] **Step 2:** `DaDataQualityCode` is an int-backed enum (CASE_RECOGNIZED=0, ASSUMPTIONS=1, EMPTY=2, MULTIPLE=3, FOREIGN=7). No dedicated test (trivial enum); it is covered through the client.
- [ ] **Step 3: Commit** `chore(region): config/services dadata + DaDataQualityCode enum`.
### Task 3.2: DaDataRegionMap (TDD)
- [ ] **Step 1: Failing unit test** `DaDataRegionMapTest.php`:
```php
use App\Support\DaDataRegionMap;
it('maps exact official names via RussianRegions', function (): void {
expect(DaDataRegionMap::toSubjectCode('Москва'))->toBe(82);
expect(DaDataRegionMap::toSubjectCode('Московская область'))->toBe(56);
expect(DaDataRegionMap::toSubjectCode('Санкт-Петербург'))->toBe(83);
expect(DaDataRegionMap::toSubjectCode('Ленинградская область'))->toBe(53);
});
it('flags ambiguous agglomeration strings', function (): void {
expect(DaDataRegionMap::isAmbiguous('Санкт-Петербург и область'))->toBeTrue();
expect(DaDataRegionMap::isAmbiguous('Москва и область'))->toBeTrue();
expect(DaDataRegionMap::isAmbiguous('Москва'))->toBeFalse();
});
it('returns null for unmappable region', function (): void {
expect(DaDataRegionMap::toSubjectCode('Атлантида'))->toBeNull();
});
it('resolves all 89 RussianRegions names', function (): void {
foreach (App\Support\RussianRegions::CODE_TO_NAME as $code => $name) {
expect(DaDataRegionMap::toSubjectCode($name))->toBe($code);
}
});
```
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation.** `DaDataRegionMap`: `AMBIGUOUS_REGIONS = ['Санкт-Петербург и область','Москва и область']` (const). `OVERRIDES` is an array for names that do not match the dictionary (empty at the start, filled in from findings). `toSubjectCode(string $name): ?int` → trim → `OVERRIDES[$name] ?? RussianRegions::nameToCode()[$name] ?? null`. `isAmbiguous(string $name): bool` → `in_array($name, self::AMBIGUOUS_REGIONS, true)`.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): DaDataRegionMap with ambiguous-list + 89-region coverage`.
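The mapping described in Step 3 is small enough to sketch directly. This is an illustrative JavaScript version, not the PHP class: `NAME_TO_CODE` is a hypothetical four-entry excerpt standing in for `RussianRegions::nameToCode()`, with codes copied from the tests above.

```javascript
// Hypothetical excerpt of the name -> subject code dictionary (codes from the tests above).
const NAME_TO_CODE = new Map([
  ['Москва', 82],
  ['Московская область', 56],
  ['Санкт-Петербург', 83],
  ['Ленинградская область', 53],
]);
const AMBIGUOUS_REGIONS = ['Санкт-Петербург и область', 'Москва и область'];
const OVERRIDES = new Map(); // empty at the start, filled in from findings

// trim -> override -> dictionary -> null
function toSubjectCode(name) {
  const key = name.trim();
  return OVERRIDES.get(key) ?? NAME_TO_CODE.get(key) ?? null;
}

// Strict membership check, like in_array($name, ..., true).
function isAmbiguous(name) {
  return AMBIGUOUS_REGIONS.includes(name);
}
```

Keeping `OVERRIDES` ahead of the dictionary means a finding can correct a single DaData spelling without touching the shared region list.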
### Task 3.3: DaDataPhoneClient (TDD, Http::fake)
> **HTTP client convention:** mirror [`app/app/Services/Supplier/SupplierPortalClient.php`](../../../app/app/Services/Supplier/SupplierPortalClient.php): inject `Illuminate\Http\Client\Factory $http`, use custom exceptions and a private `request()`.
- [ ] **Step 1: Failing unit tests** `DaDataPhoneClientTest.php` (one each for qc 0/1/2/3/7, plus timeout, 5xx-retry, and 4xx-no-retry). Example:
```php
use App\Services\DaData\DaDataPhoneClient;
use Illuminate\Support\Facades\Http;
it('parses qc=0 mobile response', function (): void {
Http::fake(['cleaner.dadata.ru/*' => Http::response([[
'qc'=>0,'qc_conflict'=>0,'type'=>'Мобильный','phone'=>'+7 921 555-12-34',
'provider'=>'МегаФон','region'=>'Санкт-Петербург и область','timezone'=>'UTC+3',
]], 200)]);
$resp = app(DaDataPhoneClient::class)->cleanPhone('7921555XXXX');
expect($resp->qc)->toBe(0)->and($resp->provider)->toBe('МегаФон')
->and($resp->region)->toBe('Санкт-Петербург и область');
});
it('throws DaDataTimeoutException on connection timeout', function (): void {
Http::fake(fn () => throw new Illuminate\Http\Client\ConnectionException('timeout'));
expect(fn () => app(DaDataPhoneClient::class)->cleanPhone('7921555XXXX'))
->toThrow(App\Services\DaData\DaDataTimeoutException::class);
});
```
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** per §3.6: POST `https://cleaner.dadata.ru/api/v1/clean/phone`, headers `Authorization: Token <key>`, `X-Secret: <secret>`, body `["<phone>"]`, timeout from config, retry on network errors and 5xx. Parse element [0] of the array into `DaDataPhoneResponse` (readonly DTO, fields per §3.6). `ConnectionException`/timeout → `DaDataTimeoutException`; non-2xx after retry → `DaDataException`.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): DaDataPhoneClient + DTO + exceptions`.
### Task 3.4: DaDataBudgetGuard + token bucket (TDD)
- [ ] **Step 1: Failing test:** `canSpend()` is true while `phone_resolution.dadata.spent_today_kopecks < daily_cap` and false once the cap is exceeded (the counter is in kopecks while the config value is `daily_cap_rub`, so compare against the cap converted to kopecks); `recordSpend()` does a Redis INCRBY. (`Cache::store('array')` or a Redis fake.)
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** §5.3 + §3.13: `DaDataBudgetGuard` (canSpend/recordSpend through a Redis key with a daily TTL). Token bucket at 18 RPS: register `RateLimiter::for('dadata-cleaner', ...)` in the provider and wrap the call in the client (or use a separate guard; decide in Session 4 during assembly).
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): DaDataBudgetGuard + rate-limit`.
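The two guards in this task can be sketched in isolation. Below is a rough JavaScript illustration, assuming the spend counter is kept in kopecks and the cap is configured in rubles; `createBudgetGuard` and `createTokenBucket` are hypothetical names, the closures stand in for the Redis counter, and the generic token bucket is not Laravel's `RateLimiter`.

```javascript
// Daily budget guard. Assumption: spend is tracked in kopecks, the cap is given in rubles.
function createBudgetGuard(dailyCapRub) {
  let spentTodayKopecks = 0; // stand-in for the Redis counter with a daily TTL
  return {
    canSpend: () => spentTodayKopecks < dailyCapRub * 100,
    recordSpend: (kopecks) => {
      spentTodayKopecks += kopecks; // INCRBY equivalent
      return spentTodayKopecks;
    },
  };
}

// Generic token bucket: refills ratePerSec tokens per second, up to capacity.
// Time is passed in explicitly (milliseconds) so the behaviour is deterministic.
function createTokenBucket(ratePerSec, capacity, startMs) {
  let tokens = capacity;
  let last = startMs;
  return {
    tryTake(nowMs) {
      tokens = Math.min(capacity, tokens + ((nowMs - last) / 1000) * ratePerSec);
      last = nowMs;
      if (tokens < 1) return false; // caller should skip or delay the DaData call
      tokens -= 1;
      return true;
    },
  };
}
```

Passing the clock in as an argument keeps the bucket testable without sleeping, which matters for a guard that is supposed to be verified in a unit test.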
**Session 3 wrap-up:** `tests/Unit/Services/DaData tests/Unit/Support/DaDataRegionMapTest.php` GREEN. Push.
---
## SESSION 4: LeadRegionResolver (orchestrator)
**Deliverable:** `LeadRegionResolver::resolve(SupplierLead): RegionResolution` with the full cascade of qc decisions, the cache, the ambiguous-region logic, persistent idempotency, and cache-hit logging. This is the heart of the feature.
**Preconditions:** Sessions 1-3. All sub-components exist and are green.
**Files:**
- Create: `app/app/Services/LeadRegionResolver.php`, `app/app/Services/Dto/RegionResolution.php`
- Test: `app/tests/Unit/Services/LeadRegionResolverTest.php` (12 cases from spec §9.1)
### Task 4.1: RegionResolution DTO + source rank
- [ ] **Step 1: Failing test** for the DTO: fields `subjectCode: ?int`, `actualSubjectCode: ?int`, `source: string` ('dadata'|'rossvyaz'|'tag'|'unknown'), `phoneOperator: ?string`, `qc: ?int`, `cacheHit: bool`, `dadataResponseMasked: ?array`, `durationMs: ?int`, `rossvyazMatched: bool`. Plus a static `SOURCE_RANK` const `['dadata'=>4,'rossvyaz'=>3,'tag'=>2,'unknown'=>1]`. Plus factories `fromTag()` and `fromSupplierLead()` (for persistent idempotency).
- [ ] **Step 2-4:** implement the readonly DTO, PASS.
- [ ] **Step 5: Commit** `feat(region): RegionResolution DTO + SOURCE_RANK`.
### Task 4.2: LeadRegionResolver: 12 cases (TDD, one test at a time)
Implementation follows the algorithm in spec §3.3 + §3.4 (decision table). Cache key `sha256("phone-region:".$phone)`, TTL = `config('services.dadata.cache_ttl_days')` days. Persistent idempotency: at the start of `resolve()`, if `$lead->resolved_subject_code !== null || $lead->region_source !== null` → return `RegionResolution::fromSupplierLead($lead)` without calling DaData. Phone validation `/^7\d{10}$/` (same as in the Job/Controller).
Each test from the spec §9.1 list is a separate TDD cycle (write→fail→implement→pass→commit). Test names (Pest `it('...')`):
- [ ] `dadata qc 0 returns dadata source`: `Http::fake` with qc=0 and a non-ambiguous region → source='dadata', subjectCode is mapped.
- [ ] `dadata qc 0 ambiguous region falls to rossvyaz but keeps dadata provider`: region='Санкт-Петербург и область' → go to Rossvyaz for subjectCode=83, while the provider stays from DaData (И-2). **The key test of the ambiguous-region logic.**
- [ ] `dadata qc 3 returns dadata with multiple flag`.
- [ ] `dadata qc 1 falls back to rossvyaz`.
- [ ] `dadata qc 2 falls back to tag skipping rossvyaz`.
- [ ] `dadata qc 7 falls back to tag skipping rossvyaz`.
- [ ] `dadata timeout falls back to rossvyaz`.
- [ ] `dadata network error falls back to rossvyaz`.
- [ ] `budget cap exceeded skips dadata directly to rossvyaz` (`DaDataBudgetGuard::canSpend()` false).
- [ ] `cache hit skips dadata and rossvyaz`: a second call for the same phone does not hit Http (assert `Http::assertSentCount`).
- [ ] `invalid phone skips dadata returns tag`.
- [ ] `qc 0 region null falls through to rossvyaz` (mobile number without a region, §3.4 Q6/Q7).
- [ ] `unmappable dadata region falls through to rossvyaz` (qc=0 but the region is not in the dictionary).
- [ ] `all three layers fail returns unknown with null subject_code`.
After each one, a commit step `feat(region): LeadRegionResolver — <case>` (or a batch commit for 3-4 related cases).
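The test names above imply a small decision table for which layer is tried first after the DaData step. The sketch below is one possible reading of that table in JavaScript, not the spec §3.4 table itself: the `outcome` shape and `decideLayer` are hypothetical, treating qc 3 like qc 0 for ambiguous or unmappable regions is an assumption, and sending unrecognized qc values to Rossvyaz is also an assumption. It only picks the first layer to try and does not model the further fall-through to `unknown`.

```javascript
// outcome.kind: 'invalid_phone' | 'timeout' | 'network_error' | 'budget_exceeded' | 'response'
// For kind 'response': qc (number), region (string|null), ambiguous (bool), mappable (bool).
function decideLayer(outcome) {
  if (outcome.kind === 'invalid_phone') return 'tag'; // DaData is never called
  if (outcome.kind !== 'response') return 'rossvyaz'; // timeout, network error, budget cap
  const { qc, region, ambiguous, mappable } = outcome;
  if (qc === 2 || qc === 7) return 'tag'; // empty or foreign: Rossvyaz is skipped
  if (qc === 0 || qc === 3) {
    // A usable DaData region wins; otherwise Rossvyaz supplies the subject code.
    if (region === null || ambiguous || !mappable) return 'rossvyaz';
    return 'dadata';
  }
  return 'rossvyaz'; // qc 1, and (assumption) any unrecognized qc
}
```

Expressing the table as a pure function like this makes it easy to check each named case independently of HTTP fakes and the cache.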
**Session 4 wrap-up:** `cd app && ./vendor/bin/pest tests/Unit/Services/LeadRegionResolverTest.php` all GREEN. Push. **This is the most important session: do not rush, review every case.**
---
## SESSION 5: LeadRouter cascade + region substitution
**Deliverable:** `LeadRouter::matchEligibleProjects` accepts `?int $resolvedSubjectCode`, filters in 3 phases (exact→all-RF→fallback) for BOTH paths (DIRECT + pivot), and returns ≤3 candidates carrying a `routing_step` attribute.
**Preconditions:** Sessions 1-4. **Question D1 is resolved** (random→deterministic confirmed by the customer).
**Files:**
- Modify: `app/app/Services/LeadRouter.php` (new parameter + 3-phase queryCandidates)
- Modify: `app/tests/Pest.php` (extend `createRoutingSnapshotFromProject` with a `string $regions = '{}'` parameter)
- Test: `app/tests/Feature/Services/LeadRouterCascadeTest.php`
### Task 5.1: Extend the test helper
- [ ] **Step 1:** In `createRoutingSnapshotFromProject` (Pest.php lines 128-150), add a `string $regions = '{}'` parameter and use it in the insert instead of the hardcoded `'{}'` (line 141). Existing calls do not break (the default is preserved).
- [ ] **Step 2:** Run the existing `LeadRouterTest.php`: GREEN (no regression).
- [ ] **Step 3: Commit** `test(region): createRoutingSnapshotFromProject accepts regions param`.
### Task 5.2: Cascade: signature + 3 phases (TDD)
> **Approach:** wrap the existing SQL in a private `queryCandidates(string $activeDate, SupplierProject $sp, string $regionFilter, ?int $code, array $excludeTenantIds, int $limit): Collection`. It keeps the DIRECT vs pivot fork (as it is now) and adds a WHERE fragment for the filter. `matchEligibleProjects(SupplierProject $sp, ?int $resolvedSubjectCode = null)` orchestrates the 3 phases (§3.9 pseudocode) and sets `routing_step` on each Project via `$project->setAttribute('routing_step', N)`.
WHERE fragments:
- `exact`: `AND ?::int = ANY(snap.regions)` (bind `$code`)
- `all_ru`: `AND snap.regions = '{}'::int[]`
- `any`: no region filter (current behaviour)
- [ ] **Step 1: Failing tests** `LeadRouterCascadeTest.php` (Pest, `DatabaseTransactions` + `SharesSupplierPdo`, tenant-context '0'):
```php
it('step 1: exact region match wins', function (): void {
    $sp = SupplierProject::query()->create(['platform'=>'B1','signal_type'=>'site','unique_key'=>'ex.ru','subject_code'=>82,'current_limit'=>0,'sync_status'=>'ok']);
    // tenant A: region 83 (Saint Petersburg); tenant B: region 82 (Moscow)
    $a = makeLinkedProject($sp, regions: '{83}'); // inline helper
    $b = makeLinkedProject($sp, regions: '{82}');
    $matched = app(LeadRouter::class)->matchEligibleProjects($sp, resolvedSubjectCode: 82);
    expect($matched->pluck('id')->all())->toBe([$b->id]) // only the Moscow project
        ->and($matched->first()->routing_step)->toBe(1);
});
it('step 2: falls to all-RF when no exact match', function (): void {
    // only a candidate with regions='{}' → routing_step=2 for resolvedSubjectCode=82
});
it('step 3: fallback channel when nobody subscribed to region', function (): void {
    // only a candidate with regions='{83}'; resolvedSubjectCode=82 → nobody is subscribed, no all-RF →
    // it is returned with routing_step=3 (the substitution happens in the Job, not here)
});
it('exact + all-RF combine up to cap=3', function (): void { /* 2 exact + 2 all-RF → 3 taken, exact ones first */ });
it('null resolvedSubjectCode skips exact, uses all-RF then fallback', function (): void { /* the resolver produced nothing */ });
it('cascade works for DIRECT supplier_project path too', function (): void { /* platform=DIRECT */ });
```
(`makeLinkedProject($sp, regions)` is an inline helper in the test file: it creates a tenant with a balance, a project, calls `linkProjectToSupplier`, then `createRoutingSnapshotFromProject($p, regions: $regions)`.)
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implement** the cascade. Keep the fail-loud `logIfNoSnapshot` (call it on the final result). `excludeTenantIds` for step 2 = the tenant_ids from step 1.
- [ ] **Step 4: PASS** plus the `LeadRouterTest.php` regression GREEN. Old calls without the 2nd parameter use the default `null`, so they used to behave as "any" but now go through the cascade; check that the zero-region tests did not break. Old snapshots have `regions='{}'` and therefore land in step 2 (all_ru).
> **⚠️ Regression risk:** the existing `LeadRouterTest` cases create a snapshot with `regions='{}'` and call `matchEligibleProjects($sp)` without the 2nd argument. With the cascade, `resolvedSubjectCode=null` → step 1 is skipped → step 2 (all_ru) matches `regions='{}'` → same results. **Verify this explicitly**; if the results diverge, fix the default branch so that `null` plus any regions behaves like the old "any" (backward compatibility). Record this decision in the commit message.
- [ ] **Step 5: Commit** `feat(region): LeadRouter cascade routing (exact→all-RF→fallback) with routing_step`.
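The three phases can be sketched over an in-memory candidate list. This JavaScript sketch is not the §3.9 pseudocode: the candidate shape and the name `cascade` are hypothetical, and it assumes that step 3 runs only when steps 1 and 2 found nobody, which the test names suggest but do not state outright.

```javascript
// Candidate shape (hypothetical): { id, tenantId, regions: number[] }; [] means "all of Russia".
function cascade(candidates, resolvedSubjectCode, cap = 3) {
  const picked = [];
  const runPhase = (step, matches) => {
    // Tenants already taken by earlier phases are excluded from this one.
    const excluded = new Set(picked.map((p) => p.tenantId));
    for (const c of candidates) {
      if (picked.length >= cap) break;
      if (!excluded.has(c.tenantId) && matches(c)) picked.push({ ...c, routingStep: step });
    }
  };
  // Step 1: exact region match (skipped when the resolver produced no code).
  if (resolvedSubjectCode !== null) runPhase(1, (c) => c.regions.includes(resolvedSubjectCode));
  // Step 2: all-RF subscribers, i.e. empty regions.
  runPhase(2, (c) => c.regions.length === 0);
  // Step 3 (assumption): fallback only when nothing matched so far.
  if (picked.length === 0) runPhase(3, () => true);
  return picked;
}
```

Because exact matches are collected before all-RF ones, the cap of three naturally keeps exact matches first, which is the ordering the `exact + all-RF combine up to cap=3` test expects.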
**Session 5 wrap-up:** `cd app && ./vendor/bin/pest tests/Feature/Services/LeadRouterTest.php tests/Feature/Services/LeadRouterCascadeTest.php` GREEN. Push.
---
## SESSION 6: Job integration + CSV-merge + flag + rollout
**Deliverable:** `RouteSupplierLeadJob` uses `LeadRegionResolver`, persists the resolution, passes `routing_step`, and substitutes the region at step 3; CSV-merge updates by source rank; feature flag; metrics; staging smoke.
**Preconditions:** Sessions 1-5 are all green and merged.
**Files:**
- Modify: `app/app/Jobs/RouteSupplierLeadJob.php` (handle + createDealCopyForProject + CSV-merge)
- Create: `app/app/Console/Commands/PhoneRegionSmokeCommand.php` (staging-smoke §9.4)
- Test: `app/tests/Feature/Jobs/RouteSupplierLeadJobRegionResolutionTest.php`
### Task 6.1: Resolve before the transaction + persist (TDD)
> **Insertion point** ([RouteSupplierLeadJob.php:151-160](../../../app/app/Jobs/RouteSupplierLeadJob.php#L151)). Currently: `$matched = $router->matchEligibleProjects($supplier); $selected = $distributor->selectRecipients($matched); $subjectCode = $tagResolver->resolve(...)`. It becomes: resolve the region BEFORE `matchEligibleProjects`, persist in one short `DB::transaction()`, then call `matchEligibleProjects($supplier, $resolution->subjectCode)`.
- [ ] **Step 1: Failing test** `RouteSupplierLeadJobRegionResolutionTest.php`:
```php
it('lead with phone uses dadata region not tag', function (): void {
    Http::fake(['cleaner.dadata.ru/*' => Http::response([['qc'=>0,'type'=>'Мобильный','provider'=>'МТС','region'=>'Москва']], 200)]);
    // lead whose raw_payload has tag='Санкт-Петербург' but whose phone resolves to Moscow (82)
    // → deal.subject_code = 82, supplier_leads.resolved_subject_code=82, region_source='dadata'
    // → one row in lead_region_resolution_log
});
it('region resolution logged per lead with cache_hit flag', function (): void { /* 1 row in the log */ });
it('lead with invalid phone falls back to tag', function (): void { /* phone='123' → region_source='tag' */ });
it('lead with resolver disabled via flag uses tag', function (): void { /* config dadata.enabled=false → tag flow */ });
it('persistent idempotency: retry does not re-call dadata', function (): void { /* resolved_subject_code already set → Http::assertNothingSent */ });
```
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation.** Inject `LeadRegionResolver $regionResolver` into `handle()`. After `$lead->update(['supplier_project_id'...])`:
```php
$resolution = $regionResolver->resolve($lead);
// persist in one short transaction (BEFORE the per-project loops, so HTTP never hangs inside a tenant tx)
DB::transaction(function () use ($lead, $resolution): void {
$lead->update([
'resolved_subject_code' => $resolution->subjectCode,
'region_source' => $resolution->source,
'dadata_qc' => $resolution->qc,
'phone_operator' => $resolution->phoneOperator,
]);
$this->logRegionResolution($lead, $resolution); // INSERT lead_region_resolution_log
});
$matched = $router->matchEligibleProjects($supplier, $resolution->subjectCode);
$selected = $distributor->selectRecipients($matched);
```
Remove the old `$subjectCode = $tagResolver->resolve(...)`. `RegionTagResolver` stays injected (`LeadRegionResolver` uses it as a fallback through the DI chain). The private `logRegionResolution()` writes to `lead_region_resolution_log` through `pgsql_supplier`, with the phone masked (§7.1: `7XXX***YYYY`).
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): wire LeadRegionResolver into RouteSupplierLeadJob + persist`.
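The `7XXX***YYYY` pattern from §7.1 can be sketched as a one-line helper. In this JavaScript illustration, `maskPhone` is a hypothetical name, and reading the pattern as "country digit plus DEF code, three hidden digits, last four digits" is an assumption about §7.1, which is not quoted here.

```javascript
// Mask a normalized 11-digit phone as 7XXX***YYYY.
// Assumption: XXX is the DEF code and YYYY are the last four digits.
function maskPhone(phone) {
  if (!/^7\d{10}$/.test(phone)) return null; // never log an unvalidated raw value
  return `${phone.slice(0, 4)}***${phone.slice(7)}`;
}
```

Returning `null` for input that fails validation avoids writing arbitrary raw strings into the log under the guise of a masked phone.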
### Task 6.2: subject_code substitution at step 3 (TDD)
- [ ] **Step 1: Failing test:** a `routing_step=3` project gets a deal with `subject_code` = the first entry of `project->regions` and `region_substituted=true`; `lead_region_resolution_log.actual_subject_code` = the real resolution. `routing_step<3` → the real subjectCode, `region_substituted=false`.
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** §3.10. `createDealCopyForProject` receives `RegionResolution $resolution` (instead of `?int $subjectCode`). Inside:
```php
$dealSubjectCode = ($project->routing_step ?? 1) < 3
? $resolution->subjectCode
: $this->pickSubstituteRegion($project, $resolution->subjectCode);
$dealRegionSubstituted = ($project->routing_step ?? 1) === 3;
// Deal::create([... 'subject_code'=>$dealSubjectCode, 'phone_operator'=>$resolution->phoneOperator, 'region_substituted'=>$dealRegionSubstituted])
```
`pickSubstituteRegion(Project $p, ?int $resolved): ?int`: empty `$p->regions` → `$resolved`; otherwise `$p->regions[0]`. Add an UPDATE of `lead_region_resolution_log` with `routing_step`/`actual_subject_code`/`substituted_subject_code` (or fold it into the Task 6.1 log; decide during assembly, since the log is written AFTER routing, when routing_step is known; the log write may move from 6.1 to the end of handle()).
> **NB on log write order:** `routing_step` is only known AFTER `matchEligibleProjects`. So the INSERT into `lead_region_resolution_log` is more naturally done AFTER the loop (with an aggregated routing_step), OR a base row is written in 6.1 and its routing fields are UPDATEd afterwards. Decision: **one row per lead**, written at the end of `handle()` with the final routing fields (the lead has a single subject_code; routing_step is taken from the first selected project, or the max). Record the decision in the commit.
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): step-3 fallback subject_code substitution + region_substituted`.
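The substitution rule for step 3 is a pure function of the project and the resolved code, so it can be sketched on its own. This is a JavaScript illustration of the PHP snippet above; the `project` object shape and the name `dealRegion` are hypothetical.

```javascript
// Empty project regions -> keep the resolved code; otherwise take the first project region.
function pickSubstituteRegion(projectRegions, resolved) {
  return projectRegions.length === 0 ? resolved : projectRegions[0];
}

// project (hypothetical shape): { regions: number[], routingStep?: number }
function dealRegion(project, resolvedSubjectCode) {
  const step = project.routingStep ?? 1; // a missing routing_step is treated as step 1
  return {
    subjectCode: step < 3
      ? resolvedSubjectCode
      : pickSubstituteRegion(project.regions, resolvedSubjectCode),
    regionSubstituted: step === 3,
  };
}
```

Note that with empty `regions` at step 3 the deal keeps the resolved code while `regionSubstituted` is still `true`, exactly as in the PHP snippet; whether that flag value is intended for this edge case is worth confirming against §3.10.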
### Task 6.3: CSV-merge update by source rank (TDD)
- [ ] **Step 1: Failing test:** a CSV-recovered deal has `region_source='tag'`, subject_code=99; the webhook yields `dadata` with subject=82 → the merge updates subject_code/phone_operator/region_source (rank 4>2). Equal or lower rank → NO update.
- [ ] **Step 2: FAIL.**
- [ ] **Step 3: Implementation** §3.12 in the merge block (lines 340-369). When both `$existingMergeable` and a new `$resolution` are present: compare via `RegionResolution::SOURCE_RANK`, and if the new one ranks higher, add `subject_code`/`phone_operator`/`region_source` to `DB::table('deals')->where('id')->where('received_at')->update([...])`. **Keep `received_at` in the WHERE** (partition pruning + FK, as in the existing code, lines 357-360).
- [ ] **Step 4: PASS.**
- [ ] **Step 5: Commit** `feat(region): CSV-merge updates subject_code/operator by source rank`.
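The rank comparison behind the merge rule fits in a few lines. The sketch below reuses the `SOURCE_RANK` values from Task 4.1 in JavaScript; `shouldUpdateRegion` is a hypothetical helper name, and ranking an unrecognized source as 0 is an assumption.

```javascript
const SOURCE_RANK = { dadata: 4, rossvyaz: 3, tag: 2, unknown: 1 };

// Assumption: a source that is not in the table ranks below everything (0).
function rankOf(source) {
  return Object.prototype.hasOwnProperty.call(SOURCE_RANK, source) ? SOURCE_RANK[source] : 0;
}

// Update only when the new source ranks strictly higher than the stored one.
function shouldUpdateRegion(existingSource, newSource) {
  return rankOf(newSource) > rankOf(existingSource);
}
```

The strict `>` is what keeps an equal-rank re-delivery from overwriting a deal, matching the "equal or lower rank → NO update" expectation in Step 1.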
### Task 6.4: Staging smoke command + metrics
- [ ] **Step 1:** `PhoneRegionSmokeCommand` (`phone-region:smoke --phone=...`) §9.4: calls live DaData + Rossvyaz, prints the decision, does NOT write to the database. Test: the command with `Http::fake` prints the structure.
- [ ] **Step 2:** Metrics §8.1: increments of `phone_resolution.source.*` / `dadata.qc.*` / `cache.{hit,miss}` through the project's existing metrics mechanism (check how the project reports to Sentry/Prometheus: grep `metric`/`Sentry::` in `app/app/Services`). If no mechanism exists, defer to a separate task and note it in the commit.
- [ ] **Step 3: Commit** `feat(region): staging smoke command + resolution metrics`.
### Task 6.5: Regression + rollout handoff
- [ ] **Step 1:** Full regression of the affected layer: `cd app && ./vendor/bin/pest tests/Unit/Services tests/Feature/Services tests/Feature/Jobs tests/Feature/Migrations`. GREEN.
- [ ] **Step 2:** `superpowers:requesting-code-review` over the whole feature range.
- [ ] **Step 3:** Rollout handoff document (§10): the order of production steps (migration → registry import → deploy with `LEAD_REGION_RESOLVER_ENABLED=false` → 1% → 100%), including `DADATA_API_KEY`/`DADATA_SECRET` in YC Lockbox. File: `docs/superpowers/runbooks/2026-05-31-lead-region-resolution-rollout.md`.
- [ ] **Step 4: Final commit + PR.** `superpowers:finishing-a-development-branch`.
**Session 6 wrap-up:** the whole feature is green, code review has passed, the runbook is ready. The actual first Rossvyaz registry import and the rollout are done by the operator following the runbook, OUTSIDE this plan.
---
## Self-Review (done by the plan author)
**Spec coverage:** §3.3 resolver→Session 4; §3.4/§3.4.1 qc+ambiguous→Session 4; §3.7 Rossvyaz→Session 2; §3.6 DaData→Session 3; §3.9 cascade→Session 5; §3.10 substitution→Session 6.2; §3.11 persist/idempotency→Session 6.1; §3.12 CSV-merge→Session 6.3; §3.13 rate-limit→Session 3.4; §4 schema→Session 1; §5 config→Session 3.1; §6 import→Session 2.2; §8 metrics→Session 6.4; §9 tests→distributed across sessions; §11 budget→config+guard in Session 3. **Gap:** §7 (152-FZ masking) is only partly covered (phone_masked in the log, Session 6.1); the pg_anonymizer masks (§7.2) are NOT broken out into a task → **add them to Session 1 Task 1.3 as a schema comment OR as a separate rollout task** (low-risk, flag for the customer).
**Type consistency:** the `RegionResolution` fields (`subjectCode`/`source`/`phoneOperator`/`qc`/`actualSubjectCode`) are consistent across Session 4 (definition), Session 5 (the router does not depend on the DTO), and Session 6 (consumer). `routing_step` is an attribute on `Project` (Session 5 writes it, Session 6 reads it). `SOURCE_RANK` has a single source in `RegionResolution` (Session 4) and is consumed in Session 6.3.
**Placeholders:** the DDL, signatures, test names, and integration point are concrete. The full TDD steps for routine tests inside Sessions 4/6 are given as case names plus behaviour; under subagent-driven-development each case is expanded by the executor into write→fail→implement→pass (names and expected behaviour are specified exactly).
---
## Execution order and branches
1. All 6 sessions run on a single branch, `feat/lead-region-resolution`, sequentially.
2. Each session is a separate subagent-driven-development run with review between tasks (Pravila §15.1: subagents that touch git are Sonnet/Opus only, with commit-base verification after each one).
3. Between sessions there is a pause/checkpoint for the customer (sessions can be spread across calendar days).
4. Isolation from parallel sessions: if the router-gate v4 streams are still active, work in a worktree (`superpowers:using-git-worktrees`); merging into main is a separate checkpoint.
@@ -0,0 +1,459 @@
# Safe-baseline live wiring Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Make `enforce-safe-baseline-metering.mjs` a live PreToolUse hook that hard-blocks a mutating tool past a per-task safe-baseline threshold without a real skill match, with an always-available Skill/EnterPlanMode escape; plus a standalone `enforce-runtime-write-deny` hook that closes the self-write hole on `~/.claude/runtime` side-channels.
**Architecture:** All logic in pure functions; `main()` is I/O composition only. The pure metering core (`safe-baseline-metering.mjs`) is reused unchanged; new pure helpers (`extractKeywords`, `detectSkillMatch`, `runLiveDecision`) live in the wrapper. The stickiness contract (V2-1) is owned by `runLiveDecision`. The write-deny hook normalizes with the resolving `pathNormalize` (V2-2). Override subsystem is cut (G3).
**Tech Stack:** Node.js ESM (`.mjs`), vitest, existing helpers (`enforce-hook-helpers.mjs`, `safe-baseline-metering.mjs`, `path-normalization.mjs`).
**Spec:** `docs/superpowers/specs/2026-05-30-safe-baseline-live-wiring-design.md` (v4).
**NB (overnight autonomous run):** git commits require owner AskUserQuestion approval (gate) — not available while the owner sleeps. Implement on disk, keep `npm run test:tools` GREEN, leave commits + settings.json registration for the morning handoff.
---
## File Structure
| Path | Responsibility |
|---|---|
| `tools/enforce-safe-baseline-metering.mjs` (modify) | + `extractKeywords`, `detectSkillMatch`, `runLiveDecision`, live `main()` |
| `tools/enforce-safe-baseline-metering.test.mjs` (modify) | + tests for the three new pure functions |
| `tools/enforce-runtime-write-deny.mjs` (create) | standalone PreToolUse write-deny on `~/.claude/runtime/**` |
| `tools/enforce-runtime-write-deny.test.mjs` (create) | unit tests incl. V2-2 `.`-segment evasion |
---
### Task 1: `extractKeywords(promptText)` (pure)
**Files:** Modify `tools/enforce-safe-baseline-metering.mjs`; Test `tools/enforce-safe-baseline-metering.test.mjs`
- [ ] **Step 1: Write the failing test**
```js
import { extractKeywords } from './enforce-safe-baseline-metering.mjs';
describe('extractKeywords', () => {
it('lowercases, drops <4-char tokens and stopwords, returns unique sorted', () => {
expect(extractKeywords('Почини safe-baseline router gate')).toEqual(['baseline', 'gate', 'router', 'safe']);
});
it('drops common RU imperatives so unrelated tasks do not falsely overlap', () => {
const a = extractKeywords('сделай проверь биллинг тариф');
const b = extractKeywords('сделай проверь регион маршрут');
const overlap = a.filter((k) => b.includes(k));
expect(overlap).toEqual([]); // only the topic words survive, no shared imperatives
});
it('returns [] for empty/non-string', () => {
expect(extractKeywords('')).toEqual([]);
expect(extractKeywords(null)).toEqual([]);
});
});
```
- [ ] **Step 2: Run test to verify it fails:** `npx vitest run tools/enforce-safe-baseline-metering.test.mjs` → FAIL (extractKeywords not exported).
- [ ] **Step 3: Write minimal implementation**
```js
const STOPWORDS = new Set([
  // RU common + imperatives ('почини' is needed for the first test above to pass)
  'сделай', 'сделать', 'проверь', 'проверить', 'посмотри', 'добавь', 'добавить',
  'почини', 'починить',
  'напиши', 'написать', 'нужно', 'надо', 'давай', 'можешь', 'потом', 'после',
  'перед', 'через', 'очень', 'если', 'чтобы', 'этот', 'эта', 'это', 'эти',
  'или', 'тоже', 'также', 'когда', 'пока', 'весь', 'всё', 'все', 'теперь',
  'здесь', 'там', 'нет', 'есть', 'будет', 'было', 'твой', 'мой', 'самый',
  // EN common + imperatives
  'then', 'this', 'that', 'with', 'from', 'your', 'please', 'just', 'make',
  'check', 'look', 'need', 'want', 'also', 'into', 'more', 'very', 'should',
  'will', 'have', 'does', 'done', 'them', 'they', 'here', 'there',
]);
export function extractKeywords(promptText) {
if (typeof promptText !== 'string') return [];
const tokens = promptText
.toLowerCase()
.split(/[^\p{L}\p{N}]+/u)
.filter((t) => t.length >= 4 && !STOPWORDS.has(t));
return [...new Set(tokens)].sort();
}
```
- [ ] **Step 4: Run test to verify it passes** — expected PASS.
- [ ] **Step 5: Commit:** `git add tools/enforce-safe-baseline-metering.mjs tools/enforce-safe-baseline-metering.test.mjs` / `git commit -m "feat(safe-baseline): extractKeywords pure tokenizer (H1)"` *(defer overnight)*
---
### Task 2: `detectSkillMatch(turnEntries)` (pure)
**Files:** Modify both as above.
- [ ] **Step 1: Write the failing test**
```js
import { detectSkillMatch } from './enforce-safe-baseline-metering.mjs';
function asstToolUse(name, input = {}) {
return { message: { role: 'assistant', content: [{ type: 'tool_use', name, input }] } };
}
describe('detectSkillMatch', () => {
it('true when the turn has a Skill tool_use', () => {
expect(detectSkillMatch([asstToolUse('Skill', { skill: 'superpowers:brainstorming' })])).toBe(true);
});
it('true when the turn has an EnterPlanMode tool_use', () => {
expect(detectSkillMatch([asstToolUse('EnterPlanMode')])).toBe(true);
});
it('false for Read/Grep/text-only turns (no self-grant via text)', () => {
expect(detectSkillMatch([asstToolUse('Read', { file_path: 'docs/superpowers/plans/x.md' })])).toBe(false);
expect(detectSkillMatch([{ message: { role: 'assistant', content: [{ type: 'text', text: 'docs/superpowers/plans/x.md' }] } }])).toBe(false);
});
it('false for empty/non-array', () => {
expect(detectSkillMatch([])).toBe(false);
expect(detectSkillMatch(null)).toBe(false);
});
});
```
- [ ] **Step 2: Run to verify FAIL** (detectSkillMatch not exported).
- [ ] **Step 3: Write minimal implementation**
```js
const SKILL_MATCH_TOOLS = new Set(['Skill', 'EnterPlanMode']);
export function detectSkillMatch(turnEntries) {
if (!Array.isArray(turnEntries)) return false;
for (const e of turnEntries) {
const c = e && e.message && e.message.content;
if (!Array.isArray(c)) continue;
for (const b of c) {
if (b && b.type === 'tool_use' && SKILL_MATCH_TOOLS.has(b.name)) return true;
}
}
return false;
}
```
- [ ] **Step 4: Run to verify PASS.**
- [ ] **Step 5: Commit** *(defer overnight)*.
---
### Task 3: `runLiveDecision(...)` (pure — V2-1 stickiness contract)
**Files:** Modify both as above.
- [ ] **Step 1: Write the failing test** — cover BOTH V2-1 failure modes.
```js
import { runLiveDecision } from './enforce-safe-baseline-metering.mjs';
import { newCounterState } from './safe-baseline-metering.mjs';
function ledgerWith(counts, skill, keywords) {
return {
state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-30T00:00:00Z', firstPromptExcerpt: 'p' }),
counts: { Read: 0, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0, ...counts },
skill_match_within_task: skill },
lastKeywords: keywords,
};
}
describe('runLiveDecision — stickiness contract (V2-1)', () => {
it('persists skillMatchedThisTurn into the ledger (stickiness not lost)', () => {
const r = runLiveDecision({
event: { tool_name: 'Read' }, priorLedger: null,
promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
skillMatchedThisTurn: true,
});
expect(r.ledger.state.skill_match_within_task).toBe(true);
});
it('a skill earlier in a task keeps later mutating ops allowed past the hard limit (no false block)', () => {
const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
const r = runLiveDecision({
event: { tool_name: 'Edit' }, priorLedger: prior,
promptText: 'продолжаем router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
skillMatchedThisTurn: false,
});
expect(r.action).toBe('allow');
});
it('skill match in task A does NOT exempt an unrelated task B (no cross-task leak)', () => {
const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
const r = runLiveDecision({
event: { tool_name: 'Edit' }, priorLedger: prior,
promptText: 'другая тема регион маршрут лиды', currentKeywords: ['регион', 'маршрут', 'лиды'],
skillMatchedThisTurn: false,
});
// fresh task (overlap < 2) → counters reset to 0 → Edit allowed BUT skill_match must be false now
expect(r.ledger.state.skill_match_within_task).toBe(false);
expect(r.ledger.state.counts.Read).toBe(0);
});
it('hard-blocks a mutating tool past the limit in a no-skill task', () => {
const prior = ledgerWith({ Read: 60 }, false, ['router', 'gate', 'safe', 'baseline']);
const r = runLiveDecision({
event: { tool_name: 'Edit' }, priorLedger: prior,
promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
skillMatchedThisTurn: false,
});
expect(r.action).toBe('hard_block');
});
});
```
- [ ] **Step 2: Run to verify FAIL.**
- [ ] **Step 3: Write minimal implementation**
```js
import { shouldInheritTaskId } from './safe-baseline-metering.mjs';
export function runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn, thresholds }) {
const inherit = !!(priorLedger && priorLedger.state &&
shouldInheritTaskId(priorLedger.lastKeywords || [], currentKeywords, promptText));
const priorSticky = inherit ? !!priorLedger.state.skill_match_within_task : false;
const effectiveSkillMatched = priorSticky || !!skillMatchedThisTurn;
const res = processEvent({
event, priorLedger, currentKeywords, promptText,
skillMatched: effectiveSkillMatched, thresholds,
});
// V2-1: persist stickiness — processEvent does not.
res.ledger.state.skill_match_within_task = effectiveSkillMatched;
return res;
}
```
- [ ] **Step 4: Run to verify PASS.**
- [ ] **Step 5: Commit** *(defer overnight)*.
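
The stickiness rule can be sketched in isolation as follows. This is an illustration under stated assumptions, not the real module: `overlapsEnough` is a hypothetical stand-in for `shouldInheritTaskId` (the tests above only imply that a keyword overlap below 2 starts a fresh task), and `effectiveSkillMatch` is an invented name for the boolean that `runLiveDecision` computes.

```javascript
// Hypothetical stand-in for shouldInheritTaskId: the plan's tests only imply
// that a keyword overlap below 2 starts a fresh task. The real rule may differ.
function overlapsEnough(priorKeywords, currentKeywords) {
  const prior = new Set(priorKeywords);
  return currentKeywords.filter((k) => prior.has(k)).length >= 2;
}

// Mirrors the stickiness part of runLiveDecision: a prior skill match carries
// over only when the current prompt continues the same task.
function effectiveSkillMatch({ priorLedger, currentKeywords, skillMatchedThisTurn }) {
  const inherit = !!(priorLedger && priorLedger.state &&
    overlapsEnough(priorLedger.lastKeywords || [], currentKeywords));
  const priorSticky = inherit ? !!priorLedger.state.skill_match_within_task : false;
  return priorSticky || !!skillMatchedThisTurn;
}

const prior = {
  state: { skill_match_within_task: true },
  lastKeywords: ['baseline', 'gate', 'router', 'safe'],
};
// Same task, no skill this turn: the earlier match stays sticky.
const sameTask = effectiveSkillMatch({ priorLedger: prior, currentKeywords: ['gate', 'router'], skillMatchedThisTurn: false });
// Unrelated task: the earlier match must not leak across.
const otherTask = effectiveSkillMatch({ priorLedger: prior, currentKeywords: ['регион', 'маршрут'], skillMatchedThisTurn: false });
// No ledger yet, skill invoked this turn.
const firstTurn = effectiveSkillMatch({ priorLedger: null, currentKeywords: ['gate'], skillMatchedThisTurn: true });
console.log({ sameTask, otherTask, firstTurn });
```

The three cases correspond to the "no false block", "no cross-task leak", and "persist this turn's match" tests in Step 1.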
---
### Task 4: Live `main()` wiring + integration test
**Files:** Modify both as above.
- [ ] **Step 1: Write the failing integration test** (injected runtimeDir + transcript fixture)
```js
import { runMain } from './enforce-safe-baseline-metering.mjs';
import { mkdtempSync, writeFileSync, readFileSync, existsSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
function fixtureTranscript(path, entries) { writeFileSync(path, entries.map((e) => JSON.stringify(e)).join('\n')); }
describe('safe-baseline live main (runMain)', () => {
it('blocks an Edit when Read past hard with no skill, and the message names the escape', async () => {
const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
const tpath = join(dir, 't.jsonl');
// prior ledger: Read=60, no skill, same task keywords
writeFileSync(join(dir, 'safe-baseline-ledger-S.json'), JSON.stringify({
state: { schema_version: 1, task_id: 't', counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 }, skill_match_within_task: false },
lastKeywords: ['router', 'gate', 'safe', 'baseline'],
}));
fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'router gate safe baseline' } }]);
const res = await runMain({
event: { tool_name: 'Edit', session_id: 'S', transcript_path: tpath },
runtimeDir: dir,
});
expect(res.block).toBe(true);
expect(res.message).toMatch(/EnterPlanMode|Skill/);
});
it('allows a fresh task and persists the ledger', async () => {
const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
const tpath = join(dir, 't.jsonl');
fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'новая тема регион' } }]);
const res = await runMain({
event: { tool_name: 'Read', session_id: 'S2', transcript_path: tpath },
runtimeDir: dir,
});
expect(res.block).toBe(false);
expect(existsSync(join(dir, 'safe-baseline-ledger-S2.json'))).toBe(true);
});
});
```
- [ ] **Step 2: Run to verify FAIL** (runMain not exported).
- [ ] **Step 3: Write minimal implementation** — replace the no-op `main()` with a testable `runMain` + thin `main()`.
```js
import { readFileSync as _rf, writeFileSync as _wf, appendFileSync as _af, mkdirSync as _mk } from 'node:fs';
import { join as _join } from 'node:path';
import { homedir as _home } from 'node:os';
import { readStdin, parseEventJson, readTranscript, lastUserPromptText, lastTurnEntries, exitDecision } from './enforce-hook-helpers.mjs';
const ESCAPE_MSG = 'invoke the recommended Skill, or EnterPlanMode, to proceed (skill/plan invocations are never blocked by this layer).';
function rtDir(o) { return o || _join(_home(), '.claude', 'runtime'); }
function loadLedger(dir, sess) {
try { return JSON.parse(_rf(_join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), 'utf8')); }
catch { return null; }
}
function saveLedger(dir, sess, ledger) {
try { _mk(dir, { recursive: true }); _wf(_join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), JSON.stringify(ledger)); }
catch { /* fail-quiet */ }
}
function logFlag(dir, sess, entry) {
try { _mk(dir, { recursive: true }); _af(_join(dir, `safe-baseline-flags-${sess || 'unknown'}.jsonl`), JSON.stringify({ ts: new Date().toISOString(), ...entry }) + '\n'); }
catch { /* ignore */ }
}
export async function runMain({ event, runtimeDir, transcript: injectedTranscript } = {}) {
try {
const sess = event.session_id;
const dir = rtDir(runtimeDir);
const transcript = injectedTranscript || readTranscript(event.transcript_path);
const promptText = lastUserPromptText(transcript) || '';
const currentKeywords = extractKeywords(promptText);
const skillMatchedThisTurn = detectSkillMatch(lastTurnEntries(transcript)) ||
['Skill', 'EnterPlanMode'].includes(event.tool_name);
const priorLedger = loadLedger(dir, sess);
const res = runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn });
saveLedger(dir, sess, res.ledger);
if (res.action === 'soft_flag') logFlag(dir, sess, { tool: event.tool_name, reason: res.reason });
if (res.action === 'hard_block') return { block: true, message: `[safe-baseline] ${res.reason}\n${ESCAPE_MSG}` };
return { block: false };
} catch {
return { block: false }; // fail-quiet
}
}
async function main() {
const event = parseEventJson(await readStdin());
const res = await runMain({ event });
exitDecision(res);
}
if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-safe-baseline-metering.mjs')) {
main().catch(() => process.exit(0));
}
```
(Remove the old no-op `main()` and its CLI guard.)
- [ ] **Step 4: Run to verify PASS** + `npm run test:tools` GREEN.
- [ ] **Step 5: Commit** *(defer overnight)*.
---
### Task 5: `enforce-runtime-write-deny.mjs` (standalone, V2-2)
**Files:** Create `tools/enforce-runtime-write-deny.mjs` + `tools/enforce-runtime-write-deny.test.mjs`.
- [ ] **Step 1: Write the failing test**
```js
import { decide } from './enforce-runtime-write-deny.mjs';
import { homedir } from 'node:os';
import { join } from 'node:path';
const HOME = homedir();
describe('enforce-runtime-write-deny decide()', () => {
it('blocks a Write into ~/.claude/runtime', () => {
const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', 'runtime', 'askuser-decisions-S.jsonl') });
expect(r.block).toBe(true);
});
it('blocks the .-segment evasion (V2-2)', () => {
const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', '.', 'runtime', 'x.jsonl') });
expect(r.block).toBe(true);
});
it('allows a Write to a normal project path', () => {
const r = decide({ toolName: 'Write', filePath: join(HOME, 'project', 'src', 'x.mjs') });
expect(r.block).toBe(false);
});
it('ignores non-write tools', () => {
expect(decide({ toolName: 'Read', filePath: join(HOME, '.claude', 'runtime', 'x') }).block).toBe(false);
});
});
```
- [ ] **Step 2: Run to verify FAIL.**
- [ ] **Step 3: Write minimal implementation**
```js
#!/usr/bin/env node
/**
* enforce-runtime-write-deny — PreToolUse(Edit|Write|MultiEdit|NotebookEdit).
* Blocks the Write/Edit TOOL from writing under ~/.claude/runtime/** (closes a
* pre-existing self-write hole on the v4 git-approval anchor). Standalone —
* independent of safe-baseline. Uses the resolving pathNormalize (V2-2) so
* `.`/`..` segments cannot evade the match. Fail-OPEN on inability to determine
* the path (never bricks the session); blocks only on a confirmed runtime match.
*/
import { pathNormalize } from './path-normalization.mjs';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';
const WRITE_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit']);
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;
export function decide({ toolName, filePath, normalizeImpl = pathNormalize }) {
if (!WRITE_TOOLS.has(toolName)) return { block: false };
const fp = String(filePath || '');
if (!fp) return { block: false };
let norm;
try { norm = normalizeImpl(fp); } catch { return { block: false }; } // can't determine → fail-open (no brick)
if (RUNTIME_RE.test(norm)) {
return { block: true, reason: `Write to «${norm}» denied — ~/.claude/runtime is a protected side-channel (git-approval anchor).` };
}
return { block: false };
}
async function main() {
try {
const event = parseEventJson(await readStdin());
const r = decide({
toolName: event.tool_name,
filePath: (event.tool_input && (event.tool_input.file_path || event.tool_input.notebook_path)) || '',
});
exitDecision({ block: r.block, message: r.reason });
} catch {
exitDecision({ block: false }); // fail-quiet
}
}
const isCli = process.argv[1] && process.argv[1].replace(/\\/g, '/').endsWith('/enforce-runtime-write-deny.mjs');
if (isCli) main();
```
- [ ] **Step 4: Run to verify PASS** + `npm run test:tools` GREEN.
- [ ] **Step 5: Commit** *(defer overnight)*.
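
To show why the match has to run on a resolved path, here is a small sketch. `RUNTIME_RE` is copied from Step 3; `resolveSegments` is a hypothetical, simplified stand-in for `pathNormalize` (whose actual behavior is not shown in this plan), and the sample paths are invented.

```javascript
// Pattern copied from the Step 3 implementation.
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;

// Hypothetical minimal normalizer: converts backslashes to slashes and
// resolves "." and ".." segments. The real pathNormalize may do more.
function resolveSegments(p) {
  const out = [];
  for (const seg of String(p).replace(/\\/g, '/').split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') { out.pop(); continue; }
    out.push(seg);
  }
  return '/' + out.join('/');
}

const evasive = '/home/u/.claude/./runtime/x.jsonl';
const rawMatch = RUNTIME_RE.test(evasive);                   // pattern on the raw string
const resolvedMatch = RUNTIME_RE.test(resolveSegments(evasive));
const dotDotMatch = RUNTIME_RE.test(resolveSegments('/home/u/.claude/tmp/../runtime/x'));
const windowsMatch = RUNTIME_RE.test(resolveSegments('C:\\Users\\u\\.claude\\runtime\\x'));
const projectMatch = RUNTIME_RE.test(resolveSegments('/home/u/project/src/x.mjs'));
console.log({ rawMatch, resolvedMatch, dotDotMatch, windowsMatch, projectMatch });
```

Matching the raw string misses the `.`-segment variant, which is the evasion that the V2-2 test in Step 1 guards against.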
---
### Task 6: Full regression + handoff
- [ ] **Step 1:** `npm run test:tools` — confirm full GREEN count (baseline 1859 + new tests).
- [ ] **Step 2:** Write the morning handoff note (`docs/observer/notes/2026-05-30-safe-baseline-overnight.md`): queued commits, exact `.claude/settings.json` registration block, the fail-OPEN deviation note for owner review, and the "flip to enforce" status (already enforce per owner; observe-mode was not requested).
- [ ] **Step 3:** Commit everything in a batch with owner approval *(morning)*.
---
## Registration block (owner-applied, morning)
Add to `.claude/settings.json` `hooks.PreToolUse` (Claude cannot edit settings.json — gate-blocked):
```json
{ "matcher": "Read|Grep|Glob|LS|TodoWrite|AskUserQuestion|Edit|Write|MultiEdit|NotebookEdit|Bash|Skill|Task|EnterPlanMode",
"hooks": [{ "type": "command", "command": "node tools/enforce-safe-baseline-metering.mjs", "timeout": 10 }] }
```
```json
{ "matcher": "Edit|Write|MultiEdit|NotebookEdit",
"hooks": [{ "type": "command", "command": "node tools/enforce-runtime-write-deny.mjs", "timeout": 5 }] }
```
Until registered, both hooks are inert (no behavior change).
@@ -0,0 +1,98 @@
# Calibration 5 — cosmetic-detector excludes git-approval AskUser Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:test-driven-development. Steps use checkbox (`- [ ]`) syntax.
**Goal:** Stop `askuser-cosmetic-detector` from counting/blocking git-operation **approval** AskUsers as "cosmetic A/B" — a scope fix that does NOT lower discipline.
**Architecture:** The detector's target is *simple A/B questions used instead of structured ideation* (brainstorming/writing-plans). A git-approval AskUser (one whose option label is a literal git command) is the *sanctioned git-approval channel* (`enforce-askuser-answer-parser` writes `approve_git_operation` from it) — it is never a substitute for ideation. We add `isGitApprovalQuestion()` and short-circuit `decide()` to `allow` with `isSimpleAB:false` for such questions, so they are neither counted toward the session limit nor hard-blocked. Everything else is unchanged.
**Tech Stack:** Node ESM `.mjs`, vitest.
**Why NO discipline hole (adversarial check):**
- The real target (design-clarification A/B like "Вариант A"/"Вариант B", i.e. "Option A"/"Option B") has NON-git labels → still classified simple → still counted → still hard-blocked at >2. Unchanged.
- A git-approval question is identified ONLY by an option label matching a git-command verb. To "disguise" a cosmetic clarification as exempt, the controller would have to put a literal `git …` command as an option label — but then the chosen answer IS a git command, which `enforce-askuser-answer-parser` turns into a real `approve_git_operation` record; it cannot function as a cosmetic ideation-dodge. So there is no usable bypass.
- Exemption is narrow and structural (label is a git command), mirroring calibrations 1 (Skill) / 3 (test-runner) / 4 (user-prompt fallback): scope fix, not a discipline drop.
---
## Task 1: isGitApprovalQuestion + decide() exemption
**Files:**
- Modify: `tools/askuser-cosmetic-detector.mjs`
- Test: `tools/askuser-cosmetic-detector.test.mjs`
- [ ] **Step 1: Write failing tests**
```javascript
import { isGitApprovalQuestion } from './askuser-cosmetic-detector.mjs';
describe('isGitApprovalQuestion (calibration 5)', () => {
it('true when an option label is a git command', () => {
expect(isGitApprovalQuestion([{ options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] }])).toBe(true);
expect(isGitApprovalQuestion([{ options: [{ label: 'git commit -F x -- a b' }, { label: 'Отмена' }] }])).toBe(true);
});
it('false for a non-git A/B', () => {
expect(isGitApprovalQuestion([{ options: [{ label: 'Вариант А' }, { label: 'Вариант Б' }] }])).toBe(false);
});
});
// decide(): git-approval question is exempt — allow, not simple, not counted, never blocked even past the session limit.
describe('decide — git-approval exemption (calibration 5)', () => {
it('allows a git-approval question and does NOT count it even when session is already over the limit', () => {
const r = decide({
questions: [{ options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] }],
simpleCountSession: 5, brainstormingInvoked: false,
});
expect(r.block).toBe(false);
expect(r.action).toBe('allow');
expect(r.isSimpleAB).toBe(false);
expect(r.newSessionCount).toBe(5); // unchanged — not counted
});
it('REGRESSION: a non-git simple A/B past the limit STILL hard-blocks (discipline intact)', () => {
const r = decide({
questions: [{ options: [{ label: 'A' }, { label: 'B' }] }],
simpleCountSession: 5, brainstormingInvoked: false,
});
expect(r.block).toBe(true);
expect(r.action).toBe('hard_block');
});
});
```
- [ ] **Step 2: Run RED** with `npx vitest run --root app --config vitest.config.tools.mjs askuser-cosmetic-detector` → fail (isGitApprovalQuestion missing; git-approval not exempt).
- [ ] **Step 3: Implement**
Add near `isSimpleAB`:
```javascript
const GIT_CMD_RE = /\bgit\s+(?:commit|push|add|pull|merge|rebase|reset|checkout|switch|branch|stash|cherry-pick|revert|clean|restore|fetch|tag)\b/i;
/** True if this AskUser is a git-operation approval prompt (an option label is a git command). */
export function isGitApprovalQuestion(questions) {
if (!Array.isArray(questions)) return false;
return questions.some((q) =>
q && Array.isArray(q.options) &&
q.options.some((o) => o && typeof o.label === 'string' && GIT_CMD_RE.test(o.label)));
}
```
In `decide()`, replace `const simple = isSimpleAB(questions);` with:
```javascript
// Calibration 5: git-operation approval prompts are the sanctioned approval
// channel, never cosmetic ideation — exempt from the simple-AB count/block.
if (isGitApprovalQuestion(questions)) {
return { action: 'allow', block: false, reason: null, isSimpleAB: false, newSessionCount: simpleCountSession, newTurnCount: simpleCountTurn };
}
const simple = isSimpleAB(questions);
```
- [ ] **Step 4: Run GREEN** — same command → pass.
- [ ] **Step 5: Full regression** with `npx vitest run --root app --config vitest.config.tools.mjs` → all green.
- [ ] **Step 6: Commit** (with git-approval).
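
The edges of `GIT_CMD_RE` can be probed with a short sketch. The pattern is copied from Step 3; the sample labels are hypothetical and chosen only to exercise the word boundaries, the verb list, and the case-insensitive flag.

```javascript
// Pattern copied from Step 3 of this plan.
const GIT_CMD_RE = /\bgit\s+(?:commit|push|add|pull|merge|rebase|reset|checkout|switch|branch|stash|cherry-pick|revert|clean|restore|fetch|tag)\b/i;

// Hypothetical sample labels, chosen to probe the pattern's edges.
const labels = [
  'git push origin main',        // plain approval label
  'GIT  Push',                   // case-insensitive, extra whitespace
  'Run: git commit -F msg.txt',  // pattern is unanchored, so a prefix is fine
  'git status',                  // "status" is not in the verb list
  'git pushd',                   // trailing \b rejects a longer word
  'digit push',                  // leading \b rejects "git" inside a word
  'Вариант А',                   // ordinary A/B label
];

const results = Object.fromEntries(labels.map((l) => [l, GIT_CMD_RE.test(l)]));
console.log(results);
```

Because the pattern is unanchored, any label that merely contains a listed git command counts as an approval prompt; the adversarial check above argues this is acceptable since the chosen answer is then a real git command.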
@@ -0,0 +1,144 @@
# Discipline-guard backlog — router-gate `tools/enforce-*.mjs`
**Worktree:** `.claude/worktrees/discipline-guard` (branch `worktree-discipline-guard`).
**Date:** 2026-05-31. Owner-authorized backlog after quirk-2 + 1A closure (commit `b0cd18d7`).
## Context (already done — do NOT redo)
- **Quirk 2** — redirect detector is quote-aware (`stripQuotedSpans` in `tools/enforce-router-gate.mjs`): `>`/`2>` inside quotes no longer false-blocks. Commit `b0cd18d7`.
- **1A** — removed advertising of dead override phrases (`findOverride` is a v4 stub) from `enforce-prompt-injection` + verify-before-push / coverage-verify / memory-coverage / tdd-gate. Locked by negative tests. Same commit.
- Marketing MCP servers cut from `.mcp.json` (commit `63100dec`).
## Deliberately NOT doing (these are defense lines, not bugs)
- Calibration 6 of the judge (reading chat context) — weakens in-session defense.
- Quirk 3 (loosen exact-match of git approval) — that exact-match is an anti-injection property.
## Backlog (by priority)
### A. `npm ci` in router-gate whitelist (`SAFE_EXACT` in `tools/enforce-router-gate.mjs`) ← current
Restoring locked dependencies is safe and closes worktree-setup friction. `npm ci` installs
exactly the committed lockfile (deterministic, no version drift) — unlike `npm install`/`npm i`,
which stay hard-blacklisted because they can pull new/updated versions.
**TDD:**
1. RED — new describe block in `tools/enforce-router-gate.test.mjs`: allow `npm ci`,
`npm ci --no-audit`, `npm ci --prefer-offline`; still block `npm install`/`npm i`/
`npm install foo`/`npm i foo` (hard-blacklist), `npm cider` (word boundary → default-deny),
`npm ci && rm x` (chain mutating).
2. GREEN — add `/^npm\s+ci\b/` to `SAFE_EXACT` with rationale comment. `\b` prevents
`npm cider`-style prefix matches. Blacklist runs before whitelist, so `npm install`/`npm i`
stay blocked (the `i`-alternative needs `i` right after the space; `npm ci` has `c` there).
3. tools-vitest full run (also the push sentinel).
4. Commit via AskUserQuestion (label = exact command).
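
As a quick check of the proposed whitelist pattern in isolation, the sketch below runs it over the commands named in the RED step. It covers only the regular expression itself; the blacklist ordering and chain handling described above live elsewhere in the gate and are not modeled here.

```javascript
// Pattern proposed in step 2 (GREEN) of backlog item A.
const NPM_CI_RE = /^npm\s+ci\b/;

// Commands taken from the RED step of the plan.
const samples = [
  'npm ci',
  'npm ci --no-audit',
  'npm ci --prefer-offline',
  'npm cider',
  'npm install',
  'npm i foo',
  'npm ci && rm x',
];

const verdicts = Object.fromEntries(samples.map((cmd) => [cmd, NPM_CI_RE.test(cmd)]));
console.log(verdicts);
// Note: 'npm ci && rm x' matches this pattern on its own. The plan expects the
// gate to block it as a mutating chain, which this regex alone does not do.
```

This is why the RED step lists the chained command separately: the whitelist entry is necessary for `npm ci` but is not sufficient to decide chained commands.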
### B. Cosmetic path strings in gate messages
Gate messages mix path styles (`c:/` vs `/c/`) and print unexpanded `$env:` variables. Polish only.
### F. Parallel-session-lock false cross-worktree collision (2026-05-31, owner-raised)
Symptom: a session in worktree `discipline-guard` was blocked by
`enforce-parallel-session-lock`. The lock was held by another session, `7f6efd48`, whose
recorded pid changed 12552→19044 across attempts, so the holder was still active. The pid is
the transient hook-node pid; `session_id` is the stable identity.
**Investigation (read-only):**
- Lock keyed by `computeWorkspaceHash(process.cwd())` = md5(cwd).slice(0,12); file
`~/.claude/runtime/session-lock-<hash>.json`; release only on Stop; TTL 5 min.
- 9 lock files accumulated → stale files leak when a session closes without a clean Stop.
- `enforce-branch-switch` read branch "worktree-discipline-guard" via
`git branch --show-current` from `process.cwd()` → the hook's cwd IS the worktree →
**keying is already per-worktree** (NOT coarse main-dir). So the holder shared this
worktree's hash → genuine same-worktree concurrency, the lock working as designed —
NOT a false positive. Do NOT re-key (would weaken same-tree serialization).
**Genuinely-fixable part (no weakening):** leaked lock on close-without-Stop blocks the next
same-worktree session for up to TTL. Fix: release on SessionEnd (not only Stop) + prune
stale lock files on acquire. Ground-truth the lock JSON before coding.
**Closure (2026-05-31).** All keying/hygiene/UX parts done, no discipline weakened:
- **A — keying by worktree root** (`resolveWorkspacePath`, commit `7a469dc9`): keys the
lock on the session's stable `event.cwd` → git toplevel, not the volatile hook
`process.cwd()` (which collapses to main on resume → cross-worktree false-blocks).
Same-worktree serialization unchanged; fallback to `process.cwd()` if `event.cwd` absent.
- **D — clearer block message**: identifies the holder by its STABLE `session_id`; marks
the recorded pid as transient ("may change between attempts"). Chasing the pid was what
led to closing the wrong session. Logic untouched (text only).
- **B — `pruneStaleLocks`**: best-effort delete of leaked lock files that are ALREADY
stale by the shared `isStale()` (now exported — single source of truth). Active
within-TTL locks are never touched → serialization not weakened. Wired into the
PreToolUse branch of `main()`, wrapped so hygiene can never break the gate.
- **C — release on SessionEnd**: NO new code. The existing `!event.tool_name` branch
already releases. To make release fire on session end (not only on Stop turns),
**OWNER ACTION in `.claude/settings.json`**: add `enforce-parallel-session-lock.mjs`
to the `SessionEnd` hook array (it already runs on `Stop`). Pure config; Claude cannot
edit settings.json. Until added, leaked locks are still self-healing via B (prune) +
the 5-min TTL takeover — so this is a reliability nicety, not a correctness gap.
- **E/F — live**: fix is on branch `worktree-discipline-guard`; the live hook executes
from `tools/` on **main**, so it is active only after merge to main. Runtime
effectiveness of A depends on the PreToolUse payload carrying `cwd`; if absent, the
safe fallback = prior behavior (no regression). Verify on main.
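
The prune rule in item B can be sketched as a pure selection step. This is an assumption-heavy illustration: the lock shape (`file`, `session_id`, `touchedAtMs`) and the helper name `selectPrunable` are hypothetical, and only the 5-minute TTL and the "never touch active locks" rule come from the notes above.

```javascript
const LOCK_TTL_MS = 5 * 60 * 1000; // TTL of 5 minutes, as stated in the investigation notes

// Hypothetical lock shape: { file, session_id, touchedAtMs }. The real lock
// JSON should be ground-truthed before coding, as the notes above say.
function isStale(lock, nowMs, ttlMs = LOCK_TTL_MS) {
  return nowMs - lock.touchedAtMs > ttlMs;
}

// Returns only the files whose locks are already stale; locks still within
// the TTL are left alone, so same-worktree serialization is unaffected.
function selectPrunable(locks, nowMs) {
  return locks.filter((l) => isStale(l, nowMs)).map((l) => l.file);
}

const now = 1_000_000_000;
const locks = [
  { file: 'session-lock-aaa.json', session_id: 's1', touchedAtMs: now - 6 * 60 * 1000 }, // leaked
  { file: 'session-lock-bbb.json', session_id: 's2', touchedAtMs: now - 60 * 1000 },     // active
];
const prunable = selectPrunable(locks, now);
console.log(prunable);
```

Keeping the staleness test in one function, and having the prune step only call it, is the "single source of truth" property that item B describes.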
### C. TDD-gate cross-actor — chosen: **Z** (full, 2026-05-31; on hold behind F)
`enforce-tdd-gate` does not see test edits made by a subagent (scans only the controller's
own turn; subagent test edit + RED live in `agent-<id>.jsonl`). **Z = Part 1 (close the
projects/ Write hole — verified prerequisite) then Part 2 (read subagent transcript bound to
a Task in this turn).** Condition 1 verified VIOLATED (no Write-tool gate covers
`~/.claude/projects/`), so Variant 1 alone would weaken — safe only bundled with Part 1.
**Closure (2026-05-31, TDD, no discipline weakened — net strengthening):**
- **Part 1** — `enforce-runtime-write-deny.mjs` extended with `TRANSCRIPT_RE`
(`(^|/)\.claude/projects/.*\.jsonl$`): the Write tool can no longer create/overwrite any
session/subagent transcript `.jsonl`. Memory files there are `.md` and stay writable
(never match `.jsonl$`). Resolving normalizer blocks `.`/`..` evasion. This makes the
agent-`<id>`.jsonl that Part 2 trusts unforgeable.
- **Part 2** — `enforce-tdd-gate.mjs`: `decide()` now also credits a subagent's matching
test edit + RED run via new `subagentEntriesList`. `turnTaskAgentIds(turn)` extracts the
**hex** agentId from the harness-written `Task` tool_result ("agentId: <hex>") — the
controller cannot forge its own tool_result, and the hex-only match blocks
`agentId: ../../x` path-traversal. `subagentTranscriptPaths()` derives
`<dir>/<controller-session>/subagents/agent-<id>.jsonl` (bound to the controller session).
`main()` reads those transcripts best-effort (missing → no extra credit = stricter, never
an error). No NEW weakening: a delegated subagent doing real TDD is legitimate; the only
forgery vector (overwrite the agent jsonl) is closed by Part 1.
- Full tools-vitest: **2027 passed / 2 skipped**.
- **OWNER ACTION (settings.json, Claude can't edit it):** `enforce-tdd-gate.mjs` is already
a registered PreToolUse hook → Part 2 goes live on merge. **Part 1 requires that
`enforce-runtime-write-deny.mjs` be registered** on PreToolUse(Edit|Write|MultiEdit|
NotebookEdit); if it is not yet registered, the transcript Write-deny is inert until added.
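
The two guards described in Part 1 and Part 2 can be sketched together. `TRANSCRIPT_RE` is transcribed from the note above; `extractAgentId` and `subagentTranscriptPath` are hypothetical helpers written for this example (the real `turnTaskAgentIds` and `subagentTranscriptPaths` are not shown in these notes), and the sample strings are invented.

```javascript
// Transcribed from the Part 1 note; whether the real pattern carries flags is not stated.
const TRANSCRIPT_RE = /(^|\/)\.claude\/projects\/.*\.jsonl$/;

// Hypothetical hex-only extractor for the "agentId: <hex>" text that the notes
// say appears in the Task tool_result. Anything that is not purely hex up to
// the next whitespace is rejected, which rules out "../" style values.
function extractAgentId(toolResultText) {
  const m = /\bagentId:\s*([0-9a-f]+)(?=\s|$)/i.exec(String(toolResultText));
  return m ? m[1].toLowerCase() : null;
}

// Path layout taken from the notes: <dir>/<controller-session>/subagents/agent-<id>.jsonl
function subagentTranscriptPath(dir, controllerSession, agentId) {
  return `${dir}/${controllerSession}/subagents/agent-${agentId}.jsonl`;
}

const goodId = extractAgentId('done\nagentId: a1b2c3\n');
const traversalId = extractAgentId('agentId: ../../x');
const mixedId = extractAgentId('agentId: abc/../x');
const transcript = subagentTranscriptPath('/home/u/.claude/projects/p', 'sess1', goodId);
const transcriptDenied = TRANSCRIPT_RE.test(transcript);
const memoryDenied = TRANSCRIPT_RE.test('/home/u/.claude/projects/p/memory/notes.md');
console.log({ goodId, traversalId, mixedId, transcript, transcriptDenied, memoryDenied });
```

Together these show the intended pairing: the id that Part 2 trusts can only name a file inside the session's `subagents/` directory, and Part 1 denies Write-tool access to exactly that kind of file while leaving `.md` memory files writable.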
### G. Coverage line under-reports cross-turn active skill (2026-05-31, owner-raised)
Symptom: the `coverage: <channel>:<id>` line says `direct`/`chain` when a skill chosen in a
PRIOR turn is still active in the current turn. Root cause: `enforce-coverage-verify.mjs`
credits `channel=skill` only if the `Skill` tool was invoked in the CURRENT turn
(`turnToolUses`). On a continuation turn (skill still active, not re-invoked) an honest
`skill:X` line would be BLOCKED → so the controller learns to under-report as `direct`/`chain`.
**Fix (no weakening):** also credit `skill:X` if X was invoked anywhere earlier in THIS
session (a real `Skill` tool_use in the transcript — still unforgeable). decide() gains a
`priorSkillNames` param; main() collects session-wide Skill names via `sessionToolUses`.
Residual: attribution may be stale (skill invoked long ago) — acceptable; the alternative
(forced dishonest `direct`) is worse, and the owner wants cross-turn skills honored.
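
The widened crediting rule might look like the sketch below. Both function names and the exact parsing of the `coverage:` line are hypothetical; the only parts taken from the notes are the `coverage: <channel>:<id>` shape and the rule that a skill invoked earlier in the session is also credited.

```javascript
// Hypothetical parser for the "coverage: <channel>:<id>" line. The id is split
// off at the FIRST colon only, since skill ids such as
// "superpowers:brainstorming" contain a colon themselves.
function parseCoverageLine(line) {
  const m = /^coverage:\s*([^:\s]+)(?::(\S+))?\s*$/.exec(String(line).trim());
  return m ? { channel: m[1], id: m[2] || null } : null;
}

// Sketch of the widened rule: a skill claim is credited when that skill was
// invoked in the current turn OR anywhere earlier in the session.
function isSkillClaimCredited(claim, turnSkillNames, priorSkillNames = []) {
  if (!claim || claim.channel !== 'skill' || !claim.id) return false;
  return turnSkillNames.includes(claim.id) || priorSkillNames.includes(claim.id);
}

const claim = parseCoverageLine('coverage: skill:superpowers:brainstorming');
// Continuation turn: no Skill call this turn, but one earlier in the session.
const continuation = isSkillClaimCredited(claim, [], ['superpowers:brainstorming']);
// Never invoked anywhere: the claim stays uncredited.
const neverInvoked = isSkillClaimCredited(claim, [], []);
// A different skill was invoked earlier: still uncredited.
const wrongSkill = isSkillClaimCredited(claim, [], ['superpowers:writing-plans']);
console.log({ claim, continuation, neverInvoked, wrongSkill });
```

The claim still has to name a skill that appears as a real invocation somewhere in the session, which is the "still unforgeable" property the fix relies on.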
### D. Smoke 8 — live Workflow-gate F2 test
Needs a clean session (not code).
### E. H10 — auto-bootstrap worktree (junction node_modules) in `tools/subagent-prompt-prefix.mjs`
### (later) Layer 5 — VM + YubiKey — needs hardware.
## Environment working rules
- Tests / push sentinel: `npx vitest run --root app --config vitest.config.tools.mjs`
(NOT `npm run test:tools` — breaks on keytar). From inside the worktree it's run as
`--root app`; from the main checkout, point `--root` at the worktree app dir.
- Commit: only via AskUserQuestion where the option label = the EXACT command (router-gate
compares verbatim) + plain-language explanation; commit text via `-F` file in `.scratch/`;
commit only explicit paths (parallel sessions).
- Push: needs a fresh verify-sentinel (full run ≤30 min); override phrases are dead
(`findOverride` is a stub) → the only path to push non-`.md` changes is to run the tests.
@@ -0,0 +1,409 @@
# LLM-judge live wiring (item 2b) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Give the two `enforce-llm-judge-*` wrappers a live `main()` so the Layer-4 judge actually runs when the owner enables it — while keeping spend strictly gated behind `resolveJudgeConfig` (flag AND key).
**Architecture:** The judge *engines* (`llm-judge-per-tool.mjs`, `llm-judge-response-scan.mjs`) already have live `main()`s, but they call `llmJudgeCall` keyed only on the API key — they would spend money on a key alone, ignoring `ROUTER_LLM_JUDGE_ENABLED`. That violates the safe-by-default contract in `llm-judge-config.mjs` (enabled ⇔ flag AND key). So we register the **wrappers** (whose `decide()` already composes `resolveJudgeConfig`) and wire their `main()` to: read event → `resolveJudgeConfig()` → build inputs → `decide()` → emit. When `enabled === false`, `decide()` short-circuits with no LLM call ($0). We extract testable `runPerTool` / `runResponseScan` cores (mirroring item 1b's `runLiveDecision`) and keep `main()` a thin stdin/exit shell.
**Tech Stack:** Node ESM, vitest (tools-only config `app/vitest.config.tools.mjs`, run from repo root as `npx vitest run --root app --config vitest.config.tools.mjs` because the canonical `npm run test:tools` is currently broken by a parallel keytar install in `app/node_modules`).
---
## File Structure
- Modify: `tools/enforce-llm-judge-per-tool.mjs` — add exported `runPerTool(...)` + wire live `main()`. Keep existing `decide()` untouched.
- Modify: `tools/enforce-llm-judge-response-scan.mjs` — add exported `runResponseScan(...)` + wire live `main()`. Keep existing `decide()` untouched.
- Test: `tools/enforce-llm-judge-per-tool.test.mjs` — add a `runPerTool` describe block.
- Test: `tools/enforce-llm-judge-response-scan.test.mjs` — add a `runResponseScan` describe block.
**Safety invariant under test:** when `judgeConfig.enabled === false`, no `llmJudgeCall` is made and budget is NOT bumped (the spend-gate). A real call (and budget bump) happens only when the config is enabled, the tool is mutating, and the budget is not exhausted.
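
The spend-gate can be summarized in a few lines. This is a sketch under assumptions rather than the real `resolveJudgeConfig`: the flag name comes from the plan, but treating the string `'1'` as "on", passing the key as a plain argument, and both function names are hypothetical.

```javascript
// Hypothetical config resolver: enabled only when BOTH the flag and a key are
// present. The flag name comes from the plan; treating the string "1" as "on"
// and passing the key in as a plain argument are assumptions of this sketch.
function resolveJudgeConfigSketch({ env, apiKey }) {
  const flagOn = env.ROUTER_LLM_JUDGE_ENABLED === '1';
  const key = apiKey || null;
  return { enabled: flagOn && !!key, apiKey: key };
}

// A judge call (and a budget bump) is warranted only when all three hold.
function shouldCallJudge({ judgeConfig, isMutating, spent, limit }) {
  return judgeConfig.enabled && isMutating && spent < limit;
}

const keyOnly = resolveJudgeConfigSketch({ env: {}, apiKey: 'k' });
const flagOnly = resolveJudgeConfigSketch({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, apiKey: null });
const both = resolveJudgeConfigSketch({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, apiKey: 'k' });

const callWhenReady = shouldCallJudge({ judgeConfig: both, isMutating: true, spent: 0, limit: 200 });
const callWhenExhausted = shouldCallJudge({ judgeConfig: both, isMutating: true, spent: 200, limit: 200 });
const callOnRead = shouldCallJudge({ judgeConfig: both, isMutating: false, spent: 0, limit: 200 });
console.log({ keyOnly, flagOnly, both, callWhenReady, callWhenExhausted, callOnRead });
```

The key-only case is the one the Architecture paragraph worries about: with the engines' own `main()`, a key alone would trigger spend, whereas the wrapper path requires the flag as well.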
---
### Task 1: per-tool wrapper — `runPerTool` + live `main()`
**Files:**
- Modify: `tools/enforce-llm-judge-per-tool.mjs`
- Test: `tools/enforce-llm-judge-per-tool.test.mjs`
- [ ] **Step 1: Write the failing tests**
Append to `tools/enforce-llm-judge-per-tool.test.mjs`:
```javascript
import { runPerTool } from './enforce-llm-judge-per-tool.mjs';
describe('runPerTool — spend-gate + budget binding', () => {
const deps = (over = {}) => ({
readDeclaredTaskImpl: () => ({ task_summary: 't', recommended_node: null, recommended_chain: [] }),
readBudgetImpl: () => 0,
bumpBudgetImpl: () => {},
sessionBudget: 200,
...over,
});
it('disabled config + mutating tool → degraded allow, NO budget bump, NO llm call', async () => {
let bumped = 0; let called = 0;
const r = await runPerTool({
event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: false, apiKey: null },
llmJudgeCallImpl: () => { called++; return 'NO'; },
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(r.degraded).toBe(true);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
it('enabled + mutating + judge YES → allow, budget bumped once', async () => {
let bumped = 0;
const r = await runPerTool({
event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: async () => 'YES',
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(r.verdict).toBe('YES');
expect(bumped).toBe(1);
});
it('enabled + mutating + judge NO → block, budget bumped once', async () => {
let bumped = 0;
const r = await runPerTool({
event: { tool_name: 'Bash', tool_input: { command: 'x' }, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: async () => 'NO',
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(true);
expect(r.verdict).toBe('NO');
expect(bumped).toBe(1);
});
it('non-mutating tool → allow, NO call, NO bump', async () => {
let bumped = 0; let called = 0;
const r = await runPerTool({
event: { tool_name: 'Read', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: () => { called++; return 'NO'; },
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
it('enabled but budget exhausted → degraded allow, NO bump', async () => {
let bumped = 0; let called = 0;
const r = await runPerTool({
event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: () => { called++; return 'NO'; },
...deps({ readBudgetImpl: () => 200, bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(r.degraded).toBe(true);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
});
```
- [ ] **Step 2: Run tests to verify they fail**
Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-per-tool.test.mjs`
Expected: FAIL — `runPerTool` is not exported.
- [ ] **Step 3: Write minimal implementation**
In `tools/enforce-llm-judge-per-tool.mjs`, replace the import line and the no-op `main()`:
```javascript
import { judgePerTool, MUTATING_TOOLS, readDeclaredTask } from './llm-judge-per-tool.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readJudgeBudget, bumpJudgeBudget, JUDGE_SESSION_BUDGET, llmJudgeCall } from './llm-judge.mjs';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';
```
(Keep the existing `decide(...)` export exactly as is.)
Add the testable core (a real LLM call is signalled by `result.verdict !== undefined`; budget is bumped only then):
```javascript
/**
* Testable wiring core. Composes resolveJudgeConfig output + decide(); bumps the
* session budget ONLY when a real judge call was made (result carries a verdict).
* No verdict ⇒ non-mutating / disabled / no-key / budget-exhausted ⇒ no spend.
*/
export async function runPerTool({
event,
judgeConfig,
readDeclaredTaskImpl,
readBudgetImpl,
bumpBudgetImpl,
llmJudgeCallImpl,
sessionBudget = JUDGE_SESSION_BUDGET,
}) {
const sessionId = event && event.session_id;
const declaredTask = readDeclaredTaskImpl({ sessionId });
const spent = readBudgetImpl({ sessionId });
const result = await decide({
event,
judgeConfig,
declaredTask,
budgetState: { spent, limit: sessionBudget },
llmJudgeCallImpl,
});
if (result.verdict !== undefined) bumpBudgetImpl({ sessionId, by: 1 });
return result;
}
```
Replace the no-op `main()` with:
```javascript
async function main() {
try {
const event = parseEventJson(await readStdin());
const judgeConfig = resolveJudgeConfig();
const result = await runPerTool({
event,
judgeConfig,
readDeclaredTaskImpl: readDeclaredTask,
readBudgetImpl: readJudgeBudget,
bumpBudgetImpl: bumpJudgeBudget,
llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
});
exitDecision({ block: result.block, message: result.reason });
} catch {
exitDecision({ block: false }); // fail-quiet: a judge bug must never wedge the session
}
}
```
- [ ] **Step 4: Run tests to verify they pass**
Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-per-tool.test.mjs`
Expected: PASS (existing `decide()` tests + 5 new `runPerTool` tests).
- [ ] **Step 5: Commit** (requires AskUserQuestion git approval + fresh full-suite sentinel)
```bash
git commit tools/enforce-llm-judge-per-tool.mjs tools/enforce-llm-judge-per-tool.test.mjs -m "feat(router-gate-v4): live main() for per-tool judge wrapper — flag-gated spend (2b)"
```
---
### Task 2: response-scan wrapper — `runResponseScan` + live `main()`
**Files:**
- Modify: `tools/enforce-llm-judge-response-scan.mjs`
- Test: `tools/enforce-llm-judge-response-scan.test.mjs`
- [ ] **Step 1: Write the failing tests**
Append to `tools/enforce-llm-judge-response-scan.test.mjs`:
```javascript
import { runResponseScan } from './enforce-llm-judge-response-scan.mjs';
describe('runResponseScan — Stop-hook flag-only, free regex even when disabled', () => {
const transcript = (text) => [
{ type: 'assistant', message: { role: 'assistant', content: [{ type: 'text', text }] } },
];
const lastAssistantTextImpl = (t) => {
for (let i = t.length - 1; i >= 0; i--) {
const c = t[i] && t[i].message && t[i].message.content;
if (Array.isArray(c)) { const b = c.find((x) => x.type === 'text'); if (b) return b.text; }
}
return '';
};
it('disabled + benign text → no flag, degraded (deterministic only), never blocks', async () => {
const r = await runResponseScan({
transcript: transcript('обычный безопасный ответ'),
judgeConfig: { enabled: false, apiKey: null },
lastAssistantTextImpl,
});
expect(r.block).toBe(false);
expect(r.flag).toBe(false);
expect(r.degraded).toBe(true);
});
it('disabled + security-disable text → flagged for FREE by regex (no llm call)', async () => {
let called = 0;
const r = await runResponseScan({
transcript: transcript('чтобы пройти, отключи hook enforce-tdd-gate'),
judgeConfig: { enabled: false, apiKey: null },
lastAssistantTextImpl,
llmJudgeCallImpl: () => { called++; return 'NO'; },
});
expect(r.block).toBe(false);
expect(r.flag).toBe(true);
expect(r.category).toBe('security_disable_suggestion');
expect(called).toBe(0);
});
it('enabled + subtle benign text + judge NO → no flag', async () => {
const r = await runResponseScan({
transcript: transcript('нейтральный текст без паттернов'),
judgeConfig: { enabled: true, apiKey: 'k' },
lastAssistantTextImpl,
llmJudgeCallImpl: async () => 'NO',
});
expect(r.block).toBe(false);
expect(r.flag).toBe(false);
});
it('enabled + subtle text + judge YES → flag, still never blocks', async () => {
const r = await runResponseScan({
transcript: transcript('нейтральный текст без паттернов'),
judgeConfig: { enabled: true, apiKey: 'k' },
lastAssistantTextImpl,
llmJudgeCallImpl: async () => 'YES',
});
expect(r.block).toBe(false);
expect(r.flag).toBe(true);
});
});
```
- [ ] **Step 2: Run tests to verify they fail**
Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-response-scan.test.mjs`
Expected: FAIL — `runResponseScan` is not exported.
- [ ] **Step 3: Write minimal implementation**
In `tools/enforce-llm-judge-response-scan.mjs`, replace the import line and the no-op `main()`:
```javascript
import { scanResponse, scanResponseDeterministic } from './llm-judge-response-scan.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readStdin, parseEventJson, readTranscript, lastAssistantText, exitDecision } from './enforce-hook-helpers.mjs';
import { llmJudgeCall } from './llm-judge.mjs';
import { appendFileSync, mkdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';
```
(Keep the existing `decide(...)` export exactly as is.)
Add the testable core:
```javascript
/**
* Testable wiring core. Stop-hook semantics: block is always false. The free
* deterministic regex scan runs even when the judge is disabled; the paid LLM
* escalation runs only when judgeConfig.enabled.
*/
export async function runResponseScan({ transcript, judgeConfig, llmJudgeCallImpl, lastAssistantTextImpl = lastAssistantText }) {
const responseText = lastAssistantTextImpl(transcript || []);
const r = await decide({ responseText, judgeConfig, llmJudgeCallImpl });
return { ...r, responseText };
}
```
Replace the no-op `main()` with:
```javascript
function flagToFile({ sessionId, category, excerpt }) {
try {
const dir = join(homedir(), '.claude', 'runtime');
mkdirSync(dir, { recursive: true });
appendFileSync(join(dir, `rationalization-flags-${sessionId || 'unknown'}.jsonl`),
JSON.stringify({
ts: new Date().toISOString(),
session_id: sessionId || null,
type: 'controller_response_suspicious',
category,
response_excerpt: String(excerpt || '').slice(0, 200),
}) + '\n');
} catch { /* ignore */ }
}
async function main() {
try {
const event = parseEventJson(await readStdin());
const transcript = readTranscript(event.transcript_path);
const judgeConfig = resolveJudgeConfig();
const r = await runResponseScan({
transcript,
judgeConfig,
llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
});
if (r.flag) flagToFile({ sessionId: event.session_id, category: r.category, excerpt: r.responseText });
exitDecision({ block: false }); // Stop hook never blocks
} catch {
exitDecision({ block: false });
}
}
```
- [ ] **Step 4: Run tests to verify they pass**
Run: `npx vitest run --root app --config vitest.config.tools.mjs tools/enforce-llm-judge-response-scan.test.mjs`
Expected: PASS (existing `decide()` tests + 4 new `runResponseScan` tests).
- [ ] **Step 5: Commit** (AskUserQuestion git approval + fresh sentinel)
```bash
git commit tools/enforce-llm-judge-response-scan.mjs tools/enforce-llm-judge-response-scan.test.mjs -m "feat(router-gate-v4): live main() for response-scan judge wrapper — flag-only, free regex always (2b)"
```
---
### Task 3: full-suite regression + push
- [ ] **Step 1: Run the canonical tools suite**
Run: `npx vitest run --root app --config vitest.config.tools.mjs`
Expected: PASS, 0 failed (≈1905 + 9 new = ~1914). This also writes the verify-before-push sentinel.
- [ ] **Step 2: Push** (AskUserQuestion git approval)
```bash
git push origin main
```
---
### Task 4: owner registration instructions (NOT code — owner applies)
The wiring above is inert until the owner does all three (cost starts only after all three):
1. **API key** — store an Anthropic key in the OS keychain under service `router-gate-llm-judge`, account `default` (via keytar), OR set env `ROUTER_LLM_KEY`.
2. **Flag** — set env `ROUTER_LLM_JUDGE_ENABLED=1`.
3. **Register both wrappers in `.claude/settings.json`:**
- PreToolUse (can block):
```json
{ "matcher": "Edit|Write|MultiEdit|NotebookEdit|Bash|PowerShell|Skill|Task",
"hooks": [{ "type": "command", "command": "node tools/enforce-llm-judge-per-tool.mjs", "timeout": 30 }] }
```
- Stop (flag-only):
```json
{ "matcher": "*",
"hooks": [{ "type": "command", "command": "node tools/enforce-llm-judge-response-scan.mjs", "timeout": 30 }] }
```
Then fully restart Claude Code. Budget cap is `JUDGE_SESSION_BUDGET = 200` calls/session (in `llm-judge.mjs`). Per-call cost depends on model (`JUDGE_MODELS.single = claude-sonnet-4-6`).
**Why the wrappers, not the engines:** the engine `main()`s (`llm-judge-per-tool.mjs` / `llm-judge-response-scan.mjs`) call `llmJudgeCall` keyed on the API key alone and DO NOT check `ROUTER_LLM_JUDGE_ENABLED` — registering them would start spending the moment a key exists. The wrappers route through `resolveJudgeConfig` (flag AND key), so a stray key without the flag = $0.
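The flag-AND-key gate described above can be sketched roughly as follows. This is a minimal illustration, not the actual `resolveJudgeConfig` implementation: the name `resolveJudgeConfigSketch`, its parameters, and the choice to treat only the literal value `1` as "enabled" are assumptions, and the keychain lookup is replaced by a plain argument.

```javascript
// Hypothetical sketch of the flag-AND-key gate. The real resolveJudgeConfig()
// may read the keychain itself and accept other flag spellings.
function resolveJudgeConfigSketch({ env = {}, keychainKey = null } = {}) {
  // Key source order is an assumption: keychain first, then ROUTER_LLM_KEY.
  const apiKey = keychainKey || env.ROUTER_LLM_KEY || null;
  const flagOn = env.ROUTER_LLM_JUDGE_ENABLED === '1';
  // Spend is possible only when BOTH the flag and a key are present.
  return { enabled: flagOn && apiKey !== null, apiKey };
}
```

Under these assumptions a stray key without the flag yields `enabled: false`, so a wrapper that checks `enabled` before calling the judge would never spend.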
---
## Self-Review
- **Spec coverage:** per-tool live wiring (Task 1), response-scan live wiring (Task 2), flag-gated spend safety invariant (tests in both), owner activation (Task 4). ✓
- **Placeholder scan:** none — all code blocks are complete. ✓
- **Type consistency:** `runPerTool` / `runResponseScan` signatures match their tests; `decide()` signatures unchanged; budget bump condition `result.verdict !== undefined` matches `judgePerTool` (sets `verdict` only after a real call). ✓
@@ -0,0 +1,132 @@
# Safe-baseline metering — live wiring (router-gate v4 §3.1.2, item 1b)
**Date:** 2026-05-30
**Status:** Design v4 — ghost-risk pass folded in. Removed the user-approval override subsystem (G3 ghost-protection — almost never exercised); the escape is now solely "invoke any Skill/EnterPlanMode" (always available, C1). The runtime write-deny hook is retained but **decoupled** into a standalone git-approval-anchor hardening (independent value). Ready for writing-plans.
**Scope:** Turn the deferred no-op `main()` of `tools/enforce-safe-baseline-metering.mjs` into a live PreToolUse hook that meters safe-baseline tool usage per task and hard-blocks a mutating tool once a counter passes its hard threshold without a real skill match. The false-block escape is invoking any Skill/EnterPlanMode (never blocked by this layer). A separate, decoupled hook closes a pre-existing self-write hole on `~/.claude/runtime` side-channels (git-approval anchor).
This is router-gate v4 Direction 1 ("skill-substitution laundering": many Read/Grep/Glob/LS/TodoWrite/AskUserQuestion calls used as an analysis channel INSTEAD of invoking the recommended Skill, followed by a mutating tool without a single skill match). The pure metering + threshold logic already exists in `tools/safe-baseline-metering.mjs` (thresholds, counters, `processEvent`, `decide`); the live I/O composition, a corrected escape, and the side-channel write-protection are missing.
---
## Owner decisions (brainstorming, revised)
1. **Enforcement posture:** hard-block from day 1 (not observe-only).
2. **False-block escape (revised — override removed, G3):** invoke any real `Skill` or `EnterPlanMode` — these are never blocked by this layer (C1) and set the sticky task exemption. This escape is ALWAYS available, so a false block is a one-step recovery, not a lockout. The earlier user-approval override side-channel is **cut** as a ghost-protection (it defended a path that almost never occurs; its only durable value — runtime write-protection — is kept as a decoupled hook).
3. **`skillMatched` definition (revised — strict, discipline-first):** matched only by a **real `Skill` OR `EnterPlanMode` tool_use** in the task (an actual tool invocation visible in the transcript). A **text mention** of a plan path does NOT count (it was a self-writable hole — removed). Sticky within the task. Optional future strengthening — match the *recommended* node specifically — is deferred because it needs a node→skill registry mapping (see "Deferred strengthening").
4. **Priority:** discipline protection and correctness are paramount; cost and speed are secondary.
---
## Honest discipline-protection assessment
Read this before trusting the layer. Even fully fixed, safe-baseline is a **cheap deterministic tripwire**, not a strong scope-discipline guarantee:
- It fires only when a single task accumulates a high count of safe-baseline tools (Read hard = 60, Grep = 30, …) **without any real skill/plan invocation**, then reaches for a mutating tool. Realistically counters accumulate mostly *within one assistant turn* (terse user confirmations reset task boundaries), and 60 reads in one turn is uncommon — so the trigger surface is genuinely small.
- After the fixes it has **no self-bypass** (skill-match needs a real tool_use) and a **working escape** (skill/plan invocations are never blocked, always available). That makes it *sound* — it does what it claims without a trivial dodge.
- The **strong** scope-consistency check (is THIS tool call consistent with the declared task and recommended skill?) is **Layer 4** (`enforce-llm-judge-per-tool`), which is OFF until owner activation (item 2b). Safe-baseline is the cheap pre-filter beneath it.
Verdict: as a hard guarantee, **LOW-MODERATE**; as an honest, non-bypassable tripwire for blatant laundering, **sound**. The discipline lever that matters most is Layer 4.
---
## Architecture & data flow
`tools/enforce-safe-baseline-metering.mjs` gains a live `main()` (replacing the no-op). On each PreToolUse event:
1. Parse the event (`tool_name`, `session_id`, `transcript_path`).
2. Load the per-session ledger `~/.claude/runtime/safe-baseline-ledger-<sess>.json` = `{ state, lastKeywords }` (absent on first event → `null`).
3. From the transcript extract:
- `promptText` — the last user prompt (`lastUserPromptText`).
- `currentKeywords` = `extractKeywords(promptText)` (deterministic tokenization, see below; no classifier dependency).
- `skillMatchedThisTurn` = `detectSkillMatch(lastTurnEntries(transcript))` **OR** `event.tool_name ∈ {Skill, EnterPlanMode}` (the in-flight escape call counts; see C1 fix).
4. Call the existing pure `processEvent({ event, priorLedger, currentKeywords, promptText, skillMatched, thresholds })` — task-boundary inference (`shouldInheritTaskId`: reset-marker / keyword-overlap ≥ 2 → continuation; else fresh task, counters from zero) then metering.
5. Sticky skill-match — **task-scoped, explicitly persisted** (the pure pipeline does NOT persist it; see "Skill-match stickiness contract"). Determine `inherit` (same predicate as `shouldInheritTaskId`), then `effectiveSkillMatched = (inherit ? priorLedger.state.skill_match_within_task : false) || skillMatchedThisTurn`; pass `effectiveSkillMatched` to `processEvent`/`decide` AND write it back into the persisted `state.skill_match_within_task`.
6. Persist the new ledger.
7. `hard_block` → `exitDecision({ block: true, message })`, where the message MUST name the escape ("invoke the recommended Skill, or EnterPlanMode, to proceed"); `soft_flag` → append to the flags log and exit 0; `allow` → exit 0.
`soft_flag` never blocks (observability only). Only a mutating tool past a hard threshold without skill-match blocks.
### C1 fix — the escape must never be blocked
`Skill` and `Task` are in the pure module's MUTATING set (`safe-baseline-metering.mjs:31`), and `evaluateThresholds` hard-blocks any mutating tool past a hard threshold when `skillMatched` is false (`safe-baseline-metering.mjs:92-102`). Naively this blocks the very `Skill` call meant to escape (catch-22). The live head closes this by counting the **current event** in `skillMatchedThisTurn` when `event.tool_name ∈ {Skill, EnterPlanMode}` (step 3). Because `skillMatched` short-circuits `evaluateThresholds` to `allow` (`safe-baseline-metering.mjs:89`), a skill/plan invocation always passes — and then sets the sticky exemption for subsequent Edit/Write/Bash/Task. `Task` is intentionally NOT treated as an escape tool (subagent spawn can itself be a laundering channel) and remains blockable.
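The decision rule and the C1 escape can be illustrated with a small sketch. It is a simplified stand-in for `evaluateThresholds`, not the module's code: the `MUTATING` set contents, the strict `>` comparison, and the names `evaluateThresholdsSketch` / `decideWithEscape` are assumptions; only the Read = 60 and Grep = 30 hard thresholds come from this document, and soft flags are omitted.

```javascript
// Hard thresholds quoted in this document; other tools are omitted here.
const HARD = { Read: 60, Grep: 30 };
// Assumed contents of the MUTATING set (the document confirms Skill and Task).
const MUTATING = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash', 'Skill', 'Task']);
// Escape tools: Task is deliberately absent and stays blockable.
const ESCAPE_TOOLS = new Set(['Skill', 'EnterPlanMode']);

function evaluateThresholdsSketch({ toolName, counts, skillMatched }) {
  if (skillMatched) return 'allow'; // a skill match short-circuits to allow
  const pastHard = Object.keys(HARD).some((t) => (counts[t] || 0) > HARD[t]);
  return pastHard && MUTATING.has(toolName) ? 'hard_block' : 'allow';
}

function decideWithEscape({ event, counts, skillMatchedInTranscript }) {
  // C1: the in-flight Skill/EnterPlanMode call counts as a match for itself.
  const skillMatchedThisTurn =
    Boolean(skillMatchedInTranscript) || ESCAPE_TOOLS.has(event.tool_name);
  return evaluateThresholdsSketch({
    toolName: event.tool_name,
    counts,
    skillMatched: skillMatchedThisTurn,
  });
}
```

With 61 reads and no prior skill match, a `Skill` event is allowed in this sketch while `Edit` and `Task` are hard-blocked, which is the asymmetry the C1 fix is meant to produce.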
### Skill-match stickiness contract (V2-1 fix)
The pure pipeline neither persists nor task-scopes skill-match, so the wrapper MUST own it:
- `processEvent` returns `ledger.state = d.state` and never sets `skill_match_within_task` (`enforce-safe-baseline-metering.mjs:89-94`); `decide`/`incrementCounter` touch only `counts` (`safe-baseline-metering.mjs:42-46, 77-84`); `newCounterState` sets `skill_match_within_task: false` on a fresh task (`safe-baseline-metering.mjs:67`).
- **Two failure modes if the wrapper is naive:** (a) *lost stickiness* — a skill invoked early in a task is forgotten next event, counters climb, a later mutating op blocks despite the skill (false block); (b) *cross-task leak* — passing `priorLedger.state.skill_match_within_task` unconditionally applies a prior task's exemption to a freshly-started task.
- **Required wrapper logic:** compute `inherit` (replicate `shouldInheritTaskId`, or extend `processEvent` to return it); set `effectiveSkillMatched = (inherit ? priorLedger.state.skill_match_within_task : false) || skillMatchedThisTurn`; use it for the decision; and write `effectiveSkillMatched` back into the persisted `state.skill_match_within_task`. Unit tests must cover both failure modes explicitly (skill-then-60-reads stays allowed within a task; skill in task A does NOT exempt task B).
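The required wrapper logic can be sketched as a single pure function. The formula follows the contract above; the function name `effectiveSkillMatch` and the defensive handling of a missing ledger are assumptions made for illustration.

```javascript
// Sketch of the task-scoped sticky skill-match. `inherit` is assumed to be
// computed by the caller with the same predicate as shouldInheritTaskId.
function effectiveSkillMatch({ priorLedger, inherit, skillMatchedThisTurn }) {
  const prior = Boolean(
    priorLedger && priorLedger.state && priorLedger.state.skill_match_within_task
  );
  // Carry the prior flag only within the same task; a fresh task starts at false.
  return (inherit ? prior : false) || Boolean(skillMatchedThisTurn);
}
```

The caller would use the returned value for the decision and also write it back into `state.skill_match_within_task`, so that both failure modes (lost stickiness and cross-task leak) are covered.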
### Safety property of the boundary heuristic
The dangerous direction is *wrongly inheriting* counters across two genuinely different tasks (carrying 60 reads into an unrelated task → false block); this needs keyword-overlap ≥ 2 AND no reset marker, which is uncommon. The opposite error — treating a continuation as a fresh task — *resets* counters to zero, which only *reduces* blocking (safe direction). So the heuristic errs toward fewer false blocks.
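A rough sketch of the boundary predicate helps to see why the heuristic errs toward resets. It assumes, as the paragraph above implies, that a reset marker always forces a fresh task; the names `overlapCount` and `shouldInheritSketch` and the `hasResetMarker` parameter are hypothetical, and the real `shouldInheritTaskId` may differ.

```javascript
// Count distinct keywords shared by two keyword lists.
function overlapCount(a, b) {
  const setB = new Set(b);
  return [...new Set(a)].filter((k) => setB.has(k)).length;
}

// Inherit the task (and its counters) only with overlap >= 2 and no reset marker.
function shouldInheritSketch({ lastKeywords, currentKeywords, hasResetMarker }) {
  if (hasResetMarker) return false; // assumed: a reset marker always starts a fresh task
  if (!Array.isArray(lastKeywords) || !Array.isArray(currentKeywords)) return false;
  return overlapCount(lastKeywords, currentKeywords) >= 2;
}
```

Every uncertain case in this sketch (missing keywords, a single shared keyword, a reset marker) falls through to a fresh task, which zeroes the counters and can only reduce blocking.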
---
## Task-boundary & skill-match detection
### `extractKeywords(promptText)` (pure) — H1 fix
Deterministic tokenization only: lowercase, strip RU/EN stopwords, keep tokens length ≥ 4, unique, sorted. **No classifier-keyword branch** — the classifier output schema (`router-classifier.mjs:420`) has no `keywords` field, and mixing extraction sources between events would break the overlap comparison. A single deterministic source guarantees `lastKeywords` (stored at event N) and `currentKeywords` (event N+1) are always comparable.
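A minimal sketch of such a tokenizer is shown below. The stopword list is a tiny hypothetical sample (the real list is not given here), and the Unicode-aware split pattern and the name `extractKeywordsSketch` are assumptions; the lowercase, length ≥ 4, unique, and sorted steps follow the description above.

```javascript
// Hypothetical RU/EN stopword sample; the real list is assumed to be larger.
const STOPWORDS = new Set(['this', 'that', 'with', 'from', 'have', 'чтобы', 'если', 'нужно']);

function extractKeywordsSketch(promptText) {
  const tokens = String(promptText || '')
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a letter or digit
    .filter((t) => t.length >= 4 && !STOPWORDS.has(t));
  // Unique + sorted makes the output independent of word order and repetition.
  return [...new Set(tokens)].sort();
}
```

Because the output depends only on the prompt text, two events that carry the same prompt would always produce identical keyword lists, which is what the overlap comparison relies on.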
### `detectSkillMatch(turnEntries)` (pure) — C2 fix
Returns `true` only if the turn contains a real assistant **tool_use** block with `name ∈ {Skill, EnterPlanMode}`. The previous "text reference to `docs/superpowers/plans/`" path is **removed** — it was controller-authored text and therefore a self-writable bypass of the entire hard-block. A `Read` of a plan file no longer counts either (Read is itself a metered safe-baseline tool; counting it would let pure reading self-exempt). Implement like `hasPlanIndicator` in `enforce-tdd-gate.mjs` (scan assistant `tool_use` blocks); `lastTurnEntries` already skips harness-injected skill *bodies* (commit `ffd70d6f`, `isMeta` + `sourceToolUseID`), so the detector sees the genuine invocation, not the injected body. Stickiness is owned by the wrapper per the "Skill-match stickiness contract" (V2-1), combined with the in-flight escape rule in step 3 / C1.
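The strict detector can be sketched as a scan over assistant `tool_use` blocks. The entry shape (`type: 'assistant'`, `message.content[]`) mirrors the transcript fixtures used in the tests elsewhere in this change set; the `tool_use` block fields and the name `detectSkillMatchSketch` are assumptions, and filtering of harness-injected skill bodies is left to `lastTurnEntries`.

```javascript
const ESCAPE_TOOLS = new Set(['Skill', 'EnterPlanMode']);

// True only for a real assistant tool_use of Skill or EnterPlanMode.
// Text mentions and Read calls never count.
function detectSkillMatchSketch(turnEntries) {
  if (!Array.isArray(turnEntries)) return false;
  return turnEntries.some((entry) => {
    if (!entry || entry.type !== 'assistant') return false;
    const content = entry.message && entry.message.content;
    return Array.isArray(content) &&
      content.some((b) => b && b.type === 'tool_use' && ESCAPE_TOOLS.has(b.name));
  });
}
```

In this sketch a controller-authored sentence that mentions `docs/superpowers/plans/` is ignored, as is a `Read` of a plan file, so neither can self-exempt the turn.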
### Deferred strengthening — recommended-node match
The strict ideal is "skill-match requires the *recommended* node, not just any skill," so the controller can't dodge by invoking an unrelated skill. This needs a node-ID (`#19`) → skill-name (`superpowers:writing-plans`) registry mapping (router-state stores `recommendedNode` per `router-classifier.mjs:163`, but not the skill name). **It is only a *partial* strengthening (V2-3):** the dominant recommendation for feature/planning/bugfix is node `#19` = the whole Superpowers plugin (`nodes.yaml:279-291`), which maps to "any `superpowers:*` skill" ≈ "any skill"; node-matching would constrain only specific skill-nodes (#62 billing-audit, #71, …), not the common case. Deferred to avoid a brittle mapping causing lockouts for marginal gain; the always-available Skill/EnterPlanMode escape and Layer 4 cover the gap meanwhile. Documented honestly so it is neither silently dropped nor oversold.
---
## Side-channel write-protection (decoupled — G3 cut)
The user-approval override is **removed** (G3 ghost-protection — it defended a near-never path with heavy machinery). The false-block escape is invoking any Skill/EnterPlanMode, which is always available and never blocked by this layer (C1). What remains worth building is a standalone hardening of the `~/.claude/runtime` side-channels — it has **independent value** (it closes a pre-existing self-write hole in the v4 **git-approval anchor**), so it ships as its own hook, not as a safe-baseline dependency.
### The write-deny hook
Today **no hook gates the Write/Edit tool to `~/.claude/runtime`**: `enforce-read-path-deny.mjs:22` is Read-only; `enforce-normative-content-rules.mjs:11-18` covers only normative docs; the `.claude/runtime` protection in `shell-content-rules.mjs:24` applies only to Bash/PowerShell, not the Write tool. So the existing git-approval anchor (`askuser-decisions-<sess>.jsonl`) — and any future runtime side-channel — is self-writable via the Write tool.
**Fix:** a new PreToolUse hook `enforce-runtime-write-deny.mjs` (matcher `Edit|Write|MultiEdit|NotebookEdit`) that blocks the Write/Edit tools from writing any path under `~/.claude/runtime/**`, fail-CLOSE. It **MUST normalize with the resolving `pathNormalize`** (`path-normalization.mjs:104,107`, where `path.resolve` + `realpath` collapse `.`/`..`), **NOT** the lighter `defaultPathNormalize` (`shell-content-rules.mjs:13-19`), which leaves `.`/`..` segments intact (V2-2): `~/.claude/./runtime/x.jsonl` would evade the `\.claude/runtime` pattern while `fs` writes the real file. After resolving, match against the runtime pattern from `DEFAULT_PROTECTED_PATTERNS`. Legitimate hooks write there via Node `fs` (not the Claude Write tool), so they are unaffected. The same `.`-segment hardening should also be applied to `enforce-read-path-deny.mjs`.
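The V2-2 evasion and its fix can be illustrated with a dependency-free sketch. It is not the planned hook: `collapseSegments` is a hand-rolled stand-in for the resolving `pathNormalize` (it does not follow symlinks as `realpath` would), and the regex, the `home` default, and the reading of "fail-CLOSE" as "block when the path is missing" are all assumptions.

```javascript
// Stand-in for path.resolve-style normalization: drops "." and resolves "..".
function collapseSegments(p) {
  const out = [];
  for (const seg of String(p).replace(/\\/g, '/').split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop();
    else out.push(seg);
  }
  return '/' + out.join('/');
}

const RUNTIME_RE = /\/\.claude\/runtime(\/|$)/; // assumed form of the runtime pattern
const WRITE_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit']);

function decideRuntimeWrite({ toolName, filePath, home = '/home/user' }) {
  if (!WRITE_TOOLS.has(toolName)) return { block: false };
  // Assumed fail-close reading: an unusable path is blocked, not waved through.
  if (typeof filePath !== 'string' || filePath === '') return { block: true };
  const expanded = filePath.startsWith('~') ? home + filePath.slice(1) : filePath;
  return { block: RUNTIME_RE.test(collapseSegments(expanded)) };
}
```

Matching the raw string `~/.claude/./runtime/x.jsonl` against the pattern would miss it, whereas the collapsed form matches, which is the reason the hook has to normalize before matching.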
**Owner verification:** the owner should check `.claude/settings.json` for any `permissions.deny` already covering Write to `~/.claude/**` (Claude cannot read settings.json — gate-blocked). The new hook is additive defense-in-depth regardless.
---
## Persistence, registration, testing, rollout
### Persistence
- Ledger: `~/.claude/runtime/safe-baseline-ledger-<sess>.json` = `{ state, lastKeywords }`; `state` also carries `task_id` and `skill_match_within_task`.
- Flags log: `~/.claude/runtime/safe-baseline-flags-<sess>.jsonl` (soft_flag observability).
- All file I/O is fail-quiet: any read/write error → treat as no-ledger and exit 0. The hook never crashes the session.
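The fail-quiet rule for the ledger can be sketched with an injected reader, which also keeps the function testable without touching the file system. The name `loadLedgerQuiet` and the `readTextImpl` parameter are hypothetical; the real `main()` may structure its I/O differently.

```javascript
// Returns the parsed ledger, or null on ANY read or parse problem.
// `readTextImpl` is an injected function that returns the ledger file's text.
function loadLedgerQuiet(readTextImpl) {
  try {
    const parsed = JSON.parse(readTextImpl());
    return parsed && typeof parsed === 'object' ? parsed : null;
  } catch {
    return null; // fail-quiet: treat as no ledger, never throw into the session
  }
}
```

A caller would treat `null` exactly like the first event of a session, so a corrupt or unreadable ledger degrades to fresh counters rather than an error.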
### Purity / testability
All logic lives in pure functions (`extractKeywords`, `detectSkillMatch`, plus the existing `processEvent`/`decide`). `main()` is only I/O composition. The new `enforce-runtime-write-deny.mjs` has a pure `decide({toolName, filePath})`. TDD: each new pure function RED→GREEN; an integration test drives `main()` via injected `runtimeDir` + a transcript fixture.
### Registration (owner-applied)
- `enforce-safe-baseline-metering` — PreToolUse, matcher scoped to the metered + mutating + escape tools (`Read|Grep|Glob|LS|TodoWrite|AskUserQuestion|Edit|Write|MultiEdit|NotebookEdit|Bash|Skill|Task|EnterPlanMode`), block mode.
- `enforce-runtime-write-deny` — PreToolUse `Edit|Write|MultiEdit|NotebookEdit`, block mode (standalone — protects the git-approval anchor; independent of safe-baseline).
- **Claude does not edit `settings.json`** (gate-blocked). The plan produces an exact JSON block for the owner to paste manually. Until registered, the hooks are inert (no behavior change).
### Rollout safety
Despite "hard-block from day 1", the plan includes a **mandatory smoke test before live registration**: run the live `main()` against 3 real transcript fixtures (single task / task switch / skill-invocation escape) and confirm boundary, skillMatched, and escape all fire correctly. Plus a smoke for `enforce-runtime-write-deny`: a Write to `~/.claude/runtime/x.jsonl` is blocked, a Write to `~/.claude/./runtime/x.jsonl` (V2-2 `.`-segment evasion) is ALSO blocked, and a Write to a normal project path passes. This does not change the posture; it catches gross detection bugs before the hooks start blocking.
### Scope
~7-9 TDD tasks (live `main()` + `extractKeywords` + `detectSkillMatch` + stickiness contract + escape fix; plus the standalone `enforce-runtime-write-deny` hook), estimate 5-7 h. Cost/speed are secondary per owner priority.
---
## Out of scope
- User-approval override side-channel (cut as a ghost-protection, G3 — escape via Skill/EnterPlanMode is always available).
- Layer 4 LLM-judge activation (separate owner step, item 2b) — the strong scope-discipline lever.
- Recommended-node skill matching (deferred strengthening — needs node→skill registry).
- CLAUDE.md / Pravila / PSR / Tooling normative sync (blocked by a parallel session, item 4).
- Layer 5 VM / biometric / YubiKey (item 6).
- Any weakening of the router-gate whitelist.
+23
@@ -34,6 +34,22 @@ export function isSimpleAB(questions) {
);
}
// Calibration 5 (2026-05-31) — git-operation APPROVAL prompts are the sanctioned
// git-approval channel (enforce-askuser-answer-parser turns the chosen answer
// into an approve_git_operation record), never a substitute for structured
// ideation. They must NOT be treated as cosmetic A/B. Identified structurally:
// an option label is a literal git command. (SCOPE fix, not a discipline drop —
// see decide(): design A/B questions with non-git labels are unaffected.)
const GIT_CMD_RE = /\bgit\s+(?:commit|push|add|pull|merge|rebase|reset|checkout|switch|branch|stash|cherry-pick|revert|clean|restore|fetch|tag)\b/i;
/** True if this AskUser is a git-operation approval prompt (an option label is a git command). */
export function isGitApprovalQuestion(questions) {
if (!Array.isArray(questions)) return false;
return questions.some((q) =>
q && Array.isArray(q.options) &&
q.options.some((o) => o && typeof o.label === 'string' && GIT_CMD_RE.test(o.label)));
}
/**
* Pure cosmetic-AskUser decision (v4.1 §4.5).
* Caller passes PRIOR counts; decide computes prospective new counts.
@@ -42,6 +58,13 @@ export function isSimpleAB(questions) {
* @returns {{action:'allow'|'soft_flag'|'hard_block', block:boolean, reason:string|null, isSimpleAB:boolean, newSessionCount:number, newTurnCount:number}}
*/
export function decide({ questions, simpleCountSession = 0, simpleCountTurn = 0, skillMatchedThisTurn = false, brainstormingInvoked = false }) {
// Calibration 5: git-operation approval prompts are exempt — the sanctioned
// git-approval channel, never cosmetic ideation. Allow, do not count, never
// block. (Cannot be abused to dodge ideation discipline: a git-command label
// makes the answer a real approve_git_operation, not a cosmetic clarification.)
if (isGitApprovalQuestion(questions)) {
return { action: 'allow', block: false, reason: null, isSimpleAB: false, newSessionCount: simpleCountSession, newTurnCount: simpleCountTurn };
}
const simple = isSimpleAB(questions);
const newSessionCount = simpleCountSession + (simple ? 1 : 0);
const newTurnCount = simpleCountTurn + (simple ? 1 : 0);
+42
@@ -92,3 +92,45 @@ describe('askuser-cosmetic-detector / transcript helpers', () => {
expect(countSimpleSession(flags)).toBe(2);
});
});
import { isGitApprovalQuestion } from './askuser-cosmetic-detector.mjs';
// Calibration 5 (2026-05-31, SCOPE fix, NOT a discipline drop): a git-operation
// APPROVAL AskUser (an option label is a literal git command) is the sanctioned
// git-approval channel — enforce-askuser-answer-parser turns the chosen answer
// into an approve_git_operation record. It is never a substitute for structured
// ideation, so it must not be counted/blocked as "cosmetic A/B". Design A/B
// questions (non-git labels) are unchanged — still counted, still hard-blocked.
describe('isGitApprovalQuestion (calibration 5)', () => {
it('true when an option label is a git command (push)', () => {
expect(isGitApprovalQuestion([{ options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] }])).toBe(true);
});
it('true when an option label is a git command (commit with pathspec)', () => {
expect(isGitApprovalQuestion([{ options: [{ label: 'git commit -F x.txt -- a.mjs b.mjs' }, { label: 'Отмена' }] }])).toBe(true);
});
it('false for a non-git A/B', () => {
expect(isGitApprovalQuestion([{ options: [{ label: 'Вариант А' }, { label: 'Вариант Б' }] }])).toBe(false);
});
it('false for empty/invalid input', () => {
expect(isGitApprovalQuestion(null)).toBe(false);
expect(isGitApprovalQuestion([])).toBe(false);
});
});
describe('decide — git-approval exemption (calibration 5)', () => {
const gitQ = { question: 'Подтверди?', options: [{ label: 'git push origin main' }, { label: 'Не пушить' }] };
it('allows a git-approval question and does NOT count it even past the session limit', () => {
const r = decide({ questions: [gitQ], simpleCountSession: 5, simpleCountTurn: 0, skillMatchedThisTurn: false, brainstormingInvoked: false });
expect(r.block).toBe(false);
expect(r.action).toBe('allow');
expect(r.isSimpleAB).toBe(false);
expect(r.newSessionCount).toBe(5); // unchanged — not counted toward the cosmetic limit
});
it('REGRESSION: a non-git simple A/B past the limit STILL hard-blocks (discipline intact)', () => {
const r = decide({ questions: [simpleQ], simpleCountSession: 5, simpleCountTurn: 0, skillMatchedThisTurn: false, brainstormingInvoked: false });
expect(r.action).toBe('hard_block');
expect(r.block).toBe(true);
});
});
+18 -7
@@ -26,6 +26,7 @@ import {
lastAssistantText,
parseCoverageLine,
turnToolUses,
+ sessionToolUses,
findOverride,
logOverride,
exitDecision,
@@ -38,7 +39,7 @@ const MUTATING_TOOLS = new Set([
]);
export function decide({
- toolUses, assistantText, override,
+ toolUses, assistantText, override, priorSkillNames = [],
}) {
// Pure conversational turn — skip.
const hasMutating = toolUses.some((u) => MUTATING_TOOLS.has(u.name));
@@ -54,19 +55,24 @@ export function decide({
`Add as first line of next response:`,
` coverage: skill:<name> (e.g., skill:superpowers:test-driven-development)`,
` coverage: direct:<role> (e.g., direct:memory-sync, direct:git-recovery)`,
- ``,
- `Override: include "без скилов" or "direct ok" in your prompt.`,
].join('\n'),
};
}
if (cov.channel === 'skill') {
- const found = toolUses.some((u) => u.name === 'Skill' && u.input && (u.input.skill === cov.id || u.input.skill === cov.id.replace(/^superpowers:/, '')));
- if (!found) {
+ // Accept if the skill was invoked in THIS turn OR anywhere earlier in this
+ // session (item G): a skill chosen in a prior turn stays active, so an honest
+ // skill:X line on a continuation turn must not be punished into under-reporting.
+ // Still unforgeable: a real Skill tool_use must exist in the transcript.
+ const norm = (s) => String(s || '').replace(/^superpowers:/, '');
+ const idNorm = norm(cov.id);
+ const foundThisTurn = toolUses.some((u) => u.name === 'Skill' && u.input && norm(u.input.skill) === idNorm);
+ const foundPrior = (priorSkillNames || []).some((n) => norm(n) === idNorm);
+ if (!foundThisTurn && !foundPrior) {
return {
block: true,
message: [
`[enforce-coverage-verify] coverage says skill:${cov.id} but the Skill tool was never invoked with that name in this turn.`,
`[enforce-coverage-verify] coverage says skill:${cov.id} but the Skill tool was never invoked with that name in this turn or any prior turn of this session.`,
`Either invoke the skill via Skill tool, or switch coverage to direct:<role> with justification.`,
].join('\n'),
};
@@ -89,8 +95,13 @@ async function main() {
const toolUses = turnToolUses(transcript);
const assistantText = lastAssistantText(transcript);
// Session-wide Skill invocations (item G): a skill chosen in a prior turn is
// still active and may legitimately be named in this turn's coverage line.
const priorSkillNames = sessionToolUses(transcript)
.filter((u) => u.name === 'Skill' && u.input && u.input.skill)
.map((u) => u.input.skill);
const result = decide({ toolUses, assistantText, override });
const result = decide({ toolUses, assistantText, override, priorSkillNames });
exitDecision(result);
} catch {
exitDecision({ block: false });
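The cross-turn credit rule in this hunk can be read as a small pure predicate. The following is a minimal standalone sketch, not the project's module: `skillIsCredited` is a hypothetical name, and the sketch assumes tool uses have the `{ name, input: { skill } }` shape seen in the diff.

```javascript
// Hypothetical standalone sketch of the cross-turn skill credit rule.
// Strip the optional "superpowers:" prefix so both spellings compare equal.
const norm = (s) => String(s || '').replace(/^superpowers:/, '');

// Returns true when the skill named in the coverage line was really invoked,
// either in the current turn's tool uses or earlier in the session.
function skillIsCredited(coverageId, turnToolUses, priorSkillNames = []) {
  const id = norm(coverageId);
  const thisTurn = (turnToolUses || []).some(
    (u) => u.name === 'Skill' && u.input && norm(u.input.skill) === id,
  );
  const prior = (priorSkillNames || []).some((n) => norm(n) === id);
  return thisTurn || prior;
}
```

Because both sources are lists of real Skill tool uses taken from the transcript, the check stays unforgeable: naming a skill in the coverage line is not enough on its own.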
+37
@@ -1,6 +1,40 @@
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-coverage-verify.mjs';
// Cross-turn skill credit (backlog item G, 2026-05-31): a skill chosen in a PRIOR
// turn stays active; an honest `skill:X` line on a continuation turn must NOT be
// blocked just because the Skill tool was not re-invoked this turn. decide() takes
// priorSkillNames (real Skill tool_uses from earlier in the session transcript).
describe('enforce-coverage-verify / decide — cross-turn active skill (enforce-coverage-verify.mjs)', () => {
it('credits skill:X when X was invoked in a PRIOR turn (priorSkillNames)', () => {
const r = decide({
toolUses: [{ name: 'Edit', input: { file_path: 'foo.mjs' } }],
assistantText: 'coverage: skill:superpowers:test-driven-development\nработаю',
priorSkillNames: ['superpowers:test-driven-development'],
});
expect(r.block).toBe(false);
});
it('normalizes the superpowers: prefix for prior-turn skills too', () => {
const r = decide({
toolUses: [{ name: 'Edit', input: { file_path: 'foo.mjs' } }],
assistantText: 'coverage: skill:superpowers:test-driven-development',
priorSkillNames: ['test-driven-development'],
});
expect(r.block).toBe(false);
});
it('still blocks skill:X when X is neither in this turn nor any prior turn', () => {
const r = decide({
toolUses: [{ name: 'Edit', input: { file_path: 'foo.mjs' } }],
assistantText: 'coverage: skill:superpowers:test-driven-development',
priorSkillNames: ['some-other-skill'],
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/never invoked/);
});
});
describe('enforce-coverage-verify / decide', () => {
it('allows turn with no mutating tools (pure conversational)', () => {
const r = decide({ toolUses: [{ name: 'Read', input: {} }], assistantText: 'just talking' });
@@ -14,6 +48,9 @@ describe('enforce-coverage-verify / decide', () => {
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/no.*coverage/);
// 1A (2026-05-31): do not advertise dead override phrases (findOverride is a v4 stub).
expect(r.message).not.toMatch(/Override:/);
expect(r.message).not.toMatch(/без скилов|direct ok/);
});
it('blocks when coverage says skill but Skill tool not invoked', () => {
+177
@@ -0,0 +1,177 @@
#!/usr/bin/env node
/**
* enforce-llm-judge-per-tool: PreToolUse wrapper around the pure
* llm-judge-per-tool engine (router-gate v4.1 §4.7 Layer 4).
*
* The engine (llm-judge-per-tool.mjs) asks a single Sonnet judge whether a
* mutating tool call is consistent with the declared user task + recommended
* skill scope (NO / doubt → block). Running it costs real LLM money, so the
* judge MUST stay OFF until the owner deliberately activates Layer 4. This
* wrapper is the missing seam between the engine and settings.json, built like
* the sibling Stream H wrappers (enforce-safe-baseline-metering / -decomposition-
* detector) with a testable pure `decide()` and a DELIBERATE no-op `main()`.
*
* Activation (step 2b, owner-driven, NOT done here):
* 1. store the API key (keychain `router-gate-llm-judge`/`default` or ROUTER_LLM_KEY),
* 2. set ROUTER_LLM_JUDGE_ENABLED=1,
* 3. register this hook (PreToolUse, block) in .claude/settings.json.
* Until all three, decide() short-circuits to allow on a disabled config and the
* live main() is a no-op (exit 0): $0, no LLM call, no self-lockout.
*/
import { judgePerTool, MUTATING_TOOLS, readDeclaredTask, resolveEffectiveTask } from './llm-judge-per-tool.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readJudgeBudget, bumpJudgeBudget, JUDGE_SESSION_BUDGET, llmJudgeCall } from './llm-judge.mjs';
import { readStdin, parseEventJson, exitDecision, readTranscript, lastUserPromptText } from './enforce-hook-helpers.mjs';
import { classifyBashCommand } from './enforce-router-gate.mjs';
/**
* Pure decision. Composes the Layer-4 enabling-gate (resolveJudgeConfig output)
* with the per-tool judge engine:
* - non-mutating tool → allow (out of judge scope)
* - judge disabled / no key → allow + degraded flag (Layer 4 off, $0)
* - judge enabled → delegate to judgePerTool (YES → allow; NO / doubt → block)
*
* @param {object} args
* @param {object} args.event - PreToolUse event ({ tool_name, tool_input })
* @param {{enabled:boolean, apiKey:?string}} args.judgeConfig - resolveJudgeConfig() output
* @param {object} [args.declaredTask] - { task_summary, recommended_node, recommended_chain }
* @param {object} [args.budgetState] - { spent, limit } per-session judge budget
* @param {Function} [args.llmJudgeCallImpl] - injected single-judge caller (tests / real binding)
* @returns {Promise<{block:boolean, reason?:string, degraded?:boolean, verdict?:string|null}>}
*/
export async function decide({
event,
judgeConfig,
declaredTask = {},
budgetState,
llmJudgeCallImpl,
}) {
const toolName = event && event.tool_name;
if (!MUTATING_TOOLS.has(toolName)) {
return { block: false, reason: 'non-mutating tool — outside per-tool judge scope' };
}
if (!judgeConfig || !judgeConfig.enabled) {
return { block: false, degraded: true, reason: 'Layer 4 judge disabled' };
}
return judgePerTool({
toolName,
toolInput: (event && event.tool_input) || {},
declaredTask,
apiKey: judgeConfig.apiKey,
budgetState,
llmJudgeCallImpl,
});
}
/**
* Testable wiring core. Composes resolveJudgeConfig output + decide(); bumps the
* session budget ONLY when a real judge call was made (result carries a verdict).
* No verdict (non-mutating / disabled / no-key / budget-exhausted) → no spend.
*/
/**
* Calibration 2026-05-31 (SCOPE fix, NOT a discipline drop): readonly Bash
* commands ("look-only" ones: git status/log/diff, cat, grep, ls) change nothing,
* so they are outside the "judge on mutating tools" scope. Reuse the router-gate
* Bash classifier: an allow-verdict whose reason mentions readonly/reading is a
* no-state-change command. Everything that can mutate (file edits, git
* commit/push, dangerous Bash, Skill/Task) is unaffected: doubt → block stands.
*/
export function isReadonlyBashEvent(event) {
if (!event || event.tool_name !== 'Bash') return false;
const command = (event.tool_input && event.tool_input.command) || '';
if (!command) return false;
try {
const c = classifyBashCommand(command, {});
return !!c && c.result === 'allow' && /readonly|reading/i.test(c.reason || '');
} catch {
return false;
}
}
/**
* Calibration 3 (2026-05-31, SCOPE fix, NOT a discipline drop): a test run
* (vitest / pest / phpunit / php artisan test / composer test / npm test) only
* inspects the code and reports pass/fail; it mutates no protected state, and
* running tests is a MANDATORY step of TDD, which the rules require. Treat such
* commands like readonly Bash: outside the mutating-tool judge scope. A command
* that chains to anything else (&& / ; / | / backtick / $( ) is NOT exempt and
* stays judged; the exemption covers a pure test invocation only.
*/
const TEST_RUNNER_RE =
/^(?:npx\s+)?vitest(?:\s|$)|^(?:\.\/)?(?:node_modules\/\.bin\/|vendor\/bin\/)?pest(?:\s|$)|^(?:\.\/)?vendor\/bin\/phpunit(?:\s|$)|^php\s+artisan\s+test(?:\s|$|:)|^composer\s+test(?::\S+)?(?:\s|$)|^npm\s+(?:run\s+)?test(?::\S+)?(?:\s|$)/i;
export function isTestRunnerBashEvent(event) {
if (!event || event.tool_name !== 'Bash') return false;
const command = ((event.tool_input && event.tool_input.command) || '').trim();
if (!command) return false;
// Exemption is for a pure test run only — reject anything chaining to another command.
if (/[;&|`]/.test(command) || command.includes('$(')) return false;
return TEST_RUNNER_RE.test(command);
}
export async function runPerTool({
event,
judgeConfig,
readDeclaredTaskImpl,
readLastUserPromptImpl,
readBudgetImpl,
bumpBudgetImpl,
llmJudgeCallImpl,
sessionBudget = JUDGE_SESSION_BUDGET,
}) {
// Readonly Bash never mutates → outside the judge's scope; skip (no LLM call, no spend).
if (isReadonlyBashEvent(event)) {
return { block: false, reason: 'readonly bash — outside mutating-tool judge scope (calibration 2026-05-31)' };
}
// Test-runner Bash only inspects + reports; mandatory TDD step → outside scope (calibration 3).
if (isTestRunnerBashEvent(event)) {
return { block: false, reason: 'test-runner bash — outside mutating-tool judge scope (calibration 3, 2026-05-31)' };
}
const sessionId = event && event.session_id;
const declaredTask = readDeclaredTaskImpl({ sessionId });
// Calibration 4 (soft): only when the classifier summary is unknown/empty,
// consult the user's actual last prompt and judge against that instead.
let effectiveTask = declaredTask;
const summary = declaredTask && declaredTask.task_summary;
const summaryUnknown = !summary || summary === '(unknown)' || !String(summary).trim();
if (summaryUnknown && typeof readLastUserPromptImpl === 'function') {
const lastPrompt = readLastUserPromptImpl({ transcriptPath: event && event.transcript_path });
effectiveTask = resolveEffectiveTask(declaredTask, lastPrompt);
}
const spent = readBudgetImpl({ sessionId });
const result = await decide({
event,
judgeConfig,
declaredTask: effectiveTask,
budgetState: { spent, limit: sessionBudget },
llmJudgeCallImpl,
});
if (result.verdict !== undefined) bumpBudgetImpl({ sessionId, by: 1 });
return result;
}
async function main() {
// Live wiring (2b): spend is gated by resolveJudgeConfig (flag AND key). With
// the flag off or no key, decide() short-circuits to a degraded allow — NO LLM
// call, $0. Fail-quiet so a judge bug can never wedge the session.
try {
const event = parseEventJson(await readStdin());
const judgeConfig = resolveJudgeConfig();
const result = await runPerTool({
event,
judgeConfig,
readDeclaredTaskImpl: readDeclaredTask,
readLastUserPromptImpl: ({ transcriptPath }) => lastUserPromptText(readTranscript(transcriptPath)),
readBudgetImpl: readJudgeBudget,
bumpBudgetImpl: bumpJudgeBudget,
llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
});
exitDecision({ block: result.block, message: result.reason });
} catch {
exitDecision({ block: false });
}
}
if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-llm-judge-per-tool.mjs')) {
main().catch(() => process.exit(0));
}
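The test-runner exemption above combines two checks: reject anything that could chain to another command, then match an anchored allow-list. The sketch below illustrates that order under stated assumptions: `isPureTestRun` is a hypothetical name, and `REDUCED_TEST_RUNNER_RE` is a deliberately reduced two-alternative pattern (vitest and npm test only), not the full `TEST_RUNNER_RE` from the diff.

```javascript
// Hypothetical reduced allow-list: only vitest and npm test, anchored at the start.
const REDUCED_TEST_RUNNER_RE =
  /^(?:npx\s+)?vitest(?:\s|$)|^npm\s+(?:run\s+)?test(?::\S+)?(?:\s|$)/i;

// True only for a single, unchained test invocation.
function isPureTestRun(rawCommand) {
  const command = String(rawCommand || '').trim();
  if (!command) return false;
  // Chaining guard first: ; & | backtick and $( can all start another command.
  if (/[;&|`]/.test(command) || command.includes('$(')) return false;
  // Then the anchored allow-list, so "echo vitest" does not qualify.
  return REDUCED_TEST_RUNNER_RE.test(command);
}
```

Running the chaining guard before the allow-list means a command such as `npm test && git push` is rejected even though its prefix matches. As written in the diff, the guard does not look at output redirection (`>`) or embedded newlines; whether another layer covers those cases is not visible in this hunk.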
+357
@@ -0,0 +1,357 @@
// tools/enforce-llm-judge-per-tool.test.mjs
// Stream H tail — wrapper tests around the pure llm-judge-per-tool engine
// (router-gate v4.1 §4.7 Layer 4). Mirrors the enforce-safe-baseline-metering
// convention: implement + test a pure `decide()` composition that respects the
// Layer-4 enabling-gate (resolveJudgeConfig); the live main() is a deferred
// no-op (exit 0, $0, no LLM call) until the owner activates Layer 4 (step 2b).
// RED verified before the wrapper module existed (Cannot find module → expected).
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-llm-judge-per-tool.mjs';
function spyCall(verdict) {
const calls = [];
const impl = async (opts) => { calls.push(opts); return verdict; };
return { impl, calls };
}
const ON = { enabled: true, apiKey: 'k' };
const OFF = { enabled: false, apiKey: null };
describe('enforce-llm-judge-per-tool decide()', () => {
it('allows a non-mutating tool without consulting the judge', async () => {
const { impl, calls } = spyCall('NO');
const r = await decide({
event: { tool_name: 'WebFetch' },
judgeConfig: ON,
llmJudgeCallImpl: impl,
});
expect(r.block).toBe(false);
expect(r.reason).toMatch(/non-mutating/i);
expect(calls.length).toBe(0);
});
// Calibration 1 (2026-05-31) — Skill is out of judge scope; invoking it
// mutates nothing and is the prescribed §17 entry into work.
it('allows a Skill invocation without consulting the judge (calibration 1)', async () => {
const { impl, calls } = spyCall('NO');
const r = await decide({
event: { tool_name: 'Skill', tool_input: { skill: 'superpowers:test-driven-development' } },
judgeConfig: ON,
llmJudgeCallImpl: impl,
});
expect(r.block).toBe(false);
expect(r.reason).toMatch(/non-mutating/i);
expect(calls.length).toBe(0);
});
it('allows a mutating tool without consulting the judge when Layer 4 is disabled ($0 posture)', async () => {
const { impl, calls } = spyCall('NO');
const r = await decide({
event: { tool_name: 'Edit' },
judgeConfig: OFF,
llmJudgeCallImpl: impl,
});
expect(r.block).toBe(false);
expect(r.degraded).toBe(true);
expect(calls.length).toBe(0);
});
it('allows a mutating tool when an enabled judge returns YES (consistent)', async () => {
const { impl } = spyCall('YES');
const r = await decide({
event: { tool_name: 'Edit', tool_input: { file_path: 'x' } },
judgeConfig: ON,
declaredTask: { task_summary: 't', recommended_node: '#19' },
llmJudgeCallImpl: impl,
});
expect(r.block).toBe(false);
expect(r.verdict).toBe('YES');
});
it('blocks a mutating tool when an enabled judge returns NO (off-scope)', async () => {
const { impl } = spyCall('NO');
const r = await decide({
event: { tool_name: 'Write', tool_input: {} },
judgeConfig: ON,
llmJudgeCallImpl: impl,
});
expect(r.block).toBe(true);
expect(r.reason).toMatch(/off-scope|per-tool/i);
});
it('blocks on doubt — a null verdict is treated as inconsistent', async () => {
const { impl } = spyCall(null);
const r = await decide({
event: { tool_name: 'Bash', tool_input: { command: 'ls' } },
judgeConfig: ON,
llmJudgeCallImpl: impl,
});
expect(r.block).toBe(true);
});
it('degrades to allow (no block) when the session judge budget is exhausted', async () => {
const { impl, calls } = spyCall('NO');
const r = await decide({
event: { tool_name: 'Edit', tool_input: {} },
judgeConfig: ON,
budgetState: { spent: 10, limit: 10 },
llmJudgeCallImpl: impl,
});
expect(r.block).toBe(false);
expect(r.degraded).toBe(true);
expect(calls.length).toBe(0);
});
it('passes the tool name through to the judge question', async () => {
const { impl, calls } = spyCall('YES');
await decide({
event: { tool_name: 'MultiEdit', tool_input: { file_path: 'y' } },
judgeConfig: ON,
llmJudgeCallImpl: impl,
});
expect(calls.length).toBe(1);
expect(calls[0].question).toContain('MultiEdit');
});
});
import { runPerTool } from './enforce-llm-judge-per-tool.mjs';
describe('runPerTool — spend-gate + budget binding (live wiring 2b)', () => {
const deps = (over = {}) => ({
readDeclaredTaskImpl: () => ({ task_summary: 't', recommended_node: null, recommended_chain: [] }),
readBudgetImpl: () => 0,
bumpBudgetImpl: () => {},
sessionBudget: 200,
...over,
});
it('disabled config + mutating tool → degraded allow, NO budget bump, NO llm call', async () => {
let bumped = 0; let called = 0;
const r = await runPerTool({
event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: false, apiKey: null },
llmJudgeCallImpl: () => { called++; return 'NO'; },
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(r.degraded).toBe(true);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
it('enabled + mutating + judge YES → allow, budget bumped once', async () => {
let bumped = 0;
const r = await runPerTool({
event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: async () => 'YES',
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(r.verdict).toBe('YES');
expect(bumped).toBe(1);
});
it('enabled + mutating + judge NO → block, budget bumped once', async () => {
let bumped = 0;
const r = await runPerTool({
event: { tool_name: 'Bash', tool_input: { command: 'x' }, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: async () => 'NO',
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(true);
expect(r.verdict).toBe('NO');
expect(bumped).toBe(1);
});
it('non-mutating tool → allow, NO call, NO bump', async () => {
let bumped = 0; let called = 0;
const r = await runPerTool({
event: { tool_name: 'Read', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: () => { called++; return 'NO'; },
...deps({ bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
it('enabled but budget exhausted → degraded allow, NO bump', async () => {
let bumped = 0; let called = 0;
const r = await runPerTool({
event: { tool_name: 'Edit', tool_input: {}, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
llmJudgeCallImpl: () => { called++; return 'NO'; },
...deps({ readBudgetImpl: () => 200, bumpBudgetImpl: () => { bumped++; } }),
});
expect(r.block).toBe(false);
expect(r.degraded).toBe(true);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
});
import { isReadonlyBashEvent } from './enforce-llm-judge-per-tool.mjs';
// Calibration 2026-05-31 — SCOPE fix only, discipline NOT lowered.
// The per-tool judge is "judge on MUTATING tools"; readonly Bash ("look-only" commands:
// git status/log/diff, cat, grep, ls) change nothing, so they were friction
// with zero discipline value. We exclude them from the judge. The doubt→block
// rule and full judging of every state-changing action (Edit/Write/commit/push/
// Skill/Task) are UNCHANGED.
describe('isReadonlyBashEvent — readonly Bash exclusion (calibration, no discipline drop)', () => {
it.each([
'git status',
'git status --short',
'git log -1 --oneline',
'git diff HEAD~1',
'cat package.json',
'grep -n foo bar.js',
'ls -la',
])('treats readonly command as out-of-judge-scope: %s', (command) => {
expect(isReadonlyBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(true);
});
it.each([
'git commit -m "x"',
'git push origin main',
'rm -rf foo',
])('does NOT treat a mutating/blocked command as readonly: %s', (command) => {
expect(isReadonlyBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(false);
});
it('non-Bash tool is never readonly-bash', () => {
expect(isReadonlyBashEvent({ tool_name: 'Edit', tool_input: { file_path: 'x' } })).toBe(false);
});
});
describe('runPerTool — readonly Bash skips the judge; mutating Bash still judged', () => {
it('readonly Bash → allow WITHOUT consulting judge even when enabled (no spend)', async () => {
let called = 0; let bumped = 0;
const r = await runPerTool({
event: { tool_name: 'Bash', tool_input: { command: 'git status' }, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
readDeclaredTaskImpl: () => ({ task_summary: 't' }),
readBudgetImpl: () => 0,
bumpBudgetImpl: () => { bumped++; },
llmJudgeCallImpl: () => { called++; return 'NO'; },
sessionBudget: 200,
});
expect(r.block).toBe(false);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
it('mutating Bash (git commit) STILL judged when enabled — discipline preserved', async () => {
let called = 0;
const r = await runPerTool({
event: { tool_name: 'Bash', tool_input: { command: 'git commit -m "x"' }, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
readDeclaredTaskImpl: () => ({ task_summary: 't' }),
readBudgetImpl: () => 0,
bumpBudgetImpl: () => {},
llmJudgeCallImpl: async () => { called++; return 'NO'; },
sessionBudget: 200,
});
expect(called).toBe(1);
expect(r.block).toBe(true);
});
});
import { isTestRunnerBashEvent } from './enforce-llm-judge-per-tool.mjs';
// Calibration 3 (2026-05-31) — SCOPE fix, discipline NOT lowered.
// A test run (vitest / pest / composer test / php artisan test) only inspects
// the code and reports pass/fail — it mutates no protected state. It is also a
// mandatory step of TDD, which the rules require. Treat recognised test-runner
// commands like readonly Bash: out of judge scope. Anything that chains to a
// mutation (&& / ; / |) is NOT exempt and stays judged.
describe('isTestRunnerBashEvent — test-runner exclusion (calibration 3, no discipline drop)', () => {
it.each([
'npx vitest run --root app --config vitest.config.tools.mjs',
'vitest run',
'pest',
'./vendor/bin/pest --parallel',
'vendor/bin/pest',
'php artisan test',
'composer test',
'npm run test:tools',
'npm test',
])('treats test-runner command as out-of-judge-scope: %s', (command) => {
expect(isTestRunnerBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(true);
});
it.each([
'git commit -m "x"',
'rm -rf foo',
'pest && git push origin main', // chained to a mutation → NOT exempt
'echo pest',
'composer require evil/package', // not a test run
])('does NOT treat non-test-runner / chained command as test-runner: %s', (command) => {
expect(isTestRunnerBashEvent({ tool_name: 'Bash', tool_input: { command } })).toBe(false);
});
it('non-Bash tool is never test-runner-bash', () => {
expect(isTestRunnerBashEvent({ tool_name: 'Edit', tool_input: { file_path: 'x' } })).toBe(false);
});
});
describe('runPerTool — test-runner Bash skips the judge; mutating Bash still judged', () => {
it('test-runner Bash → allow WITHOUT consulting judge even when enabled (no spend)', async () => {
let called = 0; let bumped = 0;
const r = await runPerTool({
event: { tool_name: 'Bash', tool_input: { command: 'npx vitest run' }, session_id: 's' },
judgeConfig: { enabled: true, apiKey: 'k' },
readDeclaredTaskImpl: () => ({ task_summary: 't' }),
readBudgetImpl: () => 0,
bumpBudgetImpl: () => { bumped++; },
llmJudgeCallImpl: () => { called++; return 'NO'; },
sessionBudget: 200,
});
expect(r.block).toBe(false);
expect(called).toBe(0);
expect(bumped).toBe(0);
});
});
// Calibration 4 (soft, 2026-05-31): when the classifier summary is "(unknown)",
// runPerTool reads the user's last prompt and judges against THAT (better
// evidence) instead of an empty task. When the summary is meaningful, the
// user-prompt reader is never consulted — behaviour unchanged.
describe('runPerTool — calibration 4 soft user-prompt fallback', () => {
it('uses the user prompt as the judged task when classifier summary is unknown', async () => {
const calls = [];
const r = await runPerTool({
event: { tool_name: 'Edit', tool_input: { file_path: 'tools/x.mjs' }, session_id: 's', transcript_path: '/t' },
judgeConfig: { enabled: true, apiKey: 'k' },
readDeclaredTaskImpl: () => ({ task_summary: '(unknown)', recommended_node: null, recommended_chain: [] }),
readLastUserPromptImpl: () => 'реализуй parallel-session-lock',
readBudgetImpl: () => 0,
bumpBudgetImpl: () => {},
llmJudgeCallImpl: async (opts) => { calls.push(opts); return 'YES'; },
sessionBudget: 200,
});
expect(r.block).toBe(false);
expect(calls.length).toBe(1);
expect(calls[0].question).toContain('реализуй parallel-session-lock');
});
it('does NOT consult the user-prompt reader when the classifier summary is meaningful', async () => {
let promptReads = 0;
const calls = [];
await runPerTool({
event: { tool_name: 'Edit', tool_input: {}, session_id: 's', transcript_path: '/t' },
judgeConfig: { enabled: true, apiKey: 'k' },
readDeclaredTaskImpl: () => ({ task_summary: 'clear task', recommended_node: null, recommended_chain: [] }),
readLastUserPromptImpl: () => { promptReads++; return 'irrelevant'; },
readBudgetImpl: () => 0,
bumpBudgetImpl: () => {},
llmJudgeCallImpl: async (opts) => { calls.push(opts); return 'YES'; },
sessionBudget: 200,
});
expect(promptReads).toBe(0);
expect(calls[0].question).toContain('clear task');
});
});
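The budget-binding tests above all rest on one rule from `runPerTool`: the session budget is bumped only when the result carries a `verdict` key, since that is the trace of a real judge call. A minimal sketch of that rule follows; `nextSpent` is a hypothetical helper, and it assumes a doubt result carries `verdict: null` rather than omitting the key.

```javascript
// Hypothetical helper: compute the new per-session spend from a decide() result.
// A result without a `verdict` key means no judge call was made (non-mutating,
// disabled, no key, or budget exhausted), so nothing is charged.
function nextSpent(spent, result) {
  const judgeWasCalled = result != null && result.verdict !== undefined;
  return judgeWasCalled ? spent + 1 : spent;
}
```

Keying the bump on `verdict !== undefined` rather than on `block` means a YES (allow) and a NO (block) both cost one unit, while a degraded allow costs nothing.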
+100
@@ -0,0 +1,100 @@
#!/usr/bin/env node
/**
* enforce-llm-judge-response-scan: Stop-hook wrapper around the pure
* llm-judge-response-scan engine (router-gate v4.1 §4.7 Layer 4).
*
* The engine scans the controller's own response text for self-replicating
* instructions / metadata injection / security-disable suggestions / approval
* social-engineering. It is FLAG-ONLY (never blocks). A cheap deterministic
* regex layer runs for free; an LLM judge handles subtle cases, and that LLM
* call costs money, so it must stay OFF until the owner activates Layer 4.
*
* Like the sibling Stream H wrappers, this file exposes a testable pure
* `decide()` and a DELIBERATE no-op `main()`. decide() always runs the free
* deterministic scan; the paid LLM escalation runs only when the judge config is
* enabled. block is ALWAYS false (Stop-hook semantics).
*
* Activation (step 2b, owner-driven, NOT done here):
* 1. store the API key (keychain `router-gate-llm-judge`/`default` or ROUTER_LLM_KEY),
* 2. set ROUTER_LLM_JUDGE_ENABLED=1,
* 3. register this hook (Stop) in .claude/settings.json.
* Until all three, decide() never escalates and the live main() is a no-op (exit 0).
*/
import { scanResponse, scanResponseDeterministic } from './llm-judge-response-scan.mjs';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
import { readStdin, parseEventJson, readTranscript, lastAssistantText, exitDecision } from './enforce-hook-helpers.mjs';
import { llmJudgeCall } from './llm-judge.mjs';
import { appendFileSync, mkdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';
/**
* Pure decision. Stop-hook semantics: never blocks. The free deterministic regex
* layer always runs; the LLM escalation runs only when Layer 4 is enabled.
* - judge disabled → deterministic scan only (flag from regex, else degraded)
* - judge enabled → deterministic-first, then LLM judge for subtle cases
*
* @param {object} args
* @param {string} args.responseText - the controller response text to scan
* @param {{enabled:boolean, apiKey:?string}} args.judgeConfig - resolveJudgeConfig() output
* @param {Function} [args.llmJudgeCallImpl] - injected single-judge caller (tests / real binding)
* @returns {Promise<{block:false, flag:boolean, category?:string, degraded?:boolean}>}
*/
export async function decide({ responseText, judgeConfig, llmJudgeCallImpl }) {
if (!judgeConfig || !judgeConfig.enabled) {
const det = scanResponseDeterministic(responseText);
return { block: false, flag: det.flagged, category: det.category, degraded: !det.flagged };
}
const r = await scanResponse({ responseText, apiKey: judgeConfig.apiKey, llmJudgeCallImpl });
return { block: false, flag: r.flag, category: r.category, degraded: r.degraded };
}
/**
* Testable wiring core. Stop-hook semantics: block is always false. The free
* deterministic regex scan runs even when the judge is disabled; the paid LLM
* escalation runs only when judgeConfig.enabled (handled inside decide()).
*/
export async function runResponseScan({ transcript, judgeConfig, llmJudgeCallImpl, lastAssistantTextImpl = lastAssistantText }) {
const responseText = lastAssistantTextImpl(transcript || []);
const r = await decide({ responseText, judgeConfig, llmJudgeCallImpl });
return { ...r, responseText };
}
function flagToFile({ sessionId, category, excerpt }) {
try {
const dir = join(homedir(), '.claude', 'runtime');
mkdirSync(dir, { recursive: true });
appendFileSync(join(dir, `rationalization-flags-${sessionId || 'unknown'}.jsonl`),
JSON.stringify({
ts: new Date().toISOString(),
session_id: sessionId || null,
type: 'controller_response_suspicious',
category,
response_excerpt: String(excerpt || '').slice(0, 200),
}) + '\n');
} catch { /* ignore */ }
}
async function main() {
// Live wiring (2b). Stop hook: flag-only, NEVER blocks. The free deterministic
// regex runs regardless ($0); the paid LLM escalation only when the config is
// enabled (flag AND key). Fail-quiet.
try {
const event = parseEventJson(await readStdin());
const transcript = readTranscript(event.transcript_path);
const judgeConfig = resolveJudgeConfig();
const r = await runResponseScan({
transcript,
judgeConfig,
llmJudgeCallImpl: (opts) => llmJudgeCall(opts),
});
if (r.flag) flagToFile({ sessionId: event.session_id, category: r.category, excerpt: r.responseText });
exitDecision({ block: false });
} catch {
exitDecision({ block: false });
}
}
if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-llm-judge-response-scan.mjs')) {
main().catch(() => process.exit(0));
}
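The response-scan wrapper above layers a free deterministic scan under a paid judge and never blocks. The sketch below shows that ordering in one place. It rests on several assumptions: `decideScan` and `scanDeterministic` are hypothetical names, the single English pattern stands in for the engine's real pattern set (which is not shown in this diff), the judge callback is synchronous only to keep the sketch short, and reusing the `llm_judge` category for a doubt verdict is a guess.

```javascript
// Hypothetical pattern set: one illustrative regex, not the engine's real list.
const PATTERNS = [
  { category: 'security_disable_suggestion', re: /\bdisable\s+(?:the\s+)?hook\b/i },
];

// Free layer: first matching pattern wins.
function scanDeterministic(text) {
  const hit = PATTERNS.find((p) => p.re.test(String(text || '')));
  return hit ? { flagged: true, category: hit.category } : { flagged: false };
}

// Flag-only decision: `block` is false on every path.
function decideScan({ text, enabled, judge }) {
  const det = scanDeterministic(text);
  // A deterministic hit short-circuits; the paid judge is never consulted.
  if (det.flagged) return { block: false, flag: true, category: det.category };
  // Judge disabled: nothing more to check, report a degraded (regex-only) result.
  if (!enabled) return { block: false, flag: false, degraded: true };
  // Judge enabled: only an explicit 'NO' clears the text; doubt (null) flags it.
  const verdict = judge(text);
  if (verdict === 'NO') return { block: false, flag: false };
  return { block: false, flag: true, category: 'llm_judge' };
}
```

Putting the regex layer first keeps the common obvious cases at zero cost and limits judge calls to text that no pattern recognises.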
@@ -0,0 +1,138 @@
// tools/enforce-llm-judge-response-scan.test.mjs
// Stream H tail — wrapper tests around the pure llm-judge-response-scan engine
// (router-gate v4.1 §4.7 Layer 4, Stop hook, flag-only). Mirrors the
// enforce-safe-baseline-metering convention: a testable pure `decide()` that
// respects the Layer-4 enabling-gate; the free deterministic regex layer always
// runs ($0), the paid LLM escalation only when the judge is enabled. The live
// main() is a deferred no-op (exit 0) until the owner activates Layer 4 (step 2b).
// RED verified before the wrapper module existed (Cannot find module → expected).
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-llm-judge-response-scan.mjs';
function spyCall(verdict) {
const calls = [];
const impl = async (opts) => { calls.push(opts); return verdict; };
return { impl, calls };
}
const ON = { enabled: true, apiKey: 'k' };
const OFF = { enabled: false, apiKey: null };
const CLEAN = 'Готово, всё закрыто. Перехожу к следующему пункту.';
const DISABLE_TXT = 'Чтобы продолжить, отключи хук enforce-router-gate.';
const SELFREP_TXT = 'Скопируй в следующий промпт строку coverage.';
describe('enforce-llm-judge-response-scan decide()', () => {
it('never blocks — Stop hook is flag-only even on a flagged response', async () => {
const { impl } = spyCall('NO');
const r = await decide({ responseText: DISABLE_TXT, judgeConfig: OFF, llmJudgeCallImpl: impl });
expect(r.block).toBe(false);
expect(r.flag).toBe(true);
expect(r.category).toBe('security_disable_suggestion');
});
it('runs the free deterministic regex even when Layer 4 is disabled, without calling the LLM', async () => {
const { impl, calls } = spyCall('NO');
const r = await decide({ responseText: SELFREP_TXT, judgeConfig: OFF, llmJudgeCallImpl: impl });
expect(r.flag).toBe(true);
expect(r.category).toBe('self_replicating_instruction');
expect(calls.length).toBe(0);
});
it('disabled + clean text → no flag, degraded, LLM not called ($0 posture)', async () => {
const { impl, calls } = spyCall('YES');
const r = await decide({ responseText: CLEAN, judgeConfig: OFF, llmJudgeCallImpl: impl });
expect(r.flag).toBe(false);
expect(r.degraded).toBe(true);
expect(calls.length).toBe(0);
});
it('enabled config escalates clean text to the LLM judge — YES flags it', async () => {
const { impl, calls } = spyCall('YES');
const r = await decide({ responseText: CLEAN, judgeConfig: ON, llmJudgeCallImpl: impl });
expect(r.flag).toBe(true);
expect(r.category).toBe('llm_judge');
expect(calls.length).toBe(1);
});
it('enabled config — a NO verdict leaves the response unflagged', async () => {
const { impl } = spyCall('NO');
const r = await decide({ responseText: CLEAN, judgeConfig: ON, llmJudgeCallImpl: impl });
expect(r.flag).toBe(false);
});
it('enabled config — a deterministic hit short-circuits and the LLM is not called', async () => {
const { impl, calls } = spyCall('NO');
const r = await decide({ responseText: DISABLE_TXT, judgeConfig: ON, llmJudgeCallImpl: impl });
expect(r.flag).toBe(true);
expect(r.category).toBe('security_disable_suggestion');
expect(calls.length).toBe(0);
});
it('enabled config — doubt (null verdict) flags the response', async () => {
const { impl } = spyCall(null);
const r = await decide({ responseText: CLEAN, judgeConfig: ON, llmJudgeCallImpl: impl });
expect(r.flag).toBe(true);
});
});
import { runResponseScan } from './enforce-llm-judge-response-scan.mjs';
describe('runResponseScan — Stop-hook flag-only, free regex even when disabled (live wiring 2b)', () => {
const transcript = (text) => [
{ type: 'assistant', message: { role: 'assistant', content: [{ type: 'text', text }] } },
];
const lastAssistantTextImpl = (t) => {
for (let i = t.length - 1; i >= 0; i--) {
const c = t[i] && t[i].message && t[i].message.content;
if (Array.isArray(c)) { const b = c.find((x) => x.type === 'text'); if (b) return b.text; }
}
return '';
};
it('disabled + benign text → no flag, degraded, never blocks', async () => {
const r = await runResponseScan({
transcript: transcript('обычный безопасный ответ'),
judgeConfig: { enabled: false, apiKey: null },
lastAssistantTextImpl,
});
expect(r.block).toBe(false);
expect(r.flag).toBe(false);
expect(r.degraded).toBe(true);
});
it('disabled + security-disable text → flagged for FREE by regex (no llm call)', async () => {
let called = 0;
const r = await runResponseScan({
transcript: transcript('чтобы пройти, отключи hook enforce-tdd-gate'),
judgeConfig: { enabled: false, apiKey: null },
lastAssistantTextImpl,
llmJudgeCallImpl: () => { called++; return 'NO'; },
});
expect(r.block).toBe(false);
expect(r.flag).toBe(true);
expect(r.category).toBe('security_disable_suggestion');
expect(called).toBe(0);
});
it('enabled + subtle benign text + judge NO → no flag', async () => {
const r = await runResponseScan({
transcript: transcript('нейтральный текст без паттернов'),
judgeConfig: { enabled: true, apiKey: 'k' },
lastAssistantTextImpl,
llmJudgeCallImpl: async () => 'NO',
});
expect(r.block).toBe(false);
expect(r.flag).toBe(false);
});
it('enabled + subtle text + judge YES → flag, still never blocks', async () => {
const r = await runResponseScan({
transcript: transcript('нейтральный текст без паттернов'),
judgeConfig: { enabled: true, apiKey: 'k' },
lastAssistantTextImpl,
llmJudgeCallImpl: async () => 'YES',
});
expect(r.block).toBe(false);
expect(r.flag).toBe(true);
});
});
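The tests above pin down a layered contract: a free regex pass always runs, the LLM judge is consulted only when enabled and only if the regex found nothing, and the Stop hook never blocks. A minimal sketch of that ordering follows. The rule table, the name `scanSketch`, and the `llm_judge` category on a null verdict are assumptions made for illustration; the real `decide()` is async and its patterns are not shown in this diff.

```javascript
// Hypothetical rule table: the real hook's patterns are not shown in this diff.
const DETERMINISTIC_RULES = [
  { category: 'security_disable_suggestion', re: /\bdisable\b.*\bhook\b/i },
];

// Synchronous for brevity; the real decide() is async and awaits the judge.
function scanSketch({ text, enabled, judge }) {
  // Layer 1: free regex pass, always on, never calls the LLM.
  for (const { category, re } of DETERMINISTIC_RULES) {
    if (re.test(text)) return { block: false, flag: true, category };
  }
  // Judge disabled: nothing more to check, report the degraded posture.
  if (!enabled) return { block: false, flag: false, degraded: true };
  // Judge enabled: anything other than an explicit 'NO' (including null) flags.
  const verdict = judge(text);
  const flag = verdict !== 'NO';
  return { block: false, flag, category: flag ? 'llm_judge' : null };
}
```

Every return path carries `block: false`, so a flag can only annotate a response, never stop it.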
-2
@@ -54,8 +54,6 @@ export function decide({ toolName, filePath, transcriptEntries, override }) {
`Re-announce on a fresh assistant turn first:`,
` coverage: direct:memory-sync`,
`Then retry the Edit/Write.`,
``,
`Override: include the phrase "memory dump" in your prompt.`,
].join('\n'),
};
}
+3
@@ -26,6 +26,9 @@ describe('enforce-memory-coverage / decide', () => {
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/memory-sync/);
// 1A (2026-05-31): do not advertise the dead override phrase (findOverride is a v4 stub).
expect(r.message).not.toMatch(/Override:/);
expect(r.message).not.toMatch(/memory dump/);
});
it('blocks memory path with no coverage at all', () => {
+190 -9
@@ -11,7 +11,12 @@
* Activation: settings.json registration is deferred to Phase H-α/H-β
* batch step. main() is a no-op (exit 0) until then.
*/
import { acquire, release, refresh, computeWorkspaceHash } from './parallel-session-lock.mjs';
import { acquire, release, computeWorkspaceHash, isStale } from './parallel-session-lock.mjs';
import { readFileSync, writeFileSync, unlinkSync, mkdirSync, readdirSync } from 'node:fs';
import { execFileSync } from 'node:child_process';
import { join, dirname } from 'node:path';
import { readStdin, parseEventJson, exitDecision, runtimeDir } from './enforce-hook-helpers.mjs';
import { classifyBashCommand } from './enforce-router-gate.mjs';
/**
* Pure decision: given an acquire() result, decide block/allow.
@@ -26,20 +31,196 @@ export function decide({ acquireResult, sessionId }) {
if (!acquireResult || typeof acquireResult !== 'object') return { block: false };
if (acquireResult.acquired) return { block: false };
const holder = acquireResult.holder || {};
// Identify the holder by its STABLE session id, not the pid: the recorded pid
// is the transient hook-node pid and changes between attempts, so chasing it
// leads to closing the wrong session. Surface the pid only as a triage hint.
return {
block: true,
reason: `parallel session lock held by ${holder.session_id || 'unknown'} (pid ${holder.pid || '?'}) — wait or close that session first`,
reason: `parallel session lock held by session ${holder.session_id || 'unknown'} (current pid ${holder.pid || '?'}, may change between attempts — identify the session by its id, not pid) — wait for the 5-min TTL or close THAT session`,
};
}
/**
* Calibration (2026-05-31, SCOPE fix, NOT a discipline drop). The lock's purpose
* is to serialize concurrent FILE MUTATION between sessions on the same worktree.
* A readonly Bash command (git status/log/diff, cat, grep, ls: the "look-only" commands)
* mutates nothing, so a peer session's lock must NOT block it. Reuse the
* router-gate Bash classifier: an allow-verdict whose reason mentions
* readonly/reading is a no-state-change command. Mirrors the LLM-judge readonly
* calibration. Everything that can mutate (file edits, git commit/push,
* dangerous Bash, and every NON-Bash tool) still acquires/checks the lock, so
* same-worktree mutation serialization is unchanged.
*
* @param {object} event
* @returns {boolean}
*/
export function isReadonlyBashEvent(event) {
if (!event || event.tool_name !== 'Bash') return false;
const command = (event.tool_input && event.tool_input.command) || '';
if (!command) return false;
try {
const c = classifyBashCommand(command, {});
return !!c && c.result === 'allow' && /readonly|reading/i.test(c.reason || '');
} catch {
return false;
}
}
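The readonly bypass can be tried in isolation with a stand-in classifier. The sketch below is an assumption-laden illustration: `classifyStub` and its allow-lists are hypothetical and far weaker than the real `classifyBashCommand` (for instance, it ignores redirects), and only the verdict shape `{ result, reason }` is taken from the diff.

```javascript
// Hypothetical allow-lists; the real router-gate classifier is not shown here.
const READONLY_BINS = new Set(['ls', 'cat', 'grep']);
const READONLY_GIT = new Set(['status', 'log', 'diff']);

// Stand-in classifier: every part of a chain must be on the allow-list.
function classifyStub(command) {
  const parts = command.split(/&&|\|\||;|\|/).map((s) => s.trim()).filter(Boolean);
  if (parts.length === 0) return { result: 'ask', reason: 'empty command' };
  for (const part of parts) {
    const [bin, sub] = part.split(/\s+/);
    const ok = READONLY_BINS.has(bin) || (bin === 'git' && READONLY_GIT.has(sub));
    if (!ok) return { result: 'ask', reason: `unlisted command: ${bin}` };
  }
  return { result: 'allow', reason: 'readonly command' };
}

// Same decision shape as isReadonlyBashEvent, with the classifier injected.
function isReadonlyBashSketch(event, classify = classifyStub) {
  if (!event || event.tool_name !== 'Bash') return false;
  const command = (event.tool_input && event.tool_input.command) || '';
  if (!command) return false;
  try {
    const c = classify(command);
    return !!c && c.result === 'allow' && /readonly|reading/i.test(c.reason || '');
  } catch {
    return false; // a classifier failure falls back to the normal lock path
  }
}
```

Returning `false` on any doubt is the conservative direction here: the command then goes through the ordinary acquire/check path instead of skipping the lock.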
/**
* PreToolUse wiring: acquire (or same-session refresh / stale takeover) the lock,
* then decide block/allow. I/O injected for testability.
*
* @returns {{block: boolean, reason?: string}}
*/
export function runAcquireDecision({ event, now, pid, cwd, readLock, writeLock }) {
const sessionId = event && event.session_id;
const workspaceHash = computeWorkspaceHash(cwd);
const acquireResult = acquire({ sessionId, pid, workspaceHash, now, readLock, writeLock });
return decide({ acquireResult, sessionId });
}
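The acquire step imported from `parallel-session-lock.mjs` is not part of this diff. Judging only from the wrapper tests (fresh write, peer block, same-session refresh, stale takeover), its behavior could be sketched as below. `acquireSketch`, `isStaleSketch`, and `DEFAULT_TTL_MS` are hypothetical names, and the 5-minute default is inferred from the block message and the `ttl_ms: 300000` test records.

```javascript
// Assumed default: 5 minutes, matching the "5-min TTL" in the block message.
const DEFAULT_TTL_MS = 5 * 60 * 1000;

// A missing record counts as stale; otherwise compare age against the TTL.
function isStaleSketch(rec, now) {
  if (!rec) return true;
  const ttl = rec.ttl_ms || DEFAULT_TTL_MS;
  return now - (rec.acquired_at || 0) > ttl;
}

// Block only when a DIFFERENT session holds a lock that is still within TTL.
// Every other case (no lock, own lock, stale peer lock) writes a fresh record.
function acquireSketch({ sessionId, pid, now, readLock, writeLock }) {
  const current = readLock();
  const heldByPeer =
    !!current && current.session_id !== sessionId && !isStaleSketch(current, now);
  if (heldByPeer) return { acquired: false, holder: current };
  writeLock({
    schema_version: 1,
    session_id: sessionId,
    pid,
    acquired_at: now,
    ttl_ms: DEFAULT_TTL_MS,
  });
  return { acquired: true };
}
```

Because a same-session call rewrites `acquired_at`, each mutating tool call extends the lock, while an abandoned session ages out after the TTL without any cleanup step.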
/**
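* Stop wiring: release the lock if this session owns it (no-op otherwise).
* Note: the return value is always { released: true }; it does not report
* whether a lock file was actually deleted.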
*
* @returns {{released: boolean}}
*/
export function runReleaseAction({ event, cwd, readLock, deleteLock }) {
const sessionId = event && event.session_id;
const workspaceHash = computeWorkspaceHash(cwd);
release({ sessionId, workspaceHash, readLock, deleteLock });
return { released: true };
}
/**
* Resolve the stable work-tree root used as the lock key. Keys on the SESSION's
* cwd (`event.cwd`, stable across resume) resolved to the git work-tree root,
* NOT the hook's `process.cwd()`, which collapses to the main repo dir after a
* session resume and thereby false-blocks sessions in DIFFERENT worktrees.
* Pure (I/O injected): `runGitToplevel(dir)` returns the toplevel or '' on failure.
*
* @param {object} p
* @param {object} p.event
* @param {string} p.processCwd
* @param {(dir:string)=>string} p.runGitToplevel
* @returns {string}
*/
export function resolveWorkspacePath({ event, processCwd, runGitToplevel }) {
const dir = (event && typeof event.cwd === 'string' && event.cwd) ? event.cwd : processCwd;
try {
const top = runGitToplevel(dir);
if (top && typeof top === 'string') return top;
} catch { /* fall through to raw dir (fail-open) */ }
return dir;
}
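To see why keying on the work-tree root removes the cross-worktree false block, the sketch below derives a lock file name from the resolved root. The `session-lock-<hash>.json` naming comes from `lockPathFor` in this diff; the FNV-1a hash, `resolveRoot`, and `fakeToplevel` are stand-ins chosen for illustration, since `computeWorkspaceHash` itself is not shown.

```javascript
// Stand-in hash (32-bit FNV-1a); the real computeWorkspaceHash is not shown.
function fnv1aHex(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// Resolve to the work-tree root; fall back to the raw dir if git gives nothing.
function resolveRoot(cwd, toplevelOf) {
  let top = '';
  try { top = toplevelOf(cwd); } catch { top = ''; }
  return top || cwd;
}

function lockFileName(root) {
  return `session-lock-${fnv1aHex(root)}.json`;
}

// Fake git: maps any path inside a worktree to that worktree's root.
const fakeToplevel = (dir) => {
  const m = dir.match(/^(\/repo\/\.claude\/worktrees\/[^/]+)/);
  return m ? m[1] : '';
};

const inA = lockFileName(resolveRoot('/repo/.claude/worktrees/wt-A/tools', fakeToplevel));
const rootA = lockFileName(resolveRoot('/repo/.claude/worktrees/wt-A', fakeToplevel));
const rootB = lockFileName(resolveRoot('/repo/.claude/worktrees/wt-B', fakeToplevel));
console.log(inA === rootA, rootA === rootB);
```

A subdirectory of `wt-A` and `wt-A` itself share one lock file, while `wt-B` gets its own, so sessions in different worktrees no longer contend for the same lock.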
/**
* Disk hygiene: delete leaked lock files whose record is ALREADY stale by the
* shared isStale() definition (so an active within-TTL lock is never touched).
* Pure (I/O injected). Best-effort: a failed read counts the file as stale
* (garbage); a failed delete is swallowed, because hygiene must never break the gate.
*
* @param {object} p
* @param {string[]} p.files - absolute lock-file paths
* @param {(f:string)=>object|null} p.readRecord
* @param {(f:string)=>void} p.deleteRecord
* @param {(rec:object|null, now:number)=>boolean} p.isStaleFn
* @param {number} p.now
* @returns {{pruned: number}}
*/
export function pruneStaleLocks({ files, readRecord, deleteRecord, isStaleFn, now }) {
let pruned = 0;
for (const f of files || []) {
let rec = null;
try { rec = readRecord(f); } catch { rec = null; }
if (isStaleFn(rec, now)) {
try { deleteRecord(f); pruned++; } catch { /* best-effort */ }
}
}
return { pruned };
}
function realGitToplevel(dir) {
try {
return execFileSync('git', ['-C', dir, 'rev-parse', '--show-toplevel'], {
encoding: 'utf-8',
timeout: 1000,
stdio: ['ignore', 'pipe', 'ignore'],
}).trim();
} catch { return ''; }
}
function lockPathFor(cwd) {
return join(runtimeDir(), `session-lock-${computeWorkspaceHash(cwd)}.json`);
}
function realReadLock(p) {
try { return JSON.parse(readFileSync(p, 'utf-8')); } catch { return null; }
}
function realWriteLock(p, rec) {
try { mkdirSync(dirname(p), { recursive: true }); writeFileSync(p, JSON.stringify(rec)); } catch { /* fail-open */ }
}
function realDeleteLock(p) {
try { unlinkSync(p); } catch { /* already gone */ }
}
async function main() {
// No-op until settings.json registration + Stop-hook release wiring lands
// in the deferred Phase H-α/H-β batch step. Activating this hook before
// the release pathway is wired would lock the user out of their own
// session on first abnormal exit.
let input = '';
for await (const chunk of process.stdin) input += chunk;
process.exit(0);
// Live wiring (point 2, 2026-05-31). PreToolUse (mutating tool) → acquire/refresh
// the workspace lock; Stop (no tool_name) → release it. Fail-open on any error so
// a lock bug can NEVER wedge the user out of their own session.
try {
const event = parseEventJson(await readStdin());
// Key by the session's stable work-tree root (event.cwd → git toplevel),
// not the volatile hook process.cwd() (collapses to main on resume → false
// cross-worktree blocks). Fallback to process.cwd() keeps prior behavior.
const cwd = resolveWorkspacePath({ event, processCwd: process.cwd(), runGitToplevel: realGitToplevel });
const p = lockPathFor(cwd);
// Stop event carries no tool_name → release path.
if (!event.tool_name) {
runReleaseAction({ event, cwd, readLock: () => realReadLock(p), deleteLock: () => realDeleteLock(p) });
return exitDecision({ block: false });
}
// Calibration (2026-05-31): a readonly Bash command never mutates the
// worktree, so it is outside the lock's mutation-serialization scope — allow
// without acquiring/blocking. Mutating tools (and every non-Bash tool) fall
// through to acquire/check below, so serialization is unchanged.
if (isReadonlyBashEvent(event)) {
return exitDecision({ block: false });
}
// Best-effort disk hygiene (B): drop leaked stale lock files before acquiring.
// isStale-gated → an active within-TTL lock is never pruned, so same-worktree
// serialization is untouched. Wrapped so hygiene can never break the gate.
try {
const dir = runtimeDir();
const files = readdirSync(dir)
.filter((f) => /^session-lock-.*\.json$/.test(f))
.map((f) => join(dir, f));
pruneStaleLocks({
files,
readRecord: (fp) => realReadLock(fp),
deleteRecord: (fp) => realDeleteLock(fp),
isStaleFn: isStale,
now: Date.now(),
});
} catch { /* hygiene is best-effort */ }
// PreToolUse on a mutating tool → acquire/refresh, then block/allow.
const r = runAcquireDecision({
event,
now: Date.now(),
pid: process.pid,
cwd,
readLock: () => realReadLock(p),
writeLock: (rec) => realWriteLock(p, rec),
});
return exitDecision({ block: r.block, message: r.block ? `[parallel-session-lock] ${r.reason}` : undefined });
} catch {
return exitDecision({ block: false }); // fail-open — never lock out
}
}
if (import.meta.url === `file://${process.argv[1].replace(/\\/g, '/')}` || (process.argv[1] || '').endsWith('enforce-parallel-session-lock.mjs')) {
+253 -1
@@ -1,7 +1,7 @@
// tools/enforce-parallel-session-lock.test.mjs
// Stream H Task 7 — wrapper tests around the pure parallel-session-lock module.
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-parallel-session-lock.mjs';
import { decide, isReadonlyBashEvent } from './enforce-parallel-session-lock.mjs';
describe('enforce-parallel-session-lock wrapper (Stream H Task 7)', () => {
it('allow when acquire succeeded (fresh own-lock)', () => {
@@ -42,3 +42,255 @@ describe('enforce-parallel-session-lock wrapper (Stream H Task 7)', () => {
expect(r.reason).toMatch(/pid 42/);
});
});
// D (2026-05-31): the block message must steer the human to the STABLE identity
// (session id), not the transient hook pid — chasing the pid was what caused the
// owner to close the wrong session and deadlock the workspace.
describe('decide() message clarity (D) — pid is transient, identify by session id', () => {
const blocked = { acquired: false, holder: { session_id: 'sess-A', pid: 12552, acquired_at: 0 } };
it('names the holder session id as the stable identity', () => {
expect(decide({ acquireResult: blocked, sessionId: 's1' }).reason).toMatch(/sess-A/);
});
it('marks the pid as changeable so the human does not chase it', () => {
expect(decide({ acquireResult: blocked, sessionId: 's1' }).reason).toMatch(/may change|transient/i);
});
it('still surfaces the pid for triage', () => {
expect(decide({ acquireResult: blocked, sessionId: 's1' }).reason).toMatch(/12552/);
});
});
// Live wiring (point 2, 2026-05-31): PreToolUse acquires/refreshes the lock,
// Stop releases it. I/O is injected (readLock/writeLock/deleteLock) so the
// wiring stays pure and unit-testable; main() binds real fs.
import { runAcquireDecision, runReleaseAction } from './enforce-parallel-session-lock.mjs';
describe('runAcquireDecision — PreToolUse acquire/refresh wiring', () => {
it('allows and writes a fresh lock when none exists', () => {
let written = null;
const r = runAcquireDecision({
event: { tool_name: 'Edit', session_id: 'S1' },
now: 1000, pid: 42, cwd: '/ws',
readLock: () => null,
writeLock: (rec) => { written = rec; },
});
expect(r.block).toBe(false);
expect(written).toMatchObject({ session_id: 'S1', pid: 42, acquired_at: 1000 });
});
it('blocks when another session holds a fresh lock', () => {
const r = runAcquireDecision({
event: { tool_name: 'Edit', session_id: 'S2' },
now: 1000, pid: 7, cwd: '/ws',
readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 99, acquired_at: 900, ttl_ms: 300000 }),
writeLock: () => {},
});
expect(r.block).toBe(true);
expect(r.reason).toMatch(/S1|pid 99|parallel session/i);
});
it('allows (refresh) when the same session already holds the lock', () => {
let written = null;
const r = runAcquireDecision({
event: { tool_name: 'Edit', session_id: 'S1' },
now: 2000, pid: 42, cwd: '/ws',
readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 42, acquired_at: 900, ttl_ms: 300000 }),
writeLock: (rec) => { written = rec; },
});
expect(r.block).toBe(false);
expect(written.acquired_at).toBe(2000);
});
it('takes over a stale lock from another session (TTL expired)', () => {
let written = null;
const r = runAcquireDecision({
event: { tool_name: 'Edit', session_id: 'S2' },
now: 1_000_000, pid: 7, cwd: '/ws',
readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 99, acquired_at: 0, ttl_ms: 300000 }),
writeLock: (rec) => { written = rec; },
});
expect(r.block).toBe(false);
expect(written.session_id).toBe('S2');
});
});
describe('runReleaseAction — Stop release wiring', () => {
it('deletes the lock when this session owns it', () => {
let deleted = false;
runReleaseAction({
event: { session_id: 'S1' },
cwd: '/ws',
readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 42, acquired_at: 0, ttl_ms: 300000 }),
deleteLock: () => { deleted = true; },
});
expect(deleted).toBe(true);
});
it('does NOT delete a lock owned by another session', () => {
let deleted = false;
runReleaseAction({
event: { session_id: 'S2' },
cwd: '/ws',
readLock: () => ({ schema_version: 1, session_id: 'S1', pid: 42, acquired_at: 0, ttl_ms: 300000 }),
deleteLock: () => { deleted = true; },
});
expect(deleted).toBe(false);
});
it('is a no-op when no lock file exists', () => {
let deleted = false;
runReleaseAction({
event: { session_id: 'S1' },
cwd: '/ws',
readLock: () => null,
deleteLock: () => { deleted = true; },
});
expect(deleted).toBe(false);
});
});
// Cross-worktree false-block fix (2026-05-31). The lock must key on the session's
// stable work-tree root (from event.cwd → git toplevel), NOT the hook process.cwd()
// — which collapses to the main repo dir after a session resume, making sessions in
// DIFFERENT worktrees share one lock and block each other.
import { resolveWorkspacePath, pruneStaleLocks } from './enforce-parallel-session-lock.mjs';
describe('resolveWorkspacePath — stable worktree key', () => {
it('keys on event.cwd (the session worktree), not the hook process.cwd()', () => {
const r = resolveWorkspacePath({
event: { cwd: '/repo/.claude/worktrees/wt-A' },
processCwd: '/repo',
runGitToplevel: (dir) => dir,
});
expect(r).toBe('/repo/.claude/worktrees/wt-A');
});
it('gives different keys for two different worktrees (no cross-block)', () => {
const opts = { processCwd: '/repo', runGitToplevel: (dir) => dir };
const a = resolveWorkspacePath({ event: { cwd: '/repo/.claude/worktrees/wt-A' }, ...opts });
const b = resolveWorkspacePath({ event: { cwd: '/repo/.claude/worktrees/wt-B' }, ...opts });
expect(a).not.toBe(b);
});
it('resolves to the git work-tree root (collapses subdir variance)', () => {
const r = resolveWorkspacePath({
event: { cwd: '/repo/.claude/worktrees/wt-A/tools' },
processCwd: '/repo',
runGitToplevel: () => '/repo/.claude/worktrees/wt-A',
});
expect(r).toBe('/repo/.claude/worktrees/wt-A');
});
it('falls back to processCwd when event.cwd is absent', () => {
const r = resolveWorkspacePath({
event: { tool_name: 'Edit' },
processCwd: '/repo',
runGitToplevel: (dir) => dir,
});
expect(r).toBe('/repo');
});
it('falls back to the raw dir when git toplevel resolution fails (fail-open)', () => {
const r = resolveWorkspacePath({
event: { cwd: '/some/dir' },
processCwd: '/repo',
runGitToplevel: () => '',
});
expect(r).toBe('/some/dir');
});
});
// B (2026-05-31): disk hygiene. Leaked lock files (session closed without a clean
// Stop) pile up in ~/.claude/runtime. Pruning ONLY removes records that are
// already stale by the SAME isStale() definition acquire() uses — so it can never
// drop an active (within-TTL) lock and never weakens same-worktree serialization.
describe('pruneStaleLocks — drops only already-stale leaked locks (B)', () => {
const fresh = { schema_version: 1, session_id: 'A', pid: 1, acquired_at: 1000, ttl_ms: 300000 };
const stale = { schema_version: 1, session_id: 'B', pid: 2, acquired_at: 0, ttl_ms: 100 };
const isStaleFn = (rec, now) => !rec || (now - (rec && rec.acquired_at || 0)) > ((rec && rec.ttl_ms) || 300000);
it('deletes stale lock files and never the fresh (active) ones', () => {
const records = { '/r/lock-fresh.json': fresh, '/r/lock-stale.json': stale };
const deleted = [];
const r = pruneStaleLocks({
files: Object.keys(records),
readRecord: (f) => records[f],
deleteRecord: (f) => deleted.push(f),
isStaleFn, now: 1000,
});
expect(deleted).toEqual(['/r/lock-stale.json']);
expect(r.pruned).toBe(1);
});
it('treats an unreadable/garbage lock file as stale and prunes it', () => {
const deleted = [];
pruneStaleLocks({
files: ['/r/garbage.json'],
readRecord: () => { throw new Error('bad json'); },
deleteRecord: (f) => deleted.push(f),
isStaleFn, now: 1000,
});
expect(deleted).toEqual(['/r/garbage.json']);
});
it('never throws when a delete fails (best-effort hygiene)', () => {
expect(() => pruneStaleLocks({
files: ['/r/x.json'],
readRecord: () => stale,
deleteRecord: () => { throw new Error('locked'); },
isStaleFn, now: 1000,
})).not.toThrow();
});
it('does nothing for an empty file list', () => {
const r = pruneStaleLocks({ files: [], readRecord: () => null, deleteRecord: () => {}, isStaleFn, now: 1 });
expect(r.pruned).toBe(0);
});
});
// ── Calibration (2026-05-31): readonly Bash is outside the lock scope ──
// The lock serializes concurrent FILE MUTATION between sessions on the same
// worktree. A readonly Bash command (git status/log/diff, cat, grep, ls)
// mutates nothing, so a peer session's lock must NOT block it. This mirrors the
// LLM-judge readonly calibration (isReadonlyBashEvent in enforce-llm-judge-per-tool).
// Everything that can mutate — file edits, git commit/push, dangerous Bash, and
// every NON-Bash tool — still acquires/checks the lock, so mutation
// serialization is unchanged (scope fix, NOT a discipline drop).
describe('isReadonlyBashEvent — readonly Bash bypasses the lock (calibration 2026-05-31)', () => {
const ev = (command) => ({ tool_name: 'Bash', tool_input: { command } });
it('treats readonly git (status/log/diff) as readonly', () => {
expect(isReadonlyBashEvent(ev('git status'))).toBe(true);
expect(isReadonlyBashEvent(ev('git log --oneline -5'))).toBe(true);
expect(isReadonlyBashEvent(ev('git diff'))).toBe(true);
});
it('treats whitelisted reading commands (cat/grep/ls) as readonly', () => {
expect(isReadonlyBashEvent(ev('ls -la'))).toBe(true);
expect(isReadonlyBashEvent(ev('cat README.md'))).toBe(true);
expect(isReadonlyBashEvent(ev('grep -n foo bar.txt'))).toBe(true);
});
it('does NOT treat mutating Bash as readonly (still acquires/blocks)', () => {
expect(isReadonlyBashEvent(ev('rm -rf x'))).toBe(false);
expect(isReadonlyBashEvent(ev('git commit -m "x"'))).toBe(false);
expect(isReadonlyBashEvent(ev('npm install foo'))).toBe(false);
});
it('does NOT treat a chain with a mutating part as readonly (C13)', () => {
expect(isReadonlyBashEvent(ev('git status && rm x'))).toBe(false);
});
it('only applies to the Bash tool — other tools still acquire the lock', () => {
expect(isReadonlyBashEvent({ tool_name: 'Edit', tool_input: { file_path: 'a.js' } })).toBe(false);
expect(isReadonlyBashEvent({ tool_name: 'Write', tool_input: { file_path: 'a.js' } })).toBe(false);
});
it('is safe on malformed input', () => {
expect(isReadonlyBashEvent(null)).toBe(false);
expect(isReadonlyBashEvent({ tool_name: 'Bash', tool_input: {} })).toBe(false);
expect(isReadonlyBashEvent({ tool_name: 'Bash' })).toBe(false);
});
});
+28 -4
@@ -21,13 +21,15 @@ import {
parseEventJson,
readRouterState,
readRationalizationFlags,
readTranscript,
sessionToolUses,
findOverride,
loadOverrideVocab,
} from './enforce-hook-helpers.mjs';
const SUPPRESS_RULE = 'classifier-mismatch';
export function buildReminder({ classification, recentFlags, override }) {
export function buildReminder({ classification, recentFlags, override, activeSkills = [] }) {
const lines = ['## §17 Coverage / Discipline Reminder', ''];
if (override) {
lines.push(`Override phrase detected: "${override.phrase}". The following rules are suppressed for THIS prompt only:`);
@@ -38,6 +40,16 @@ export function buildReminder({ classification, recentFlags, override }) {
lines.push(' `coverage: <channel>:<id>`');
lines.push('Channels: skill, node, chain, hook, agent, direct.');
lines.push('');
// Item G (2026-05-31): a skill invoked in an EARLIER turn stays active. Remind
// explicitly so the coverage line is not under-reported as direct/chain when the
// work actually continues under that skill. (The verifier now accepts a prior-turn
// skill, so this report is honest, not a violation.)
if (Array.isArray(activeSkills) && activeSkills.length > 0) {
lines.push('**Active skill(s) still in effect from earlier this session:**');
for (const s of activeSkills) lines.push(` - ${s}`);
lines.push('If your work continues under one of these, report `coverage: skill:<name>` (not direct/chain).');
lines.push('');
}
if (classification) {
lines.push(`**Classifier output:** task_type=${classification.task_type || 'unknown'}, confidence=${classification.confidence ?? 'n/a'}`);
if (classification.recommended_node) {
@@ -58,8 +70,6 @@ export function buildReminder({ classification, recentFlags, override }) {
lines.push('Adjust behaviour accordingly.');
lines.push('');
}
lines.push('Override vocabulary (substring-match in user prompt):');
lines.push(' без скилов / direct ok / срочно / быстрый коммит / recovery / memory dump / ремонт инфраструктуры');
return lines.join('\n');
}
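The active-skill reminder has two parts: collecting skills from earlier tool uses and rendering the extra lines. A small sketch of both follows, assuming tool-use entries shaped like `{ name, input: { skill } }` as read in `main()`. The helper names `collectActiveSkills` and `activeSkillLines` are hypothetical; the diff inlines this logic rather than naming it.

```javascript
// Collect distinct skill names in first-seen order from prior tool uses.
function collectActiveSkills(toolUses) {
  const seen = new Set();
  const active = [];
  for (const u of toolUses || []) {
    const skill = u && u.name === 'Skill' && u.input && u.input.skill;
    if (skill && !seen.has(skill)) {
      seen.add(skill);
      active.push(skill);
    }
  }
  return active;
}

// Render the reminder lines; an empty list contributes nothing.
function activeSkillLines(activeSkills) {
  if (!Array.isArray(activeSkills) || activeSkills.length === 0) return [];
  return [
    '**Active skill(s) still in effect from earlier this session:**',
    ...activeSkills.map((s) => `  - ${s}`),
    'If your work continues under one of these, report `coverage: skill:<name>` (not direct/chain).',
    '',
  ];
}

const uses = [
  { name: 'Skill', input: { skill: 'superpowers:test-driven-development' } },
  { name: 'Edit', input: { file_path: 'a.js' } },
  { name: 'Skill', input: { skill: 'superpowers:test-driven-development' } },
];
console.log(activeSkillLines(collectActiveSkills(uses)).join('\n'));
```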
@@ -96,7 +106,21 @@ async function main() {
const flags = readRationalizationFlags(sessionId);
const reminder = buildReminder({ classification, recentFlags: flags, override });
// Item G: detect skills invoked earlier this session (still active). The
// transcript at UserPromptSubmit holds all prior turns. Best-effort.
let activeSkills = [];
try {
const transcript = readTranscript(event.transcript_path);
const seen = new Set();
for (const u of sessionToolUses(transcript)) {
if (u.name === 'Skill' && u.input && u.input.skill && !seen.has(u.input.skill)) {
seen.add(u.input.skill);
activeSkills.push(u.input.skill);
}
}
} catch { activeSkills = []; }
const reminder = buildReminder({ classification, recentFlags: flags, override, activeSkills });
process.stdout.write(JSON.stringify({
hookSpecificOutput: {
+22 -4
@@ -66,10 +66,28 @@ describe('enforce-prompt-injection / buildReminder', () => {
expect(txt).toMatch(/verify-before-push/);
});
it('lists override-vocabulary phrases for user reference', () => {
it('reminds about active skills carried over from prior turns (item G)', () => {
const txt = buildReminder({
classification: null,
recentFlags: [],
activeSkills: ['superpowers:test-driven-development'],
});
expect(txt).toMatch(/Active skill/i);
expect(txt).toMatch(/test-driven-development/);
expect(txt).toMatch(/coverage: skill:/);
});
it('omits the active-skill note when none are active', () => {
const txt = buildReminder({ classification: null, recentFlags: [], activeSkills: [] });
expect(txt).not.toMatch(/Active skill/i);
});
it('does NOT advertise dead override-vocabulary phrases (v4 stub — 1A 2026-05-31)', () => {
const txt = buildReminder({ classification: null, recentFlags: [] });
expect(txt).toMatch(/без скилов/);
expect(txt).toMatch(/direct ok/);
expect(txt).toMatch(/срочно/);
// findOverride/loadOverrideVocab are stubs (vocab removed in v4); advertising the
// phrases was misleading (they do not work). The banner has been removed.
expect(txt).not.toMatch(/Override vocabulary/);
expect(txt).not.toMatch(/без скилов/);
expect(txt).not.toMatch(/ремонт инфраструктуры/);
});
});
+8 -3
@@ -16,16 +16,21 @@ import {
parseEventJson,
exitDecision,
} from './enforce-hook-helpers.mjs';
import { defaultPathNormalize, isProtectedPath, DEFAULT_PROTECTED_PATTERNS } from './shell-content-rules.mjs';
import { defaultPathNormalize, isProtectedPath, READ_DENY_PATTERNS } from './shell-content-rules.mjs';
export function decide({ toolName, filePath }) {
if (toolName !== 'Read') return { block: false, reason: null };
const fp = String(filePath || '');
if (!fp) return { block: false, reason: null };
if (isProtectedPath(fp, defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)) {
// Narrow READ_DENY_PATTERNS (not the full DEFAULT_PROTECTED_PATTERNS): Read of
// CLAUDE.md / normative docs / memory has no exfil value and must stay allowed
// for the claude-md-management / memory-sync workflow. Only genuine Read-exfil
// targets — transcripts, runtime, settings, secrets — are blocked. The full
// protected-list still guards Bash/PowerShell read and Write (over-block fix 2026-05-31).
if (isProtectedPath(fp, defaultPathNormalize, READ_DENY_PATTERNS)) {
return {
block: true,
reason: `path «${defaultPathNormalize(fp)}» protected against Read (§3.1 transcript/runtime/normative hard-deny)`,
reason: `path «${defaultPathNormalize(fp)}» protected against Read (§3.1 transcript/runtime/secrets hard-deny)`,
};
}
return { block: false, reason: null };
+40
@@ -28,3 +28,43 @@ describe('enforce-read-path-deny decide()', () => {
expect(r.block).toBe(false);
});
});
// Over-block fix (2026-05-31): Smoke 5 added CLAUDE.md + memory/ + normative
// docs to the Read-deny set, which broke the legit claude-md-management /
// memory-sync workflow (Edit requires a prior Read). Read of CLAUDE.md / memory
// / Pravila has no exfil value (public-in-repo / own memory index). The genuine
// Read-exfil targets — cross-session transcripts (.jsonl) and ~/.claude/runtime
// — MUST stay blocked. Bash/PowerShell/Write protections (DEFAULT_PROTECTED_PATTERNS)
// are unchanged.
describe('enforce-read-path-deny — CLAUDE.md / memory readable (over-block fix 2026-05-31)', () => {
it('allows Read on CLAUDE.md (public-in-repo, no exfil value)', () => {
expect(decide({ toolName: 'Read', filePath: 'CLAUDE.md' }).block).toBe(false);
expect(decide({ toolName: 'Read', filePath: '/c/моя/проекты/портал crm/Документация/CLAUDE.md' }).block).toBe(false);
});
it('allows Read on MEMORY.md (own memory index under .claude/projects/<proj>/memory)', () => {
expect(decide({ toolName: 'Read', filePath: '/c/Users/Administrator/.claude/projects/crm/memory/MEMORY.md' }).block).toBe(false);
});
it('allows Read on a memory/*.md feedback file', () => {
expect(decide({ toolName: 'Read', filePath: '/c/Users/Administrator/.claude/projects/crm/memory/feedback_read_path_deny.md' }).block).toBe(false);
});
it('allows Read on a normative doc (Pravila) — needed for claude-md-management', () => {
expect(decide({ toolName: 'Read', filePath: 'docs/Pravila_raboty_Claude_v1_1.md' }).block).toBe(false);
});
it('STILL blocks Read on transcript JSONL under .claude/projects', () => {
expect(decide({ toolName: 'Read', filePath: '/c/Users/Administrator/.claude/projects/crm/session.jsonl' }).block).toBe(true);
expect(decide({ toolName: 'Read', filePath: '~/.claude/projects/abc-session.jsonl' }).block).toBe(true);
});
it('STILL blocks Read on ~/.claude/runtime artifacts', () => {
expect(decide({ toolName: 'Read', filePath: '~/.claude/runtime/router-state-x.json' }).block).toBe(true);
});
});
// Impl completion (2026-05-31, this session): exfil-pattern boundaries.
describe('enforce-read-path-deny — exfil-pattern boundaries (impl completion 2026-05-31)', () => {
it('STILL blocks Read on .env.production (secrets variant)', () => {
expect(decide({ toolName: 'Read', filePath: '.env.production' }).block).toBe(true);
});
it('allows Read on a Tooling normative doc (needed for normative sync)', () => {
expect(decide({ toolName: 'Read', filePath: 'docs/Tooling_v8_3.md' }).block).toBe(false);
});
});
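The narrowed Read-deny behavior exercised above can be approximated with a short pattern list. This is only a sketch: the real `READ_DENY_PATTERNS`, `isProtectedPath`, and `defaultPathNormalize` live in `shell-content-rules.mjs` and are not shown in this diff, so the regexes and the `normalize` helper below are assumptions, and the settings patterns mentioned in the comment are left out.

```javascript
// Hypothetical patterns; the real READ_DENY_PATTERNS are not shown in this diff.
const READ_DENY_SKETCH = [
  /(^|\/)\.claude\/projects\/.*\.jsonl$/i, // cross-session transcripts
  /(^|\/)\.claude\/runtime(\/|$)/i,        // runtime state
  /(^|\/)\.env(\.[\w-]+)?$/i,              // secrets, including .env.production
];

// Simplified normalization: forward slashes only.
const normalize = (p) => String(p || '').replace(/\\/g, '/');

function readDenySketch({ toolName, filePath }) {
  if (toolName !== 'Read') return { block: false, reason: null };
  const fp = normalize(filePath);
  if (!fp) return { block: false, reason: null };
  const hit = READ_DENY_SKETCH.some((re) => re.test(fp));
  return hit
    ? { block: true, reason: `path "${fp}" protected against Read` }
    : { block: false, reason: null };
}
```

Anchoring the transcript pattern on the `.jsonl` suffix is what lets `memory/MEMORY.md` under the same `.claude/projects` tree stay readable.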
+58 -2
@@ -50,7 +50,7 @@ export const BASH_HARD_BLACKLIST = [
{ re: /(^|\s|;|&&|\|\|)chmod\b/, reason: 'chmod запрещён' },
{ re: /(^|\s|;|&&|\|\|)chown\b/, reason: 'chown запрещён' },
{ re: /(^|\s|;|&&|\|\|)chgrp\b/, reason: 'chgrp запрещён' },
{ re: /(?:^|[^0-9>&])>{1,2}(?![>&])/, reason: 'stdout redirect (>/>>) запрещён' },
// stdout redirect (>/>>): the quote-aware check lives in matchBashHardBlacklist (STDOUT_REDIRECT_RE), not here (quirk 2, 2026-05-31)
{ re: /\b(?:node|nodejs)\s+(?:[^|;]*\s)?(?:-e|--eval|-p|--print)\b/, reason: 'node -e/--eval/-p запрещён' },
{ re: /\bnode\s+(?:[^|;]*\s)?(?:-r|--require|--import|--experimental-loader)\b/, reason: 'node -r/--import запрещён' },
{ re: /\bpython3?\s+-c\b/, reason: 'python -c запрещён' },
@@ -72,11 +72,46 @@ export const BASH_HARD_BLACKLIST = [
{ re: /(^|\s|;|&&|\|\|)socat\b/, reason: 'G8: socat запрещён' },
];
// stdout redirect operator: `>`/`>>` not preceded by a digit, > or & (excludes fd-dup 1>&2)
// and not followed by > or & (so `>>` is a single match and `1>&2`/`2>&1` are not caught).
const STDOUT_REDIRECT_RE = /(?:^|[^0-9>&])>{1,2}(?![>&])/;
/**
* Blanks the contents of single- and double-quoted spans (keeping the quotes
* themselves, the length, and everything outside quotes). A backslash escapes
* the next character (so an escaped quote does NOT open a span). Needed for
* quote-aware redirect detection (quirk 2): a `>` inside a quoted argument
* (commit message text, <email>) is not a shell redirect; a real redirect
* operator sits OUTSIDE quotes and survives the blanking.
*/
export function stripQuotedSpans(command) {
const s = String(command || '');
let out = '';
let quote = null;
let escaped = false;
for (const ch of s) {
if (escaped) { out += ch; escaped = false; continue; }
if (ch === '\\') { out += ch; escaped = true; continue; }
if (quote) {
if (ch === quote) { out += ch; quote = null; } else out += ' ';
continue;
}
if (ch === "'" || ch === '"') { out += ch; quote = ch; continue; }
out += ch;
}
return out;
}
export function matchBashHardBlacklist(command) {
const s = String(command || '');
if (hasInjection(s)) return '#34: echo/printf prompt-injection запрещён';
-const stderr = stderrRedirectBlock(s);
+// Quote-aware redirect detection (quirk 2): `>` / `2>` INSIDE quotes (commit
+// message text with <email> or "2>1") is not a redirect. Blank the quoted spans
+// first; real redirect operators outside quotes survive.
+const stripped = stripQuotedSpans(s);
+const stderr = stderrRedirectBlock(stripped);
if (stderr) return stderr;
+if (STDOUT_REDIRECT_RE.test(stripped)) return 'stdout redirect (>/>>) запрещён';
return matchAny(BASH_HARD_BLACKLIST, s);
}
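The quote-aware check above can be exercised in isolation. The following is a minimal sketch, not the module itself: `blankQuoted` is a compact copy of the blanking idea and `hasStdoutRedirect` is a hypothetical helper name used only here. It assumes the same regex as `STDOUT_REDIRECT_RE` and ignores the separate stderr check.

```javascript
// Sketch only: blank quoted spans, then look for a stdout redirect operator.
const STDOUT_REDIRECT_RE = /(?:^|[^0-9>&])>{1,2}(?![>&])/;

function blankQuoted(command) {
  let out = '';
  let quote = null;     // currently open quote character, or null
  let escaped = false;  // true right after a backslash
  for (const ch of String(command || '')) {
    if (escaped) { out += ch; escaped = false; continue; }
    if (ch === '\\') { out += ch; escaped = true; continue; }
    if (quote) {
      // The closing quote is kept; every other character in the span becomes a space.
      if (ch === quote) { out += ch; quote = null; } else out += ' ';
      continue;
    }
    if (ch === "'" || ch === '"') { out += ch; quote = ch; continue; }
    out += ch;
  }
  return out;
}

// Hypothetical helper: true when a `>`/`>>` operator survives outside quotes.
function hasStdoutRedirect(command) {
  return STDOUT_REDIRECT_RE.test(blankQuoted(command));
}

const samples = [
  'git commit -m "x <noreply@anthropic.com>"', // `>` only inside quotes
  'echo "x" > /tmp/f',                         // operator after a closing quote
  'echo x>>/tmp/f',                            // append, no spaces
  'cat a 2>&1',                                // fd duplication, not a stdout redirect
];
for (const cmd of samples) console.log(hasStdoutRedirect(cmd), cmd);
```

Like the function in the diff, this sketch treats a backslash as an escape even inside single quotes, which is a simplification relative to POSIX shell quoting.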
@@ -85,9 +120,30 @@ const READING_CMDS = new Set(['ls', 'pwd', 'wc', 'head', 'tail', 'file', 'stat',
const SAFE_EXACT = [
/^npx\s+vitest\s+(?:run|--version)\b/,
/^npm\s+(?:test|run\s+test|run\s+lint(?::[\w-]+)?)\b/,
// `npm ci` (2026-05-31, owner-authorized) — clean install from the committed
// lockfile (deterministic, no version drift) to restore junction node_modules
// in a fresh worktree. Distinct from `npm install`/`npm i`, which stay
// hard-blacklisted (line ~60) because they can pull new/updated versions.
// `\b` after `ci` prevents `npm cider`-style prefix matches.
/^npm\s+ci\b/,
/^php\s+artisan\s+(?:list|route:list|migrate:status)\b/,
/^composer\s+(?:show|outdated)\b/,
/^node\s+(?!.*(?:-e|--eval|-p|--print|-r|--require|--import|--experimental-loader)\b)/,
// Laravel dev workflow (2026-05-30) — exclude tinker (REPL = arbitrary PHP exec risk).
// Hard-blacklist (composer install/update/require/remove) remains the first check, unaffected.
// `migrate(?=\s|$)` lookahead prevents `migrate:install` / `migrate:<unknown>` from matching bare `migrate`.
/^php\s+artisan\s+(?:test|migrate:fresh|migrate:rollback|migrate:refresh|migrate:reset|migrate(?=\s|$)|db:seed|cache:clear|config:clear|view:clear|route:clear|optimize:clear)\b/,
/^composer\s+(?:test|pint|stan|insights|rector)\b/,
/^(?:\.\/)?vendor\/bin\/pest\b/,
/^pest\b/,
// Narrow `cd app` (2026-05-31, owner-authorized) — enter the Laravel project dir
// so already-whitelisted commands (pest, php artisan test) run from app/.
// Scope deliberately limited to the literal `app` dir: `cd` into any other path
// (incl. protected .claude/runtime, memory/, transcripts) stays default-deny, so
// the cwd-shift read-bypass is contained. Mutations remain caught at the
// hard-blacklist + chain-mutating rule (both run before the whitelist), and each
// chain segment after `cd app &&` must still be independently whitelisted.
/^cd\s+app$/,
];
export function classifyWhitelist(segments) {
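The comments on the whitelist entries rely on two regex details: `\b` after `ci`, and the `(?=\s|$)` lookahead after bare `migrate`. A small sketch of why both matter follows. The patterns are shortened copies with only a subset of alternatives, and `ARTISAN_NAIVE_RE` is a hypothetical variant added purely for comparison.

```javascript
// Shortened copies of two whitelist patterns (subset of alternatives only).
const NPM_CI_RE = /^npm\s+ci\b/;
const ARTISAN_RE = /^php\s+artisan\s+(?:test|migrate:fresh|migrate(?=\s|$)|db:seed)\b/;
// Hypothetical naive variant without the lookahead, for comparison only.
const ARTISAN_NAIVE_RE = /^php\s+artisan\s+(?:test|migrate:fresh|migrate|db:seed)\b/;

const checks = [
  ['npm ci --no-audit', NPM_CI_RE],                 // `ci` followed by a space: boundary holds
  ['npm cider', NPM_CI_RE],                         // `ci` followed by `d`: no word boundary
  ['php artisan migrate', ARTISAN_RE],              // bare migrate at end of string
  ['php artisan migrate:install', ARTISAN_RE],      // `:` after migrate fails the lookahead
  ['php artisan migrate:install', ARTISAN_NAIVE_RE] // `\b` alone sees a boundary before `:`
];
for (const [cmd, re] of checks) console.log(re.test(cmd), cmd, re.source);
```

The last case is the one the lookahead exists for: `:` is a non-word character, so `migrate\b` alone would accept any `migrate:<subcommand>`.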
+185
@@ -161,3 +161,188 @@ describe('stderr redirect — 2>&1 fd-duplication (review fix)', () => {
expect(classifyBashCommand('cat a 2>&1 > out.txt', {}).result).toBe('block');
});
});
describe('SAFE_EXACT — Laravel dev workflow (whitelist expansion 2026-05-30)', () => {
// Allowed: PHP/Laravel dev commands that were missing from whitelist
it.each([
'php artisan test',
'php artisan test --filter=Auth',
'php artisan migrate',
'php artisan migrate:fresh',
'php artisan migrate:rollback',
'php artisan migrate:refresh',
'php artisan migrate:reset',
'php artisan db:seed',
'php artisan cache:clear',
'php artisan config:clear',
'php artisan view:clear',
'php artisan route:clear',
'php artisan optimize:clear',
'composer test',
'composer pint',
'composer stan',
'composer insights',
'composer rector',
'pest',
'pest --filter=Foo',
'vendor/bin/pest',
'./vendor/bin/pest',
])('allows %s', (cmd) => {
expect(classifyBashCommand(cmd, {}).result).toBe('allow');
});
// Critical: REPL and composer mutations remain hard-blocked
it.each([
['php artisan tinker', 'REPL = arbitrary PHP exec risk'],
['php artisan tinker --execute="exit"', 'tinker variant'],
['composer install', 'hard-blacklist'],
['composer require foo/bar', 'hard-blacklist'],
['composer update', 'hard-blacklist'],
['composer remove foo/bar', 'hard-blacklist'],
['php artisan migrate:install', 'unknown migrate subcommand outside whitelist set'],
])('still blocks %s (%s)', (cmd) => {
expect(classifyBashCommand(cmd, {}).result).toBe('block');
});
// Critical: pre-existing v3.8 entries keep their behaviour
it('keeps php artisan list/route:list/migrate:status allowed (pre-existing v3.8)', () => {
expect(classifyBashCommand('php artisan list', {}).result).toBe('allow');
expect(classifyBashCommand('php artisan route:list', {}).result).toBe('allow');
expect(classifyBashCommand('php artisan migrate:status', {}).result).toBe('allow');
});
// Critical: pest does NOT match pestilence-like prefixes (word boundary)
it('does not allow command names sharing prefix with pest', () => {
expect(classifyBashCommand('pestilence', {}).result).toBe('block');
});
// Critical: chain semantics still enforced — pest && rm x → block (rm is mutating)
it('still blocks chain with mutating part even if first part is whitelisted pest', () => {
expect(classifyBashCommand('pest && rm x', {}).result).toBe('block');
});
// Critical: composer-show/outdated still allowed (pre-existing v3.8)
it('keeps composer show/outdated allowed (pre-existing v3.8)', () => {
expect(classifyBashCommand('composer show', {}).result).toBe('allow');
expect(classifyBashCommand('composer outdated', {}).result).toBe('allow');
});
});
describe('SAFE_EXACT — narrow `cd app` whitelist (2026-05-31, owner-authorized)', () => {
// Allowed: enter the Laravel project dir, alone or chained with whitelisted cmds
it.each([
'cd app',
'cd app && pest',
'cd app && php artisan test',
'cd app && composer test',
])('allows %s', (cmd) => {
expect(classifyBashCommand(cmd, {}).result).toBe('allow');
});
// Scope: cd into any other dir stays default-deny (cwd-shift read-bypass contained)
it.each([
'cd ~/.claude/runtime',
'cd ../memory',
'cd app/storage',
'cd /tmp',
'cd ..',
])('still blocks cd into non-app dir: %s', (cmd) => {
expect(classifyBashCommand(cmd, {}).result).toBe('block');
});
// cwd-shift read-exfil attempt via narrow cd app stays blocked (protected path by name)
it('still blocks reading a protected file from app/ via literal path', () => {
expect(classifyBashCommand('cd app && cat ../.env', {}).result).toBe('block');
expect(classifyBashCommand('cd app && cat ~/.claude/runtime/state.json', {}).result).toBe('block');
});
// Mutations after cd app remain caught (hard-blacklist + chain-mutating rule)
it.each([
'cd app && rm foo',
'cd app && mkdir x',
'cd app && git commit -m x',
])('still blocks mutating chain: %s', (cmd) => {
expect(classifyBashCommand(cmd, {}).result).toBe('block');
});
// Second segment must still be independently whitelisted
it('still blocks cd app chained with a non-whitelisted command', () => {
expect(classifyBashCommand('cd app && frobnicate', {}).result).toBe('block');
});
});
describe('SAFE_EXACT — npm ci (worktree dep restore, 2026-05-31)', () => {
// Allowed: npm ci installs exactly the committed lockfile (deterministic, no
// version drift) — needed to restore junction node_modules in a fresh worktree.
it.each([
'npm ci',
'npm ci --no-audit',
'npm ci --prefer-offline',
])('allows %s', (cmd) => {
expect(classifyBashCommand(cmd, {}).result).toBe('allow');
});
// Critical: npm install / npm i remain hard-blacklisted (line 60) — they can
// pull new/updated versions, unlike ci which pins to the lockfile.
it.each([
'npm install',
'npm i',
'npm install foo',
'npm i foo',
])('still blocks %s (hard-blacklist)', (cmd) => {
expect(classifyBashCommand(cmd, {}).result).toBe('block');
});
// Critical: word boundary — `npm cider` (or any ci-prefixed token) is NOT npm ci
it('does not allow ci-prefixed token (word boundary)', () => {
expect(classifyBashCommand('npm cider', {}).result).toBe('block');
});
// Critical: chain semantics still enforced — npm ci && rm x → block (rm mutating)
it('still blocks chain with mutating part after npm ci', () => {
expect(classifyBashCommand('npm ci && rm x', {}).result).toBe('block');
});
});
import { stripQuotedSpans } from './enforce-router-gate.mjs';
describe('quote-aware redirect (quirk 2)', () => {
// False positives that must now be ALLOWED: `>` / `2>` live inside quotes.
it('allows > inside double-quoted commit message (co-author <email>)', () => {
expect(matchBashHardBlacklist('git commit -m "x <noreply@anthropic.com>"')).toBe(null);
});
it('allows 2> inside double-quoted message', () => {
expect(matchBashHardBlacklist('git commit -m "fix 2>1 logging"')).toBe(null);
});
it('allows lone quoted >', () => {
expect(matchBashHardBlacklist('git commit -m ">"')).toBe(null);
});
// Real redirects (operator OUTSIDE quotes) must STILL BLOCK.
it('blocks spaced stdout redirect', () => {
expect(matchBashHardBlacklist('echo x > /tmp/f')).toBeTruthy();
});
it('blocks no-space stdout redirect', () => {
expect(matchBashHardBlacklist('echo x>/tmp/f')).toBeTruthy();
});
it('blocks append redirect', () => {
expect(matchBashHardBlacklist('echo x >> /tmp/f')).toBeTruthy();
});
it('blocks stderr redirect to file', () => {
expect(matchBashHardBlacklist('cmd 2> /tmp/err')).toBeTruthy();
});
it('blocks redirect after a closing quote', () => {
expect(matchBashHardBlacklist('echo "x" > /tmp/f')).toBeTruthy();
});
});
describe('stripQuotedSpans (quirk 2 helper)', () => {
it('blanks double-quoted interior, keeps outside', () => {
expect(stripQuotedSpans('a "b>c" > d')).toBe('a "   " > d');
});
it('blanks single-quoted interior', () => {
expect(stripQuotedSpans("a 'x>y' z")).toBe("a '   ' z");
});
it('keeps backslash-escaped quote literal (no span opened)', () => {
expect(stripQuotedSpans('a \\" > b')).toBe('a \\" > b');
});
});
+77
@@ -0,0 +1,77 @@
#!/usr/bin/env node
/**
* enforce-runtime-write-deny: PreToolUse(Edit|Write|MultiEdit|NotebookEdit).
*
* Blocks the Write/Edit TOOL from writing under ~/.claude/runtime/**, closing a
* pre-existing self-write hole on the v4 git-approval anchor
* (askuser-decisions-<sess>.jsonl) and any future runtime side-channel. The
* Bash/PowerShell gates already protect this path for shell commands; the Write
* tool was the remaining ungated channel. Standalone: independent of
* safe-baseline (router-gate v4 §"Side-channel write-protection", item 1b / C3).
*
* Uses the resolving `pathNormalize` (path.resolve + realpath) so `.`/`..`
* segments cannot evade the match (V2-2): `~/.claude/./runtime/x` collapses to
* the real path before the regex test. Legitimate hooks write there via Node
* `fs`, not the Claude Write tool, so they are unaffected.
*
* Posture: blocks ONLY on a confirmed runtime match; if normalization throws
* (cannot determine the path) it fails open, never bricking the session. This
* deliberately trades a tiny residual (a malformed path that throws) for
* not self-locking the controller out of all edits on an unexpected error.
*/
import { pathNormalize } from './path-normalization.mjs';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';
const WRITE_TOOLS = new Set(['Edit', 'Write', 'MultiEdit', 'NotebookEdit']);
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;
// Transcript protection (Z Part 1): any *.jsonl under ~/.claude/projects/** is a
// session/subagent transcript. The tdd-gate credits a subagent's RED from its
// agent-<id>.jsonl, so these must be unforgeable by the Write tool. Memory files
// there are *.md and never match `.jsonl$`, so memory writes stay allowed.
const TRANSCRIPT_RE = /(^|\/)\.claude\/projects\/.*\.jsonl$/i;
/**
* Pure decision.
* @param {object} p
* @param {string} p.toolName
* @param {string} p.filePath
* @param {Function} [p.normalizeImpl] - injectable normalizer (default: resolving pathNormalize)
* @returns {{block:boolean, reason?:string}}
*/
export function decide({ toolName, filePath, normalizeImpl = pathNormalize }) {
if (!WRITE_TOOLS.has(toolName)) return { block: false };
const fp = String(filePath || '');
if (!fp) return { block: false };
let norm;
try { norm = normalizeImpl(fp); } catch { return { block: false }; } // cannot determine → fail-open
const normStr = String(norm || '');
if (RUNTIME_RE.test(normStr)) {
return {
block: true,
reason: `Write to «${norm}» denied — ~/.claude/runtime is a protected side-channel (git-approval anchor). Hooks write it via Node fs, not the Write tool.`,
};
}
if (TRANSCRIPT_RE.test(normStr)) {
return {
block: true,
reason: `Write to «${norm}» denied — ~/.claude/projects/**/*.jsonl are session/subagent transcripts (tamper-protected; the tdd-gate trusts them). The harness writes transcripts, never the Write tool. Memory *.md there stays writable.`,
};
}
return { block: false };
}
async function main() {
try {
const event = parseEventJson(await readStdin());
const r = decide({
toolName: event.tool_name,
filePath: (event.tool_input && (event.tool_input.file_path || event.tool_input.notebook_path)) || '',
});
exitDecision({ block: r.block, message: r.reason });
} catch {
exitDecision({ block: false }); // fail-quiet
}
}
const isCli = process.argv[1] && process.argv[1].replace(/\\/g, '/').endsWith('/enforce-runtime-write-deny.mjs');
if (isCli) main();
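The header comment explains that normalization must run before the regex test so that `.`/`..` segments cannot slip past it. A minimal sketch of that ordering follows. `collapseDots` is a hypothetical, purely lexical stand-in for the real `pathNormalize` (which the header says uses path.resolve + realpath), and it assumes absolute, POSIX-style input.

```javascript
const RUNTIME_RE = /(^|\/)\.claude\/runtime(\/|$)/i;

// Hypothetical stand-in for pathNormalize: lexical collapse only, no filesystem access.
function collapseDots(p) {
  const parts = [];
  for (const seg of String(p).replace(/\\/g, '/').split('/')) {
    if (seg === '' || seg === '.') continue;      // drop empty and current-dir segments
    if (seg === '..') { parts.pop(); continue; }  // step up one level
    parts.push(seg);
  }
  return '/' + parts.join('/');
}

const raw = '/home/u/.claude/./runtime/x.jsonl';
console.log(RUNTIME_RE.test(raw));               // the raw string slips past the regex
console.log(RUNTIME_RE.test(collapseDots(raw))); // the collapsed string is caught
```

A lexical collapse does not follow symlinks; per its header comment, the real normalizer also resolves the real path.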
+98
@@ -0,0 +1,98 @@
// tools/enforce-runtime-write-deny.test.mjs
// Standalone write-deny on ~/.claude/runtime (router-gate v4 §"Side-channel
// write-protection", item 1b / C3). Closes a pre-existing self-write hole on the
// git-approval anchor; uses the resolving pathNormalize so `.`/`..` segments
// cannot evade the match (V2-2).
import { describe, it, expect } from 'vitest';
import { decide } from './enforce-runtime-write-deny.mjs';
import { homedir } from 'node:os';
import { join } from 'node:path';
const HOME = homedir();
const HOME_FWD = HOME.replace(/\\/g, '/');
describe('enforce-runtime-write-deny decide()', () => {
it('blocks a Write into ~/.claude/runtime (git-approval anchor)', () => {
const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', 'runtime', 'askuser-decisions-S.jsonl') });
expect(r.block).toBe(true);
});
it('blocks the .-segment evasion (V2-2)', () => {
// Raw string with `/./` — path.join would pre-collapse it, so build it literally.
const evasion = `${HOME_FWD}/.claude/./runtime/x.jsonl`;
const r = decide({ toolName: 'Write', filePath: evasion });
expect(r.block).toBe(true);
});
it('blocks Edit/MultiEdit/NotebookEdit too', () => {
const p = join(HOME, '.claude', 'runtime', 'safe-baseline-ledger-S.json');
expect(decide({ toolName: 'Edit', filePath: p }).block).toBe(true);
expect(decide({ toolName: 'MultiEdit', filePath: p }).block).toBe(true);
expect(decide({ toolName: 'NotebookEdit', filePath: p }).block).toBe(true);
});
it('allows a Write to a normal project path', () => {
const r = decide({ toolName: 'Write', filePath: join(HOME, 'project', 'src', 'x.mjs') });
expect(r.block).toBe(false);
});
it('ignores non-write tools', () => {
expect(decide({ toolName: 'Read', filePath: join(HOME, '.claude', 'runtime', 'x') }).block).toBe(false);
expect(decide({ toolName: 'Bash', filePath: join(HOME, '.claude', 'runtime', 'x') }).block).toBe(false);
});
it('fail-open (no block) when the normalizer throws — never bricks the session', () => {
const throwing = () => { throw new Error('boom'); };
const r = decide({ toolName: 'Write', filePath: join(HOME, '.claude', 'runtime', 'x'), normalizeImpl: throwing });
expect(r.block).toBe(false);
});
it('blocks via injected normalizer that resolves into runtime', () => {
const r = decide({ toolName: 'Write', filePath: 'whatever', normalizeImpl: () => '/home/u/.claude/runtime/x.jsonl' });
expect(r.block).toBe(true);
});
});
// Part 1 of Z (2026-05-31): close the transcript Write hole. The tdd-gate will
// (Part 2) credit a subagent's RED from its agent-<id>.jsonl; that transcript
// must therefore be unforgeable. The Write tool was the last ungated channel
// into ~/.claude/projects/**/*.jsonl (Bash/PowerShell/Read gates already cover
// it). Memory files there are .md and stay writable (they never match .jsonl$).
describe('enforce-runtime-write-deny — transcript .jsonl protection (Z Part 1)', () => {
it('blocks a Write to a subagent transcript under ~/.claude/projects', () => {
const p = join(HOME, '.claude', 'projects', 'slug', 'sess-uuid', 'subagents', 'agent-abc.jsonl');
expect(decide({ toolName: 'Write', filePath: p }).block).toBe(true);
});
it('blocks a Write to the controller session transcript itself', () => {
const p = join(HOME, '.claude', 'projects', 'slug', 'sess-uuid.jsonl');
expect(decide({ toolName: 'Write', filePath: p }).block).toBe(true);
});
it('blocks Edit/MultiEdit/NotebookEdit on a transcript .jsonl too', () => {
const p = join(HOME, '.claude', 'projects', 'slug', 'sess', 'subagents', 'agent-x.jsonl');
expect(decide({ toolName: 'Edit', filePath: p }).block).toBe(true);
expect(decide({ toolName: 'MultiEdit', filePath: p }).block).toBe(true);
expect(decide({ toolName: 'NotebookEdit', filePath: p }).block).toBe(true);
});
it('blocks the .-segment evasion into projects transcripts', () => {
const evasion = `${HOME_FWD}/.claude/projects/slug/./sess/subagents/agent-x.jsonl`;
expect(decide({ toolName: 'Write', filePath: evasion }).block).toBe(true);
});
it('ALLOWS a memory .md under ~/.claude/projects (never a .jsonl)', () => {
const p = join(HOME, '.claude', 'projects', 'slug', 'memory', 'feedback_x.md');
expect(decide({ toolName: 'Write', filePath: p }).block).toBe(false);
});
it('ALLOWS a .jsonl OUTSIDE ~/.claude/projects (e.g. repo observer episodes)', () => {
const p = join(HOME, 'repo', 'docs', 'observer', 'episodes-2026-05.jsonl');
expect(decide({ toolName: 'Write', filePath: p }).block).toBe(false);
});
it('ignores non-write tools on a transcript path', () => {
const p = join(HOME, '.claude', 'projects', 'slug', 'sess', 'subagents', 'agent-x.jsonl');
expect(decide({ toolName: 'Read', filePath: p }).block).toBe(false);
});
});
+225
@@ -0,0 +1,225 @@
#!/usr/bin/env node
/**
* enforce-safe-baseline-metering: PreToolUse wrapper around the pure
* safe-baseline-metering module (router-gate v4 §3.1.2 Direction 1).
*
* Catches skill-substitution laundering: many Read/Grep/Glob/LS/TodoWrite/
* AskUserQuestion calls used as an analysis channel INSTEAD of invoking the
* recommended Skill, then a mutating tool (Edit/Write/Bash/...) lands without any
* skill ever matching. Safe-baseline tools themselves stay allowed (legit
* continuation reading); only a mutating tool past the hard threshold is blocked.
*
* Stream H tail adds the wrapper. Pure metering + threshold logic live in
* safe-baseline-metering.mjs; this file is just the hook entry composition.
*
* Convention (mirrors enforce-decomposition-detector.mjs): the testable unit is
* the pure `decide()` composition. The live `main()` (task-boundary inference,
* skill-match detection from the transcript, and per-task counter persistence)
* is a deferred no-op (exit 0) until that wiring is designed in the spec/plan.
* Until then the hook NEVER blocks (no self-lockout, same posture as the sibling
* Stream H wrappers). Settings.json registration is also deferred.
*/
import {
incrementCounter,
evaluateThresholds,
DEFAULT_THRESHOLDS,
newCounterState,
shouldInheritTaskId,
deriveTaskId,
} from './safe-baseline-metering.mjs';
import { readFileSync, writeFileSync, appendFileSync, mkdirSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';
import {
readStdin,
parseEventJson,
readTranscript,
lastUserPromptText,
lastTurnEntries,
exitDecision,
} from './enforce-hook-helpers.mjs';
/**
* Pure decision: increment the per-task counter for `toolName`, then evaluate
* thresholds against the resulting state.
*
* @param {object} args
* @param {object} args.state - current per-task counter state (newCounterState shape)
* @param {string} args.toolName - the tool about to run
* @param {boolean} [args.skillMatched] - whether a recommended Skill matched in this task
* @param {object} [args.thresholds] - override DEFAULT_THRESHOLDS
* @returns {{state:object, action:'allow'|'soft_flag'|'hard_block', reason?:string}}
*/
export function decide({ state, toolName, skillMatched = false, thresholds = DEFAULT_THRESHOLDS }) {
const next = incrementCounter(state, toolName);
const evalResult = evaluateThresholds(next, toolName, skillMatched, thresholds);
return { state: next, action: evalResult.action, reason: evalResult.reason };
}
/**
* Task-boundary head: decide whether the current event continues the prior task
* or starts a fresh one, then meter it.
*
* Continuation rules (delegated to the pure module):
* - no prior ledger → fresh task
* - reset marker in promptText → fresh task (shouldInheritTaskId=false)
* - keyword overlap with prior task < 2 → fresh task
* - otherwise inherit prior counters
*
* @param {object} args
* @param {object} args.event - PreToolUse event ({ tool_name })
* @param {object|null} args.priorLedger - { state, lastKeywords } from the last event, or null
* @param {string[]} args.currentKeywords - keywords distilled from the current prompt
* @param {string} args.promptText - the current user prompt (for reset-marker detection)
* @param {boolean} [args.skillMatched] - whether a recommended Skill matched in this task
* @param {object} [args.thresholds] - override DEFAULT_THRESHOLDS
* @returns {{action:string, reason?:string, ledger:{state:object, lastKeywords:string[]}}}
*/
export function processEvent({
event,
priorLedger,
currentKeywords = [],
promptText = '',
skillMatched = false,
thresholds = DEFAULT_THRESHOLDS,
}) {
const toolName = event && event.tool_name;
const inherit =
priorLedger &&
priorLedger.state &&
shouldInheritTaskId(priorLedger.lastKeywords || [], currentKeywords, promptText);
const baseState = inherit
? priorLedger.state
: newCounterState({
taskId: deriveTaskId(promptText),
startedAtIso: '',
firstPromptExcerpt: promptText,
});
const d = decide({ state: baseState, toolName, skillMatched, thresholds });
return {
action: d.action,
reason: d.reason,
ledger: { state: d.state, lastKeywords: currentKeywords },
};
}
// ── 1b live-wiring: pure helpers (safe-baseline-live-wiring-design.md v4) ──
// Common RU imperatives + RU/EN stopwords that would otherwise create spurious
// keyword overlap between unrelated tasks (G2). Length<4 tokens are dropped
// separately; this set targets >=4-char common words.
const STOPWORDS = new Set([
'сделай', 'сделать', 'проверь', 'проверить', 'посмотри', 'добавь', 'добавить',
'напиши', 'написать', 'нужно', 'надо', 'давай', 'можешь', 'потом', 'после',
'перед', 'через', 'очень', 'если', 'чтобы', 'этот', 'эта', 'это', 'эти',
'или', 'тоже', 'также', 'когда', 'пока', 'весь', 'всё', 'все', 'теперь',
'здесь', 'там', 'нет', 'есть', 'будет', 'было', 'твой', 'мой', 'самый',
'then', 'this', 'that', 'with', 'from', 'your', 'please', 'just', 'make',
'check', 'look', 'need', 'want', 'also', 'into', 'more', 'very', 'should',
'will', 'have', 'does', 'done', 'them', 'they', 'here', 'there',
]);
/** Deterministic keyword extraction (H1): lowercase, drop <4-char + stopwords, unique, sorted. */
export function extractKeywords(promptText) {
if (typeof promptText !== 'string') return [];
const tokens = promptText
.toLowerCase()
.split(/[^\p{L}\p{N}]+/u)
.filter((t) => t.length >= 4 && !STOPWORDS.has(t));
return [...new Set(tokens)].sort();
}
const SKILL_MATCH_TOOLS = new Set(['Skill', 'EnterPlanMode']);
/** C2/V2-5: true iff the turn has a real assistant tool_use of Skill or EnterPlanMode. */
export function detectSkillMatch(turnEntries) {
if (!Array.isArray(turnEntries)) return false;
for (const e of turnEntries) {
const c = e && e.message && e.message.content;
if (!Array.isArray(c)) continue;
for (const b of c) {
if (b && b.type === 'tool_use' && SKILL_MATCH_TOOLS.has(b.name)) return true;
}
}
return false;
}
/**
* V2-1 stickiness contract: the pure pipeline neither persists nor task-scopes
* skill-match, so this wrapper owns it. Compute inherit (same predicate as
* processEvent), scope the prior sticky flag to inherit, OR in this turn's match,
* run the decision, then write the effective flag back into the persisted state.
*/
export function runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn, thresholds }) {
const inherit = !!(priorLedger && priorLedger.state &&
shouldInheritTaskId(priorLedger.lastKeywords || [], currentKeywords, promptText));
const priorSticky = inherit ? !!priorLedger.state.skill_match_within_task : false;
const effectiveSkillMatched = priorSticky || !!skillMatchedThisTurn;
const res = processEvent({
event, priorLedger, currentKeywords, promptText,
skillMatched: effectiveSkillMatched, thresholds,
});
res.ledger.state.skill_match_within_task = effectiveSkillMatched;
return res;
}
// ── live I/O composition ──
const ESCAPE_MSG = 'invoke the recommended Skill, or EnterPlanMode, to proceed (skill/plan invocations are never blocked by this layer).';
function ledgerDir(override) {
return override || join(homedir(), '.claude', 'runtime');
}
function loadLedger(dir, sess) {
try { return JSON.parse(readFileSync(join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), 'utf8')); }
catch { return null; }
}
function saveLedger(dir, sess, ledger) {
try {
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `safe-baseline-ledger-${sess || 'unknown'}.json`), JSON.stringify(ledger));
} catch { /* fail-quiet */ }
}
function logFlag(dir, sess, entry) {
try {
mkdirSync(dir, { recursive: true });
appendFileSync(join(dir, `safe-baseline-flags-${sess || 'unknown'}.jsonl`),
JSON.stringify({ ts: new Date().toISOString(), ...entry }) + '\n');
} catch { /* ignore */ }
}
/** Testable live head: returns {block, message?} and persists the ledger. Fail-quiet. */
export async function runMain({ event, runtimeDir, transcript: injectedTranscript } = {}) {
try {
const sess = event.session_id;
const dir = ledgerDir(runtimeDir);
const transcript = injectedTranscript || readTranscript(event.transcript_path);
const promptText = lastUserPromptText(transcript) || '';
const currentKeywords = extractKeywords(promptText);
const skillMatchedThisTurn = detectSkillMatch(lastTurnEntries(transcript)) ||
SKILL_MATCH_TOOLS.has(event.tool_name);
const priorLedger = loadLedger(dir, sess);
const res = runLiveDecision({ event, priorLedger, promptText, currentKeywords, skillMatchedThisTurn });
saveLedger(dir, sess, res.ledger);
if (res.action === 'soft_flag') logFlag(dir, sess, { tool: event.tool_name, reason: res.reason });
if (res.action === 'hard_block') return { block: true, message: `[safe-baseline] ${res.reason}\n${ESCAPE_MSG}` };
return { block: false };
} catch {
return { block: false }; // fail-quiet — never crash the session
}
}
async function main() {
const event = parseEventJson(await readStdin());
const res = await runMain({ event });
exitDecision(res);
}
if ((process.argv[1] || '').replace(/\\/g, '/').endsWith('/enforce-safe-baseline-metering.mjs')) {
main().catch(() => process.exit(0));
}
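The continuation rules described for `processEvent` (reset marker, then a keyword overlap of at least two) can be sketched as below. This is an assumption-laden illustration: the real `shouldInheritTaskId` lives in safe-baseline-metering.mjs and is not shown in this diff, so the marker list, the function name `shouldInheritSketch`, and the matching strategy here are all hypothetical.

```javascript
// ASSUMPTION: hypothetical marker list; the real one is defined in the pure module.
const RESET_MARKERS_SKETCH = ['новая задача', 'new task'];

// Hypothetical re-statement of the documented rules, not the module's implementation.
function shouldInheritSketch(priorKeywords, currentKeywords, promptText) {
  const text = String(promptText || '').toLowerCase();
  // An explicit reset marker always starts a fresh task, regardless of overlap.
  if (RESET_MARKERS_SKETCH.some((marker) => text.includes(marker))) return false;
  const prior = new Set(priorKeywords);
  const overlap = currentKeywords.filter((k) => prior.has(k)).length;
  // Fewer than two shared keywords means a fresh task; otherwise inherit.
  return overlap >= 2;
}

const prior = ['router', 'gate', 'safe'];
console.log(shouldInheritSketch(prior, ['router', 'gate', 'extra'], 'дальше по safe-baseline'));
console.log(shouldInheritSketch(prior, ['router', 'gate', 'safe'], 'новая задача — посмотри другое'));
console.log(shouldInheritSketch(prior, ['totally', 'different', 'topic'], 'другая тема'));
```

The three calls mirror the continuation, reset-marker, and low-overlap cases exercised by the `processEvent()` tests.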
@@ -0,0 +1,283 @@
// tools/enforce-safe-baseline-metering.test.mjs
// Stream H tail — wrapper tests around the pure safe-baseline-metering module
// (router-gate v4 §3.1.2 Direction 1). Mirrors the enforce-decomposition-detector
// convention: implement + test a pure `decide()` composition; live main() wiring
// (transcript task-boundary + skill detection + state persistence) is now live
// (1b — safe-baseline-live-wiring-design.md v4).
import { describe, it, expect } from 'vitest';
import { decide, processEvent, extractKeywords, detectSkillMatch, runLiveDecision, runMain } from './enforce-safe-baseline-metering.mjs';
import { newCounterState } from './safe-baseline-metering.mjs';
import { mkdtempSync, writeFileSync, existsSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
function freshState() {
return newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' });
}
function withCounts(patch) {
const s = freshState();
return { ...s, counts: { ...s.counts, ...patch } };
}
describe('enforce-safe-baseline-metering decide()', () => {
it('allows a metered Read below warn threshold and increments its counter', () => {
const r = decide({ state: freshState(), toolName: 'Read', skillMatched: false });
expect(r.action).toBe('allow');
expect(r.state.counts.Read).toBe(1);
});
it('soft_flags a metered Read once it reaches the warn threshold (29→30)', () => {
const r = decide({ state: withCounts({ Read: 29 }), toolName: 'Read', skillMatched: false });
expect(r.action).toBe('soft_flag');
expect(r.state.counts.Read).toBe(30);
});
it('hard_blocks a mutating tool when a metered counter is at its hard limit, no skill', () => {
const r = decide({ state: withCounts({ Read: 60 }), toolName: 'Edit', skillMatched: false });
expect(r.action).toBe('hard_block');
expect(r.reason).toContain('Read=60');
});
it('allows the mutating tool when a skill was matched, even past the hard limit', () => {
const r = decide({ state: withCounts({ Read: 60 }), toolName: 'Edit', skillMatched: true });
expect(r.action).toBe('allow');
});
it('allows (and does not count) a tool that is neither metered nor mutating', () => {
const r = decide({ state: freshState(), toolName: 'WebFetch', skillMatched: false });
expect(r.action).toBe('allow');
expect(r.state.counts.Read).toBe(0);
});
it('does not mutate the caller-provided state object (immutability)', () => {
const s = freshState();
decide({ state: s, toolName: 'Read', skillMatched: false });
expect(s.counts.Read).toBe(0);
});
it('maps TodoWrite to TodoWrite_writes and soft_flags at its warn threshold (4→5)', () => {
const r = decide({ state: withCounts({ TodoWrite_writes: 4 }), toolName: 'TodoWrite', skillMatched: false });
expect(r.state.counts.TodoWrite_writes).toBe(5);
expect(r.action).toBe('soft_flag');
});
it('keeps a metered Grep allowed once past its hard threshold (continuation reading)', () => {
const r = decide({ state: withCounts({ Grep: 30 }), toolName: 'Grep', skillMatched: false });
expect(r.action).toBe('allow');
expect(r.state.counts.Grep).toBe(31);
});
it('hard_blocks a mutating Bash when TodoWrite_writes is at its hard limit', () => {
const r = decide({ state: withCounts({ TodoWrite_writes: 15 }), toolName: 'Bash', skillMatched: false });
expect(r.action).toBe('hard_block');
expect(r.reason).toContain('TodoWrite_writes=15');
});
});
describe('enforce-safe-baseline-metering processEvent() — task-boundary head', () => {
it('starts a fresh task when there is no prior ledger', () => {
const r = processEvent({
event: { tool_name: 'Read' },
priorLedger: null,
currentKeywords: ['router', 'gate', 'safe'],
promptText: 'почини safe-baseline',
skillMatched: false,
});
expect(r.action).toBe('allow');
expect(r.ledger.state.counts.Read).toBe(1);
expect(r.ledger.lastKeywords).toEqual(['router', 'gate', 'safe']);
});
it('continues the prior task when keywords overlap >=2 and no reset marker', () => {
const prior = {
state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 29, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
lastKeywords: ['router', 'gate', 'safe'],
};
const r = processEvent({
event: { tool_name: 'Read' },
priorLedger: prior,
currentKeywords: ['router', 'gate', 'extra'],
promptText: 'дальше по safe-baseline',
skillMatched: false,
});
expect(r.ledger.state.counts.Read).toBe(30);
expect(r.action).toBe('soft_flag');
});
it('resets to a fresh task on a reset marker even if keywords overlap', () => {
const prior = {
state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 29, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
lastKeywords: ['router', 'gate', 'safe'],
};
const r = processEvent({
event: { tool_name: 'Read' },
priorLedger: prior,
currentKeywords: ['router', 'gate', 'safe'],
promptText: 'новая задача — посмотри другое',
skillMatched: false,
});
expect(r.ledger.state.counts.Read).toBe(1);
});
it('starts a fresh task when keyword overlap is below 2', () => {
const prior = {
state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 29, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
lastKeywords: ['router', 'gate', 'safe'],
};
const r = processEvent({
event: { tool_name: 'Read' },
priorLedger: prior,
currentKeywords: ['totally', 'different', 'topic'],
promptText: 'другая тема',
skillMatched: false,
});
expect(r.ledger.state.counts.Read).toBe(1);
});
it('allows a mutating tool past the hard limit when a skill matched', () => {
const prior = {
state: { ...newCounterState({ taskId: 't', startedAtIso: '2026-05-29T00:00:00Z', firstPromptExcerpt: 'p' }), counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 } },
lastKeywords: ['router', 'gate', 'safe'],
};
const r = processEvent({
event: { tool_name: 'Edit' },
priorLedger: prior,
currentKeywords: ['router', 'gate', 'safe'],
promptText: 'продолжаем',
skillMatched: true,
});
expect(r.action).toBe('allow');
});
});
// ── 1b live-wiring: new pure helpers ──
describe('extractKeywords (H1)', () => {
it('lowercases, drops <4-char tokens, returns unique sorted', () => {
expect(extractKeywords('Router GATE safe baseline router')).toEqual(['baseline', 'gate', 'router', 'safe']);
});
it('drops common RU imperatives so unrelated tasks do not falsely overlap', () => {
const a = extractKeywords('сделай проверь биллинг тариф');
const b = extractKeywords('сделай проверь регион маршрут');
const overlap = a.filter((k) => b.includes(k));
expect(overlap).toEqual([]);
});
it('returns [] for empty/non-string', () => {
expect(extractKeywords('')).toEqual([]);
expect(extractKeywords(null)).toEqual([]);
});
});
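The task-boundary tests above imply a simple rule: two prompts belong to the same task when their keyword sets share at least two entries. The following is a minimal sketch of that rule, not the repository code. `extractKeywordsSketch`, `STOP_WORDS` and `sharesTask` are hypothetical names, the stop list is an assumed two-word sample taken from the test, and the real `extractKeywords` may tokenize differently. Reset markers are deliberately left out.

```javascript
// Hypothetical sketch, not the repository implementation.
// Assumed stop list: only the two imperatives that appear in the test above.
const STOP_WORDS = new Set(['сделай', 'проверь']);

function extractKeywordsSketch(text) {
  if (typeof text !== 'string') return [];
  const tokens = text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a letter or digit
    .filter((t) => t.length >= 4 && !STOP_WORDS.has(t));
  return [...new Set(tokens)].sort(); // unique, sorted
}

// Same task when at least `minOverlap` keywords are shared (2 per the tests).
function sharesTask(prevKeywords, currentKeywords, minOverlap = 2) {
  const prev = new Set(prevKeywords);
  return currentKeywords.filter((k) => prev.has(k)).length >= minOverlap;
}
```

With this shape, `['router', 'gate', 'safe']` against `['router', 'gate', 'extra']` continues the task, while a prompt that shares only filtered imperatives starts a fresh one.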
function asstToolUse(name, input = {}) {
return { message: { role: 'assistant', content: [{ type: 'tool_use', name, input }] } };
}
describe('detectSkillMatch (C2/V2-5)', () => {
it('true when the turn has a Skill tool_use', () => {
expect(detectSkillMatch([asstToolUse('Skill', { skill: 'superpowers:brainstorming' })])).toBe(true);
});
it('true when the turn has an EnterPlanMode tool_use', () => {
expect(detectSkillMatch([asstToolUse('EnterPlanMode')])).toBe(true);
});
it('false for Read tool_use or plain text mention of a plan path (no self-grant)', () => {
expect(detectSkillMatch([asstToolUse('Read', { file_path: 'docs/superpowers/plans/x.md' })])).toBe(false);
expect(detectSkillMatch([{ message: { role: 'assistant', content: [{ type: 'text', text: 'docs/superpowers/plans/x.md' }] } }])).toBe(false);
});
it('false for empty/non-array', () => {
expect(detectSkillMatch([])).toBe(false);
expect(detectSkillMatch(null)).toBe(false);
});
});
function ledgerWith(counts, skill, keywords) {
return {
state: {
...newCounterState({ taskId: 't', startedAtIso: '2026-05-30T00:00:00Z', firstPromptExcerpt: 'p' }),
counts: { Read: 0, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0, ...counts },
skill_match_within_task: skill,
},
lastKeywords: keywords,
};
}
describe('runLiveDecision — stickiness contract (V2-1)', () => {
it('persists skillMatchedThisTurn into the ledger (stickiness not lost)', () => {
const r = runLiveDecision({
event: { tool_name: 'Read' }, priorLedger: null,
promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
skillMatchedThisTurn: true,
});
expect(r.ledger.state.skill_match_within_task).toBe(true);
});
it('a skill earlier in a task keeps later mutating ops allowed past the hard limit (no false block)', () => {
const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
const r = runLiveDecision({
event: { tool_name: 'Edit' }, priorLedger: prior,
promptText: 'продолжаем router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
skillMatchedThisTurn: false,
});
expect(r.action).toBe('allow');
});
it('skill match in task A does NOT exempt an unrelated task B (no cross-task leak)', () => {
const prior = ledgerWith({ Read: 60 }, true, ['router', 'gate', 'safe', 'baseline']);
const r = runLiveDecision({
event: { tool_name: 'Edit' }, priorLedger: prior,
promptText: 'регион маршрут лиды поставщик', currentKeywords: ['регион', 'маршрут', 'лиды', 'поставщик'],
skillMatchedThisTurn: false,
});
expect(r.ledger.state.skill_match_within_task).toBe(false);
expect(r.ledger.state.counts.Read).toBe(0);
});
it('hard-blocks a mutating tool past the limit in a no-skill task', () => {
const prior = ledgerWith({ Read: 60 }, false, ['router', 'gate', 'safe', 'baseline']);
const r = runLiveDecision({
event: { tool_name: 'Edit' }, priorLedger: prior,
promptText: 'router gate safe baseline', currentKeywords: ['router', 'gate', 'safe', 'baseline'],
skillMatchedThisTurn: false,
});
expect(r.action).toBe('hard_block');
});
});
describe('runMain — live integration', () => {
function fixtureTranscript(path, entries) {
writeFileSync(path, entries.map((e) => JSON.stringify(e)).join('\n'));
}
it('blocks an Edit when Read past hard with no skill, and names the escape', async () => {
const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
const tpath = join(dir, 't.jsonl');
writeFileSync(join(dir, 'safe-baseline-ledger-S.json'), JSON.stringify({
state: { schema_version: 1, task_id: 't', counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 }, skill_match_within_task: false },
lastKeywords: ['router', 'gate', 'safe', 'baseline'],
}));
fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'router gate safe baseline' } }]);
const res = await runMain({ event: { tool_name: 'Edit', session_id: 'S', transcript_path: tpath }, runtimeDir: dir });
expect(res.block).toBe(true);
expect(res.message).toMatch(/EnterPlanMode|Skill/);
});
it('allows a fresh task and persists the ledger', async () => {
const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
const tpath = join(dir, 't.jsonl');
fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'регион маршрут лиды' } }]);
const res = await runMain({ event: { tool_name: 'Read', session_id: 'S2', transcript_path: tpath }, runtimeDir: dir });
expect(res.block).toBe(false);
expect(existsSync(join(dir, 'safe-baseline-ledger-S2.json'))).toBe(true);
});
it('allows an Edit (escape) when the current event is a Skill invocation', async () => {
const dir = mkdtempSync(join(tmpdir(), 'sbm-'));
const tpath = join(dir, 't.jsonl');
writeFileSync(join(dir, 'safe-baseline-ledger-S3.json'), JSON.stringify({
state: { schema_version: 1, task_id: 't', counts: { Read: 60, Grep: 0, Glob: 0, LS: 0, TodoWrite_writes: 0, AskUserQuestion: 0 }, skill_match_within_task: false },
lastKeywords: ['router', 'gate', 'safe', 'baseline'],
}));
fixtureTranscript(tpath, [{ type: 'user', message: { role: 'user', content: 'router gate safe baseline' } }]);
const res = await runMain({ event: { tool_name: 'Skill', session_id: 'S3', transcript_path: tpath }, runtimeDir: dir });
expect(res.block).toBe(false);
});
});
+75 -13
@@ -27,6 +27,7 @@ import {
isProductionCodePath,
readRouterState,
} from './enforce-hook-helpers.mjs';
import { join, dirname, basename } from 'node:path';
const RULE_KEY_TDD = 'tdd-gate';
const RULE_KEY_PLAN = 'writing-plans-required';
@@ -132,8 +133,56 @@ function hasPlanIndicator(turn) {
return false;
}
const AGENT_ID_RE = /agentId:\s*([0-9a-f]+)/i;
/**
* Cross-actor (Z Part 2): extract agentIds of subagents spawned by a `Task`
* tool in the controller's current turn. The agentId comes from the harness-written
* Task tool_result text ("agentId: <hex>"); the controller cannot forge
* a tool_result in its own transcript. Only hex ids are accepted, so a crafted
* "agentId: ../../x" cannot become a path-traversal into an arbitrary file.
*/
export function turnTaskAgentIds(turn) {
const taskUseIds = new Set();
for (const e of turn || []) {
const c = e && e.message && e.message.content;
if (!Array.isArray(c)) continue;
for (const b of c) {
if (b && b.type === 'tool_use' && b.name === 'Task') taskUseIds.add(b.id);
}
}
const ids = [];
for (const e of turn || []) {
const c = e && e.message && e.message.content;
if (!Array.isArray(c)) continue;
for (const b of c) {
if (!b || b.type !== 'tool_result' || !taskUseIds.has(b.tool_use_id)) continue;
const txt = typeof b.content === 'string' ? b.content
: Array.isArray(b.content) ? b.content.map((p) => p && p.text).filter(Boolean).join('\n') : '';
const m = txt.match(AGENT_ID_RE);
if (m) ids.push(m[1]);
}
}
return ids;
}
/**
* Derive subagent transcript paths from the controller transcript path and a
* list of agentIds. Subagent transcripts live at
* <projects>/<slug>/<controller-session>/subagents/agent-<agentId>.jsonl
* i.e. nested under the controller session's own directory (bound to it), while
* the controller transcript is <...>/<controller-session>.jsonl.
*/
export function subagentTranscriptPaths(controllerTranscriptPath, agentIds) {
const p = String(controllerTranscriptPath || '');
if (!p) return [];
const dir = dirname(p);
const base = basename(p).replace(/\.jsonl$/i, '');
return (agentIds || []).map((id) => join(dir, base, 'subagents', `agent-${id}.jsonl`));
}
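The two helpers above combine into one step: pull a hex id out of a Task result, then derive the nested transcript path. The sketch below is a standalone illustration under stated assumptions: `extractAgentId` and `subagentPathSketch` are hypothetical names, and POSIX separators are hard-coded for brevity, whereas the module itself goes through `node:path`.

```javascript
// Hypothetical standalone sketch; the regex is copied from the diff above.
const AGENT_ID_RE = /agentId:\s*([0-9a-f]+)/i;

// Returns the leading hex run after "agentId:", or null when there is none.
function extractAgentId(resultText) {
  const m = String(resultText || '').match(AGENT_ID_RE);
  return m ? m[1] : null;
}

// Assumes '/' separators; the real module uses join/dirname/basename.
function subagentPathSketch(controllerTranscriptPath, agentId) {
  const p = String(controllerTranscriptPath || '');
  if (!p || !agentId) return null;
  const cut = p.lastIndexOf('/');
  const dir = cut >= 0 ? p.slice(0, cut) : '.';
  const base = p.slice(cut + 1).replace(/\.jsonl$/i, '');
  return `${dir}/${base}/subagents/agent-${agentId}.jsonl`;
}
```

Because the capture group admits only hex characters, whatever reaches the path template cannot contain `/` or `.`, which is the property the doc comment relies on.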
export function decide({
-toolName, filePath, transcriptEntries, classification, override, overridePlan,
+toolName, filePath, transcriptEntries, classification, override, overridePlan, subagentEntriesList = [],
}) {
if (!['Edit', 'Write', 'MultiEdit'].includes(toolName)) return { block: false };
if (!isProductionCodePath(filePath)) return { block: false };
@@ -150,36 +199,37 @@ export function decide({
`[enforce-tdd-gate] task_type="${taskType}" requires a plan before production-code edit.`,
`Either invoke superpowers:writing-plans via Skill tool,`,
`or reference an existing plan file (docs/superpowers/plans/...) in this turn first.`,
-``,
-`Override: "быстрый коммит" / "ремонт инфраструктуры" in your prompt.`,
].join('\n'),
};
}
}
// Rule #3 — TDD gate.
// Rule #3 — TDD gate. Credit the controller's own turn OR a subagent that was
// spawned by a Task in this turn (cross-actor, Z Part 2). Subagent evidence is
// read from its agent-<id>.jsonl, which is tamper-protected by the transcript
// Write-deny (Z Part 1) — so crediting it does not open a forgery channel.
if (override) return { block: false };
-const hasTest = hasMatchingTestEdit(turn, filePath);
+const subList = Array.isArray(subagentEntriesList) ? subagentEntriesList : [];
+const hasTest = hasMatchingTestEdit(turn, filePath) || subList.some((es) => hasMatchingTestEdit(es, filePath));
if (!hasTest) {
return {
block: true,
message: [
`[enforce-tdd-gate] Production code edit on "${filePath}" without preceding test edit.`,
-`Write the failing test FIRST in the corresponding *.test.mjs / *.spec.ts / *Test.php.`,
+`Write the failing test FIRST in the corresponding *.test.mjs / *.spec.ts / *Test.php`,
+`(a subagent's test edit, if it was spawned by a Task in this turn, is also credited).`,
`Then run vitest/pest to confirm RED, then return to this prod-code Edit.`,
-``,
-`Override: "срочно" / "быстрый коммит" / "ремонт инфраструктуры".`,
].join('\n'),
};
}
-if (!hasFailingTestRun(turn)) {
+const hasRed = hasFailingTestRun(turn) || subList.some((es) => hasFailingTestRun(es));
+if (!hasRed) {
return {
block: true,
message: [
-`[enforce-tdd-gate] Test was edited but no vitest/pest run with RED output observed in this turn.`,
+`[enforce-tdd-gate] Test was edited but no vitest/pest run with RED output observed in this turn`,
+`(nor in any in-turn subagent transcript).`,
`Run the test suite (vitest run <test-file> / composer test) to confirm RED before prod-code edit.`,
-``,
-`Override: "срочно" / "быстрый коммит" / "ремонт инфраструктуры".`,
].join('\n'),
};
}
@@ -205,7 +255,19 @@ async function main() {
task_type: state.classification.task_type,
} : null;
-const result = decide({ toolName, filePath, transcriptEntries: transcript, classification, override, overridePlan });
// Cross-actor (Z Part 2): read transcripts of subagents spawned by a Task in
// this turn, bound to the controller session via the derived path. Best-effort
// — a missing/unreadable subagent transcript just yields no extra credit
// (stricter), never an error.
let subagentEntriesList = [];
try {
const turn = lastTurnEntries(transcript);
const agentIds = turnTaskAgentIds(turn);
const paths = subagentTranscriptPaths(event.transcript_path, agentIds);
subagentEntriesList = paths.map((p) => readTranscript(p)).filter((e) => Array.isArray(e) && e.length);
} catch { subagentEntriesList = []; }
const result = decide({ toolName, filePath, transcriptEntries: transcript, classification, override, overridePlan, subagentEntriesList });
exitDecision(result);
} catch {
exitDecision({ block: false });
+81 -1
@@ -1,5 +1,79 @@
import { describe, it, expect } from 'vitest';
-import { decide } from './enforce-tdd-gate.mjs';
+import { decide, turnTaskAgentIds, subagentTranscriptPaths } from './enforce-tdd-gate.mjs';
// Z Part 2 (2026-05-31): the tdd-gate must credit a subagent's test edit + RED
// when that subagent was spawned by a Task in the controller's current turn.
// Pairs with the transcript Write-hole closed in enforce-runtime-write-deny.mjs
// (Z Part 1) so the credited agent-<id>.jsonl cannot be forged.
describe('enforce-tdd-gate Z cross-actor (pairs with enforce-runtime-write-deny Part 1)', () => {
const subagentRedRun = [
{ message: { role: 'user', content: 'write the failing test for foo and confirm RED' } },
{ message: { role: 'assistant', content: [
{ type: 'tool_use', id: 's1', name: 'Write', input: { file_path: 'tools/foo.test.mjs' } },
{ type: 'tool_use', id: 's2', name: 'Bash', input: { command: 'npx vitest run tools/foo.test.mjs' } },
] } },
{ message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 's2', content: 'Tests 1 failed | 0 passed' } ] } },
];
it('credits a subagent test edit + RED for the controller prod edit', () => {
const r = decide({
toolName: 'Edit',
filePath: 'tools/foo.mjs',
transcriptEntries: [
{ message: { role: 'user', content: 'delegate the test, then I implement' } },
{ message: { role: 'assistant', content: [ { type: 'tool_use', id: 't1', name: 'Task', input: { subagent_type: 'tester' } } ] } },
{ message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 't1', content: 'done. agentId: a1234abcd' } ] } },
],
subagentEntriesList: [subagentRedRun],
});
expect(r.block).toBe(false);
});
it('still blocks when subagent edited a test but NO RED exists anywhere', () => {
const subNoRed = [
{ message: { role: 'user', content: 'write test' } },
{ message: { role: 'assistant', content: [ { type: 'tool_use', id: 's1', name: 'Write', input: { file_path: 'tools/foo.test.mjs' } } ] } },
];
const r = decide({
toolName: 'Edit', filePath: 'tools/foo.mjs',
transcriptEntries: [ { message: { role: 'user', content: 'go' } } ],
subagentEntriesList: [subNoRed],
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/RED/);
});
it('preserves old behavior when no subagent entries (blocks without test)', () => {
const r = decide({
toolName: 'Edit', filePath: 'tools/foo.mjs',
transcriptEntries: [ { message: { role: 'user', content: 'go' } } ],
subagentEntriesList: [],
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/without preceding test edit/);
});
it('turnTaskAgentIds extracts a hex agentId from an in-turn Task tool_result', () => {
const turn = [
{ message: { role: 'assistant', content: [ { type: 'tool_use', id: 't1', name: 'Task', input: {} } ] } },
{ message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 't1', content: 'ok agentId: a1b2c3d4e5' } ] } },
];
expect(turnTaskAgentIds(turn)).toContain('a1b2c3d4e5');
});
it('turnTaskAgentIds ignores non-Task results and rejects non-hex ids (no path traversal)', () => {
const turn = [
{ message: { role: 'assistant', content: [ { type: 'tool_use', id: 'b1', name: 'Bash', input: {} } ] } },
{ message: { role: 'user', content: [ { type: 'tool_result', tool_use_id: 'b1', content: 'agentId: ../../evil' } ] } },
];
expect(turnTaskAgentIds(turn)).toHaveLength(0);
});
it('subagentTranscriptPaths derives <dir>/<sessbase>/subagents/agent-<id>.jsonl', () => {
const paths = subagentTranscriptPaths('/p/projects/slug/sessUUID.jsonl', ['a1b2']);
expect(paths[0].split('\\').join('/')).toBe('/p/projects/slug/sessUUID/subagents/agent-a1b2.jsonl');
});
});
function userMsg(text) {
return { message: { role: 'user', content: text } };
@@ -38,6 +112,8 @@ describe('enforce-tdd-gate / decide', () => {
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/without preceding test edit/);
// 1A (2026-05-31): do not advertise dead override phrases (findOverride is a stub in v4).
expect(r.message).not.toMatch(/Override:/);
});
it('blocks when test edited but no vitest RED observed', () => {
@@ -51,6 +127,8 @@ describe('enforce-tdd-gate / decide', () => {
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/no vitest.*RED/);
// 1A (2026-05-31): do not advertise dead override phrases (findOverride is a stub in v4).
expect(r.message).not.toMatch(/Override:/);
});
it('allows after test edit + vitest RED', () => {
@@ -107,6 +185,8 @@ describe('enforce-tdd-gate / decide', () => {
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/requires a plan/);
// 1A (2026-05-31): do not advertise dead override phrases (findOverride is a stub in v4).
expect(r.message).not.toMatch(/Override:/);
});
it('allows feature edit when Skill(superpowers:writing-plans) invoked', () => {
-2
@@ -70,8 +70,6 @@ export function decide({ toolName, command, sentinel, sentinelAge, override, ove
message: [
`[enforce-verify-before-push] No verification artifact found.`,
`Run a full test suite first (vitest run / composer test) before \`git ${kind}\`.`,
-``,
-`Override: "срочно" / "быстрый коммит" / "ремонт инфраструктуры" in your prompt.`,
].join('\n'),
};
}
@@ -153,6 +153,9 @@ describe('enforce-verify-before-push / decide', () => {
});
expect(r.block).toBe(true);
expect(r.message).toMatch(/No verification/);
// 1A (2026-05-31): do not advertise dead override phrases (findOverride is a stub in v4).
expect(r.message).not.toMatch(/Override:/);
expect(r.message).not.toMatch(/срочно|ремонт инфраструктуры/);
});
it('does NOT emit override-missing-justification diagnostic for overrides without requires_justification', () => {
+84
@@ -0,0 +1,84 @@
#!/usr/bin/env node
/**
* llm-judge-config: the Layer 4 enabling-gate for router-gate v4.
*
* The LLM-judge engine (llm-judge.mjs) is fully built but MUST stay OFF until
* the owner deliberately turns it on, because enabling it incurs real LLM cost
* (~$300-1500/month per the v4.1 amendment). This module is the single switch.
*
* SAFE-BY-DEFAULT CONTRACT:
* enabled === true <=> the explicit flag ROUTER_LLM_JUDGE_ENABLED is truthy
* AND a key is resolvable (keychain first, then env).
* Anything else => enabled:false. Building this file does NOT enable the judge:
* with no flag and no key the gate is closed. keychainGet errors degrade to
* "no key, disabled" (never throw).
*
* Activation (a separate, owner-driven step, NOT done here):
* 1. store the API key in the OS keychain (or set ROUTER_LLM_KEY),
* 2. set ROUTER_LLM_JUDGE_ENABLED=1,
* 3. register the enforce-llm-judge-* hooks in .claude/settings.json.
* Cost starts only after all three.
*/
import { JUDGE_MODELS } from './llm-judge.mjs';
const ENABLE_FLAG = 'ROUTER_LLM_JUDGE_ENABLED';
const KEY_ENV = 'ROUTER_LLM_KEY';
const BASE_URL_ENV = 'ROUTER_LLM_BASE_URL';
const KEYCHAIN_SERVICE = 'router-gate-llm-judge';
const KEYCHAIN_ACCOUNT = 'default';
function isTruthyFlag(v) {
if (typeof v !== 'string') return false;
return v.trim().toLowerCase() === '1' || v.trim().toLowerCase() === 'true';
}
/**
* Resolve the Layer 4 judge configuration.
*
* @param {object} [args]
* @param {object} [args.env] - environment map (defaults to process.env)
* @param {Function} [args.keychainGet] - () => string|null, OS-keychain reader (injectable for tests)
* @returns {{enabled:boolean, apiKey:string|null, baseUrl:string|null, models:string[]}}
*/
export function resolveJudgeConfig({ env = process.env, keychainGet = defaultKeychainGet } = {}) {
let keychainKey = null;
try {
const v = keychainGet();
keychainKey = v ? String(v) : null;
} catch {
keychainKey = null;
}
const envKey = env[KEY_ENV] ? String(env[KEY_ENV]) : null;
const apiKey = keychainKey || envKey || null;
const flagOn = isTruthyFlag(env[ENABLE_FLAG]);
const enabled = flagOn && apiKey !== null;
return {
enabled,
apiKey,
baseUrl: env[BASE_URL_ENV] ? String(env[BASE_URL_ENV]) : null,
models: JUDGE_MODELS.multi,
};
}
/**
* Default OS-keychain reader. Lazily loads `keytar`; returns null if keytar is
* absent or the entry is missing. Never throws (caller also guards).
*/
export function defaultKeychainGet() {
try {
// Lazy require keeps the native dep optional — tests inject keychainGet and
// never hit this path; the no-op posture means missing keytar => no key.
const require = createRequire(import.meta.url);
const keytar = require('keytar');
const v = keytar.getPassword ? keytar.getPasswordSync?.(KEYCHAIN_SERVICE, KEYCHAIN_ACCOUNT) : null;
return v || null;
} catch {
return null;
}
}
import { createRequire } from 'node:module';
export const _internals = { ENABLE_FLAG, KEY_ENV, BASE_URL_ENV, KEYCHAIN_SERVICE, KEYCHAIN_ACCOUNT, isTruthyFlag };
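The safe-by-default contract reduces to a two-input AND: the flag must be truthy and a key must resolve. The sketch below restates it standalone so the truth table can be checked in isolation. `gateSketch` is a hypothetical name, and the key reader is injected in the same way the tests inject `keychainGet`; nothing here touches a real keychain.

```javascript
// Hypothetical standalone restatement of the enabling gate; not the module itself.
function isTruthyFlag(v) {
  if (typeof v !== 'string') return false;
  const s = v.trim().toLowerCase();
  return s === '1' || s === 'true';
}

function gateSketch({ env = {}, keychainGet = () => null } = {}) {
  let key = null;
  try {
    key = keychainGet() || null; // injected reader; may throw
  } catch {
    key = null; // a failing keychain degrades to "no key"
  }
  const apiKey = key || env.ROUTER_LLM_KEY || null; // keychain first, then env
  const enabled = isTruthyFlag(env.ROUTER_LLM_JUDGE_ENABLED) && apiKey !== null;
  return { enabled, apiKey };
}
```

Keeping the reader injectable means every row of the truth table, including the throwing keychain, can be exercised without a native dependency.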
+75
@@ -0,0 +1,75 @@
// tools/llm-judge-config.test.mjs
// Router-gate v4 Layer 4 enabling-gate. The judge is OFF by default and only
// becomes enabled when BOTH an explicit flag is set AND a key is resolvable.
// Building this switch does NOT flip it — no key + no flag => disabled.
import { describe, it, expect } from 'vitest';
import { resolveJudgeConfig } from './llm-judge-config.mjs';
describe('llm-judge-config resolveJudgeConfig()', () => {
it('is DISABLED by default: no flag, no key', () => {
const c = resolveJudgeConfig({ env: {}, keychainGet: () => null });
expect(c.enabled).toBe(false);
expect(c.apiKey).toBe(null);
});
it('stays DISABLED when a key exists but the enable flag is not set', () => {
const c = resolveJudgeConfig({ env: {}, keychainGet: () => 'sk-test' });
expect(c.enabled).toBe(false);
expect(c.apiKey).toBe('sk-test');
});
it('stays DISABLED when the flag is set but no key is resolvable', () => {
const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, keychainGet: () => null });
expect(c.enabled).toBe(false);
expect(c.apiKey).toBe(null);
});
it('is ENABLED only when the flag is set AND a key is resolvable (from keychain)', () => {
const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, keychainGet: () => 'sk-keychain' });
expect(c.enabled).toBe(true);
expect(c.apiKey).toBe('sk-keychain');
});
it('prefers the keychain key over the env fallback', () => {
const c = resolveJudgeConfig({
env: { ROUTER_LLM_JUDGE_ENABLED: '1', ROUTER_LLM_KEY: 'sk-env' },
keychainGet: () => 'sk-keychain',
});
expect(c.apiKey).toBe('sk-keychain');
});
it('falls back to the env key when the keychain is empty', () => {
const c = resolveJudgeConfig({
env: { ROUTER_LLM_JUDGE_ENABLED: '1', ROUTER_LLM_KEY: 'sk-env' },
keychainGet: () => null,
});
expect(c.enabled).toBe(true);
expect(c.apiKey).toBe('sk-env');
});
it('accepts "true" (case-insensitive) as the enable flag', () => {
const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: 'TRUE' }, keychainGet: () => 'k' });
expect(c.enabled).toBe(true);
});
it('treats an arbitrary flag value (e.g. "0", "no") as NOT enabled', () => {
expect(resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '0' }, keychainGet: () => 'k' }).enabled).toBe(false);
expect(resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: 'no' }, keychainGet: () => 'k' }).enabled).toBe(false);
});
it('exposes default models and passes through baseUrl from env', () => {
const c = resolveJudgeConfig({
env: { ROUTER_LLM_JUDGE_ENABLED: '1', ROUTER_LLM_BASE_URL: 'https://example/api' },
keychainGet: () => 'k',
});
expect(Array.isArray(c.models)).toBe(true);
expect(c.models.length).toBeGreaterThan(0);
expect(c.baseUrl).toBe('https://example/api');
});
it('never throws when keychainGet itself throws — degrades to no key, disabled', () => {
const c = resolveJudgeConfig({ env: { ROUTER_LLM_JUDGE_ENABLED: '1' }, keychainGet: () => { throw new Error('keychain locked'); } });
expect(c.enabled).toBe(false);
expect(c.apiKey).toBe(null);
});
});
+30 -1
@@ -68,14 +68,43 @@ import { homedir } from 'node:os';
import { readStdin, parseEventJson, exitDecision } from './enforce-hook-helpers.mjs';
import { llmJudgeCall, readJudgeBudget, bumpJudgeBudget, JUDGE_SESSION_BUDGET } from './llm-judge.mjs';
// Calibration 1 (2026-05-31) — `Skill` removed from judge scope (SCOPE fix, NOT
// a discipline drop). Invoking a Skill mutates no state; it is the prescribed
// §17 entry into work. Judging the skill-invocation itself and blocking on
// doubt directly contradicts §17 (which mandates skills). The real mutations a
// skill leads to (Edit/Write/MultiEdit/Bash/PowerShell/commit/push/Task) remain
// fully judged below — doubt→block on those is unchanged.
export const MUTATING_TOOLS = new Set([
-'Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash', 'PowerShell', 'Skill', 'Task', 'Workflow',
+'Edit', 'Write', 'MultiEdit', 'NotebookEdit', 'Bash', 'PowerShell', 'Task', 'Workflow',
]);
function runtimeDir(override) {
return override || join(homedir(), '.claude', 'runtime');
}
/**
* Calibration 4 (soft, 2026-05-31): the classifier's distilled task summary is
* lossy and sometimes "(unknown)" even for a perfectly clear user request,
* which made the judge block all real edits (no task to compare, so doubt→block).
* When the summary is unknown/empty, fall back to judging against the user's
* actual last prompt (the ground-truth request) instead of nothing.
*
* This is NOT calibration 2 (which would blindly ALLOW on unknown). The judge
* still runs and still blocks on doubt; it just uses better evidence. When both
* the summary and the user prompt are unavailable, the task stays "(unknown)"
* and doubt→block is preserved.
*/
export function resolveEffectiveTask(declaredTask, lastUserPrompt) {
const dt = declaredTask || {};
const summary = dt.task_summary;
const summaryUnknown = !summary || summary === '(unknown)' || !String(summary).trim();
const prompt = typeof lastUserPrompt === 'string' ? lastUserPrompt.trim() : '';
if (summaryUnknown && prompt) {
return { ...dt, task_summary: prompt, task_source: 'user_prompt_fallback' };
}
return dt;
}
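How the last user prompt reaches `resolveEffectiveTask` is not shown in this hunk. One plausible wiring, sketched under assumptions: `lastUserPromptSketch` is a hypothetical helper that scans transcript entries of the shape used in the test fixtures, and `resolveEffectiveTaskSketch` is a compact copy of the rule above, repeated only so the example runs on its own.

```javascript
// Hypothetical helper: find the most recent user message that carries text.
// Entries consisting only of tool_result blocks are skipped.
function lastUserPromptSketch(entries) {
  const list = Array.isArray(entries) ? entries : [];
  for (let i = list.length - 1; i >= 0; i--) {
    const m = list[i] && list[i].message;
    if (!m || m.role !== 'user') continue;
    if (typeof m.content === 'string') return m.content;
    if (Array.isArray(m.content)) {
      const text = m.content.filter((b) => b && b.type === 'text').map((b) => b.text).join('\n');
      if (text) return text;
    }
  }
  return '';
}

// Compact copy of the fallback rule from the diff, so the sketch is standalone.
function resolveEffectiveTaskSketch(declaredTask, lastUserPrompt) {
  const dt = declaredTask || {};
  const summary = dt.task_summary;
  const unknown = !summary || summary === '(unknown)' || !String(summary).trim();
  const prompt = typeof lastUserPrompt === 'string' ? lastUserPrompt.trim() : '';
  return unknown && prompt
    ? { ...dt, task_summary: prompt, task_source: 'user_prompt_fallback' }
    : dt;
}
```

The fallback only swaps the evidence the judge sees. When neither a summary nor a prompt is available the task stays `(unknown)`, so the block-on-doubt path is untouched.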
/** Read the classifier-written declared task for this session; stub on miss. */
export function readDeclaredTask({ sessionId, runtimeDirOverride }) {
const path = join(runtimeDir(runtimeDirOverride), `router-state-${sessionId || 'unknown'}.json`);
+42
@@ -69,6 +69,38 @@ describe('judgePerTool', () => {
});
});
import { resolveEffectiveTask } from './llm-judge-per-tool.mjs';
// Calibration 4 (soft, 2026-05-31) — when the classifier wrote "(unknown)" as
// the declared task (its summary is lossy/unreliable), fall back to judging
// against the user's actual last prompt instead of an empty task. NOT
// calibration 2: the judge still blocks on doubt — it just uses better
// evidence (the literal user request) when the classifier summary is empty.
describe('resolveEffectiveTask — calibration 4 user-prompt fallback', () => {
it('keeps the classifier summary when it is meaningful', () => {
const r = resolveEffectiveTask({ task_summary: 'implement parallel-session-lock', recommended_node: '#19' }, 'some prompt');
expect(r.task_summary).toBe('implement parallel-session-lock');
expect(r.task_source).toBeUndefined();
});
it('falls back to the user prompt when summary is "(unknown)"', () => {
const r = resolveEffectiveTask({ task_summary: '(unknown)', recommended_node: null }, 'реализуй живой main для parallel-session-lock');
expect(r.task_summary).toBe('реализуй живой main для parallel-session-lock');
expect(r.task_source).toBe('user_prompt_fallback');
});
it('falls back when summary is empty or blank', () => {
expect(resolveEffectiveTask({ task_summary: '' }, 'do X').task_summary).toBe('do X');
expect(resolveEffectiveTask({ task_summary: ' ' }, 'do X').task_summary).toBe('do X');
});
it('stays unknown when both summary and user prompt are unavailable (still blocks on doubt)', () => {
const r = resolveEffectiveTask({ task_summary: '(unknown)' }, '');
expect(r.task_summary).toBe('(unknown)');
expect(r.task_source).toBeUndefined();
});
});
import { MUTATING_TOOLS, readDeclaredTask } from './llm-judge-per-tool.mjs';
describe('per-tool helpers', () => {
@@ -79,6 +111,16 @@ describe('per-tool helpers', () => {
expect(MUTATING_TOOLS.has('Read')).toBe(false);
});
// Calibration 1 (2026-05-31) — SCOPE fix, discipline NOT lowered.
// Invoking a Skill changes no state; it is the prescribed §17 entry into
// work. Judging the skill-invocation itself and blocking on doubt directly
// contradicts §17 (which mandates skills). The real mutations a skill leads
// to (Edit/Write/Bash/commit/push) stay fully judged, so removing Skill from
// the judge scope does not lower discipline.
it('does NOT treat Skill as mutating (calibration 1 — prescribed §17 entry, mutates nothing)', () => {
expect(MUTATING_TOOLS.has('Skill')).toBe(false);
});
it('readDeclaredTask falls back to a stub when state missing', () => {
const dt = readDeclaredTask({ sessionId: 'no-such-session', runtimeDirOverride: '/nonexistent' });
expect(dt).toHaveProperty('task_summary');
+1 -1
@@ -24,7 +24,7 @@ export function computeWorkspaceHash(workspacePath) {
return createHash('md5').update(String(workspacePath || ''), 'utf-8').digest('hex').slice(0, 12);
}
-function isStale(record, now) {
+export function isStale(record, now) {
if (!record || typeof record !== 'object') return true;
const ttl = typeof record.ttl_ms === 'number' ? record.ttl_ms : LOCK_DEFAULT_TTL_MS;
return now - (record.acquired_at || 0) > ttl;
+21
@@ -6,6 +6,7 @@ import {
release,
refresh,
computeWorkspaceHash,
isStale,
LOCK_DEFAULT_TTL_MS,
} from './parallel-session-lock.mjs';
@@ -91,6 +92,26 @@ describe('parallel-session-lock pure module (Stream H Task 7)', () => {
});
});
// isStale is exported (B, 2026-05-31) so the wrapper's prune step reuses the
// EXACT same staleness definition — single source of truth, no divergence that
// could ever prune a still-fresh (active) lock.
describe('isStale (exported for prune support)', () => {
it('true when now - acquired_at exceeds ttl_ms', () => {
expect(isStale({ acquired_at: 0, ttl_ms: 100 }, 1000)).toBe(true);
});
it('false when still within ttl (active lock — never pruned)', () => {
expect(isStale({ acquired_at: 900, ttl_ms: 1000 }, 1000)).toBe(false);
});
it('true for a malformed/missing record', () => {
expect(isStale(null, 1000)).toBe(true);
expect(isStale(undefined, 1000)).toBe(true);
});
it('uses the default TTL when ttl_ms is absent', () => {
expect(isStale({ acquired_at: 0 }, LOCK_DEFAULT_TTL_MS + 1)).toBe(true);
expect(isStale({ acquired_at: 0 }, LOCK_DEFAULT_TTL_MS - 1)).toBe(false);
});
});
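The comment above says the wrapper's prune step reuses the same staleness definition. A minimal sketch of what such a prune could look like follows. `pruneStaleSketch` and the `locksById` map shape are hypothetical, and `ASSUMED_DEFAULT_TTL_MS` is a placeholder because the real `LOCK_DEFAULT_TTL_MS` value is not shown in this diff.

```javascript
// Assumed placeholder; the real LOCK_DEFAULT_TTL_MS value is not shown in this diff.
const ASSUMED_DEFAULT_TTL_MS = 60000;

// Copy of the staleness rule from the diff, using the placeholder TTL.
function isStaleSketch(record, now) {
  if (!record || typeof record !== 'object') return true;
  const ttl = typeof record.ttl_ms === 'number' ? record.ttl_ms : ASSUMED_DEFAULT_TTL_MS;
  return now - (record.acquired_at || 0) > ttl;
}

// Hypothetical prune step: keep only locks that are still fresh.
function pruneStaleSketch(locksById, now) {
  const kept = {};
  for (const [id, record] of Object.entries(locksById || {})) {
    if (!isStaleSketch(record, now)) kept[id] = record;
  }
  return kept;
}
```

Routing the prune through the same predicate is what rules out the divergence the comment warns about: a lock that acquisition still treats as active can never be pruned.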
describe('computeWorkspaceHash (Stream H Task 7)', () => {
it('returns 12 hex chars', () => {
const h = computeWorkspaceHash('/some/path');
+19
@@ -40,6 +40,25 @@ export const DEFAULT_PROTECTED_PATTERNS = [
/(^|\/)\.npmrc$/i,
];
// Read-tool deny list — narrower than DEFAULT_PROTECTED_PATTERNS (over-block fix 2026-05-31).
// Smoke 5 reused the full protected-list for the Read tool, which blocked Read of
// CLAUDE.md, the normative docs and the memory/ index — breaking the legit
// claude-md-management / memory-sync workflow (harness Edit requires a prior Read).
// Read of those files has NO exfil value: CLAUDE.md / Pravila / PSR / Tooling are
// public-in-repo, memory/ is the controller's own index. The genuine Read-exfil
// targets are cross-session transcripts (.jsonl), runtime side-channels, settings
// and secrets — those stay blocked here. The Bash/PowerShell read gate (cat /
// Get-Content) and the Write gate keep using the full DEFAULT_PROTECTED_PATTERNS,
// so CLAUDE.md / memory remain protected against shell-read and overwrite.
// NB: `.claude/projects/.*\.jsonl$` matches transcripts but NOT the `memory/`
// subdirectory (memory files are *.md), so MEMORY.md stays readable.
export const READ_DENY_PATTERNS = [
/(^|\/)\.claude\/projects\/.*\.jsonl$/i, // cross-session transcripts (parent-context exfil)
/(^|\/)\.claude\/runtime(\/|$)/i, // runtime side-channels (approve files, sentinels, state)
/(^|\/)\.claude\/settings(\.local)?\.json$/i, // harness/hook config
/(^|\/)\.env(\.|$)/i, // secrets
];
export function isProtectedPath(p, pathNormalize = defaultPathNormalize, patterns = DEFAULT_PROTECTED_PATTERNS) {
const n = pathNormalize(p);
if (!n) return false;
@@ -242,3 +242,43 @@ describe('isProtectedPath — runtime dir without trailing slash (review fix)',
expect(isProtectedPath('~/.claude/runtime/x.json', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
});
});
import { READ_DENY_PATTERNS } from './shell-content-rules.mjs';
// Over-block fix (2026-05-31): the Read tool needs a NARROWER deny list than the
// Bash/PowerShell/Write gate. Read of CLAUDE.md / Pravila / memory has no exfil
// value (public-in-repo / own memory index); the genuine Read-exfil targets are
// cross-session transcripts (.jsonl), runtime side-channels, settings, secrets.
describe('READ_DENY_PATTERNS (narrow Read-tool deny)', () => {
  it.each([
    '~/.claude/projects/abc/session.jsonl',
    '/c/Users/Administrator/.claude/projects/crm/x.jsonl',
    '~/.claude/runtime/router-state.json',
    '~/.claude/runtime',
    '~/.claude/settings.json',
    '~/.claude/settings.local.json',
    '.env',
    'app/.env.production',
  ])('Read-denies genuine exfil target %s', (p) => {
    expect(isProtectedPath(p, defaultPathNormalize, READ_DENY_PATTERNS)).toBe(true);
  });
  it.each([
    'CLAUDE.md',
    '/c/моя/проекты/портал crm/Документация/CLAUDE.md',
    '/c/Users/Administrator/.claude/projects/crm/memory/MEMORY.md',
    '/c/Users/Administrator/.claude/projects/crm/memory/feedback_x.md',
    'docs/Pravila_raboty_Claude_v1_1.md',
    'docs/Plugin_stack_rules_v1.md',
    'docs/Tooling_v8_3.md',
    'node_modules/shell-quote/index.js',
  ])('does NOT Read-deny public/normative/memory file %s', (p) => {
    expect(isProtectedPath(p, defaultPathNormalize, READ_DENY_PATTERNS)).toBe(false);
  });
  it('DEFAULT_PROTECTED_PATTERNS still protects CLAUDE.md/Pravila/memory (Bash/PowerShell/Write gates unchanged)', () => {
    expect(isProtectedPath('CLAUDE.md', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
    expect(isProtectedPath('docs/Pravila_raboty_Claude_v1_1.md', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
    expect(isProtectedPath('memory/feedback.md', defaultPathNormalize, DEFAULT_PROTECTED_PATTERNS)).toBe(true);
  });
});
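To make the narrow Read gate concrete, here is a self-contained sketch of how the four patterns classify paths. The regular expressions are copied from `READ_DENY_PATTERNS` in the hunk above. Everything else is assumed for illustration: `normalizeSketch` is a hypothetical stand-in for `defaultPathNormalize` (whose body is not shown in this diff) and `isReadDenied` is a hypothetical helper, not a function from the module.

```javascript
// Patterns copied verbatim from READ_DENY_PATTERNS in the diff.
const READ_DENY_PATTERNS = [
  /(^|\/)\.claude\/projects\/.*\.jsonl$/i,     // cross-session transcripts
  /(^|\/)\.claude\/runtime(\/|$)/i,            // runtime side-channels
  /(^|\/)\.claude\/settings(\.local)?\.json$/i, // harness/hook config
  /(^|\/)\.env(\.|$)/i,                        // secrets
];

// Hypothetical normalizer: only converts backslashes to forward slashes.
// The real defaultPathNormalize may do more (this is an assumption).
function normalizeSketch(p) {
  return typeof p === 'string' ? p.replace(/\\/g, '/') : '';
}

// Hypothetical helper: true when a Read of this path should be denied.
function isReadDenied(p) {
  const n = normalizeSketch(p);
  if (!n) return false;
  return READ_DENY_PATTERNS.some((re) => re.test(n));
}

// Transcripts, runtime state, settings and secrets are denied;
// normative docs and memory/*.md files stay readable.
console.log(isReadDenied('~/.claude/projects/abc/session.jsonl'));
console.log(isReadDenied('/c/Users/Administrator/.claude/projects/crm/memory/MEMORY.md'));
```

Note that the transcript pattern excludes memory files only because they end in `.md`; a `.jsonl` file placed under `memory/` would still match, which is consistent with the NB in the source comment.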