
Дмитрий d7aa5efe30 feat(a11): bootstrap docs/ml — README + promptfoo example + ADR-007

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-17 17:17:20 +03:00

README.md

docs/ml — ML / AI playbook (map section A11)

Home of the A11 «ML / AI development» section. It defines the tooling Лидерра uses to build and test ML/AI capabilities. The portal currently ships no ML/AI code; this section provides the toolset, ready for when AI features are scoped.

Toolset

Tool	Role	Status
claude-api skill	Build AI features on the Anthropic SDK (lead qualification, call summaries, email drafts) with prompt caching.	reuse — already available
context7 MCP	Up-to-date docs for AI/ML libraries and SDKs.	reuse — already installed
Sentry MCP	Debug AI features in production via Sentry AI/LLM monitoring (read-only).	reuse — Tooling #34, pending the Sentry deployment (Б-1)
promptfoo	Test suite for LLM prompts/agents: assertions, regression, LLM-graded eval, red-team.	installed — `npx promptfoo`
Data Scientist skill	Classical-ML workflow: business objective → ML task, algorithm selection, feature engineering, evaluation.	installed — vendored skill
Jupyter MCP	Executable notebooks for real model training.	deferred — see below
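The claude-api row mentions prompt caching. The TypeScript sketch below shows roughly what that means at the request level. It only builds the JSON body, so it runs without network access or an API key. It assumes the public Anthropic Messages API shape: `model`, `max_tokens`, `system` as an array of text blocks, and `cache_control: { type: "ephemeral" }` on the block whose prefix should be cached. The helper name `buildCachedRequest`, the lead-qualification prompt and the `<model-id>` placeholder are hypothetical and do not come from this repository.

```typescript
// Hypothetical helper (not part of this repo): builds a Messages API request body
// in which the long, stable system prompt is marked cacheable and only the user
// turn varies between calls. Field names follow the public Messages API.

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

function buildCachedRequest(
  model: string,
  systemPrompt: string,
  userInput: string,
  maxTokens: number = 1024,
): MessagesRequest {
  return {
    model,
    max_tokens: maxTokens,
    // Cache breakpoint on the system block: calls that share this prefix
    // can reuse it instead of reprocessing it.
    system: [
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: userInput }],
  };
}

// Usage: one shared system prompt, a different user turn per lead.
const req = buildCachedRequest(
  "<model-id>", // placeholder; substitute a current Claude model id
  "You qualify inbound sales leads. Reply with JSON containing a numeric score (0-100) and a reason.",
  "Company: ACME, 200 employees, asked for a demo.",
);
console.log(JSON.stringify(req, null, 2));
```

Caching only takes effect above a model-specific minimum prompt length. A system prompt as short as the one in the usage line would therefore not actually be cached. The pattern pays off for long, reused instructions or reference material.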

Boundaries (which tool for which job)

Building an AI feature (a prompt-backed endpoint) → the claude-api skill.
Testing / regression-checking an LLM prompt → promptfoo (docs/ml/promptfoo-example/).
A classical-ML modelling question (which algorithm, how to evaluate) → the Data Scientist skill (.claude/skills/data-scientist/).
Executing a notebook / training a model → Jupyter MCP — deferred.
promptfoo's red-team mode tests prompts, while the D3 Trail of Bits / Semgrep tools run SAST on code. They target different objects, so this is not a duplication.

promptfoo — running an eval

promptfoo makes paid Anthropic API calls. Run it only manually or in CI. Never run it from a git hook (pre-commit included), and never trigger it automatically.

API key: set the ANTHROPIC_API_KEY environment variable at PowerShell User scope, following the same pattern as the Sentry SENTRY_AUTH_TOKEN. Never commit a key.
Run the seed example: npm run eval:llm (or npx promptfoo eval -c docs/ml/promptfoo-example/promptfooconfig.yaml).
Footprint note: promptfoo is a large devDependency (~1090 transitive packages). It pulls in one native module, better-sqlite3, which prebuild-install fetches as a prebuilt binary; no local C++ toolchain is required as long as that download succeeds. promptfoo is dev tooling only and is not shipped to the Laravel app.
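The toolset lists assertions as one of promptfoo's roles. The TypeScript sketch below shows what a custom assertion might look like for a hypothetical lead-qualification prompt. That prompt is assumed to return JSON with a numeric `score` between 0 and 100 and a non-empty `reason`. The sketch assumes promptfoo's JavaScript-assertion contract: a function receives the model output plus a context object and returns a boolean or a `{ pass, score, reason }` result. Such a function is referenced from `promptfooconfig.yaml` with `type: javascript` and a `file://` value. The output schema and the name `assertLeadQualification` are hypothetical. Depending on the promptfoo version, the function may need to be exported from a compiled `.js` file before promptfoo can load it.

```typescript
// Hypothetical custom assertion for a lead-qualification prompt.
// Assumed contract (promptfoo javascript assertion): (output, context) => result.

interface GradingResult {
  pass: boolean;
  score: number; // 0..1, as promptfoo scores are normalised
  reason: string;
}

function assertLeadQualification(output: string, _context?: unknown): GradingResult {
  // 1. The model must return parseable JSON.
  let parsed: unknown;
  try {
    parsed = JSON.parse(output);
  } catch {
    return { pass: false, score: 0, reason: "Output is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { pass: false, score: 0, reason: "Output is not a JSON object" };
  }

  // 2. score must be a finite number in [0, 100].
  const { score, reason } = parsed as { score?: unknown; reason?: unknown };
  if (typeof score !== "number" || !Number.isFinite(score) || score < 0 || score > 100) {
    return { pass: false, score: 0, reason: "score must be a number in [0, 100]" };
  }

  // 3. reason must be a non-blank string.
  if (typeof reason !== "string" || reason.trim() === "") {
    return { pass: false, score: 0, reason: "reason must be a non-empty string" };
  }

  return { pass: true, score: 1, reason: "Valid lead-qualification JSON" };
}
```

A deterministic check like this is cheap and runs on every eval. LLM-graded assertions can then focus on judging the quality of `reason` rather than its shape.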

Jupyter MCP — why deferred

Jupyter MCP executes notebooks, so it needs a Python ML environment (pandas / scikit-learn / Jupyter). The machine runs native Windows, is deliberately runtime-minimal (no Docker), and there is no model to train yet. Jupyter MCP is therefore a reserved slot: it is registered in the Tooling registry as pending and will be installed by a separate, severable task once a concrete ML model is scoped. See the A11 plan's "Deferred Task".