# docs/ml: ML / AI playbook (map section A11)

Home of the `A11 "ML / AI development"` section. Defines the tooling Лидерра uses
to build and test ML/AI capability. The portal currently ships no ML/AI code;
this section documents the toolset, ready for when AI features are scoped.

## Toolset

| Tool | Role | Status |
|---|---|---|
| **claude-api skill** | Build AI features on the Anthropic SDK (lead qualification, call summaries, email drafts) with prompt caching. | reuse: already available |
| **context7 MCP** | Up-to-date docs for AI/ML libraries and SDKs. | reuse: already installed |
| **Sentry MCP** | Debug AI features in production via Sentry AI/LLM monitoring (read-only). | reuse: Tooling #34, pending the Sentry deployment (Б-1) |
| **promptfoo** | Test suite for LLM prompts/agents: assertions, regression, LLM-graded eval, red-team. | installed: `npx promptfoo` |
| **Data Scientist skill** | Classical-ML workflow: business objective → ML task, algorithm selection, feature engineering, evaluation. | installed: vendored skill |
| **Jupyter MCP** | Executable notebooks for real model training. | **deferred**: see below |

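
The prompt caching listed for the claude-api skill can be illustrated with a request shape. This is a minimal sketch, assuming the Anthropic Messages API accepts a `cache_control: { type: "ephemeral" }` marker on a system text block (worth confirming against the current SDK documentation). The function name, the `<model-id>` placeholder, and the prompt text are hypothetical and do not come from the portal; the types are declared locally so the sketch compiles without the SDK installed.

```typescript
// Hypothetical request builder; local types stand in for the SDK's own.
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

interface MessageRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Assumption: the long, stable instructions go in a cached system block,
// and the per-lead text goes in the uncached user message.
function buildLeadQualificationRequest(
  model: string,
  instructions: string,
  leadText: string,
): MessageRequest {
  return {
    model,
    max_tokens: 512,
    system: [
      { type: "text", text: instructions, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: leadText }],
  };
}

const request = buildLeadQualificationRequest(
  "<model-id>", // placeholder, not a real model name
  "You qualify inbound sales leads. Reply with HOT, WARM, or COLD.",
  "Company of 40 people, asked for a demo this week.",
);
console.log(JSON.stringify(request, null, 2));
```

The resulting object would be passed to the SDK's message-creation call. Caching generally applies only to prompts above a minimum length, so instructions as short as these would likely not be cached in practice.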
## Boundaries (which tool for which job)

- **Building an AI feature** (a prompt-backed endpoint) → the **claude-api skill**.
- **Testing / regression-checking an LLM prompt** → **promptfoo** (`docs/ml/promptfoo-example/`).
- **A classical-ML modelling question** (which algorithm, how to evaluate) → the
  **Data Scientist skill** (`.claude/skills/data-scientist/`).
- **Executing a notebook / training a model** → **Jupyter MCP** (*deferred*).
- promptfoo's **red-team** mode tests *prompts*; the D3 Trail of Bits / Semgrep tools do
  SAST of *code*. The targets differ, so the two do not duplicate each other.

## promptfoo: running an eval

promptfoo makes **paid** Anthropic API calls. It runs **manually or in CI only**:
never in a git hook, never in pre-commit, never automatically.

- API key: `ANTHROPIC_API_KEY` env var (PowerShell User scope, following the Sentry
  `SENTRY_AUTH_TOKEN` pattern). Never commit a key.
- Run the seed example: `npm run eval:llm` (or
  `npx promptfoo eval -c docs/ml/promptfoo-example/promptfooconfig.yaml`).
- Footprint note: promptfoo is a large devDependency (~1090 transitive packages,
  one native module, `better-sqlite3`, which `prebuild-install` fetches as a
  prebuilt binary; no local C++ toolchain is required when the prebuild download
  succeeds). It is dev-tooling only and is not shipped to the Laravel app.

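
The rules above (key from the environment, no runs from git hooks) could be enforced by a small pre-flight check before an eval starts. This is a sketch under stated assumptions, not part of promptfoo: the function `evalPreflight` is hypothetical, and the hook detection is a heuristic that assumes Git exports variables such as `GIT_INDEX_FILE` or `GIT_DIR` to hook processes, which should be verified on the target machine.

```typescript
// Hypothetical pre-flight check for a paid eval run; not part of promptfoo.
type Env = Record<string, string | undefined>;

function evalPreflight(env: Env): string[] {
  const problems: string[] = [];

  // The key must come from the environment, never from a committed file.
  if (!env.ANTHROPIC_API_KEY) {
    problems.push("ANTHROPIC_API_KEY is not set");
  }

  // Heuristic (assumption): Git exports variables such as GIT_INDEX_FILE
  // to hook processes, so their presence suggests a hook started this run.
  if (env.GIT_INDEX_FILE !== undefined || env.GIT_DIR !== undefined) {
    problems.push("looks like a git hook; evals run manually or in CI only");
  }

  return problems;
}

// A run started from a hook without a key reports both problems.
const hookRun = evalPreflight({ GIT_INDEX_FILE: ".git/index" });
// A manual run with a key set reports none.
const manualRun = evalPreflight({ ANTHROPIC_API_KEY: "example-value" });
console.log(hookRun);
console.log(manualRun);
```

In a real script the argument would be `process.env`, and a non-empty result would abort before `npx promptfoo eval` is invoked. A user who sets `GIT_DIR` manually would trigger a false positive, so the heuristic is best treated as a safety net and not as a guarantee.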
## Jupyter MCP: why deferred

Jupyter MCP executes notebooks; it needs a Python ML environment (pandas /
scikit-learn / Jupyter). The machine is native Windows, deliberately runtime-minimal
(no Docker), and there is no model to train yet. Jupyter MCP is a **reserved slot**:
registered in the Tooling registry as *pending*, to be installed by a separate severable
task when a concrete ML model is scoped. See the A11 plan's "Deferred Task".