# pcq

pcq is the contract for agent-run ML experiments — `cq.yaml` format, JSON contracts, MCP tool surface, strictness gates, and run-evidence schema. The contract specification lives under [`spec/`](https://github.com/playidea-lab/pcq/tree/main/spec). The Apache-2.0 Python package on PyPI (`uv add pcq`) is the reference implementation; the CQ Go service worker is a second implementation targeting the same contract.

Use pcq when an agent, CI job, service worker, notebook, or human needs to run ML code and collect reproducible evidence through a stable project contract.

Core identity:

- pcq does not operate the model.
- pcq operates the experiment boundary.
- The contract is the adapter.
- CQ is one managed consumer of the contract; pcq is useful without CQ.

Install:

```bash
uv add pcq

# Optional — expose pcq as MCP tools to Claude Code, Codex, custom agents:
uv add 'pcq[mcp]'
```

Canonical resources:

- Website: https://playidea-lab.github.io/pcq/
- Korean landing page: https://playidea-lab.github.io/pcq/index.ko.html
- Repository: https://github.com/playidea-lab/pcq
- PyPI: https://pypi.org/project/pcq/
- v4 direction: https://github.com/playidea-lab/pcq/blob/main/docs/V4_DIRECTION.md
- MCP integration: https://github.com/playidea-lab/pcq/blob/main/docs/MCP_INTEGRATION.md
- Full agent guide: https://playidea-lab.github.io/pcq/llms-full.txt
- Machine manifest: https://playidea-lab.github.io/pcq/agent-manifest.json

Core idea:

- `cq.yaml` declares the run command, config, metrics, inputs, and artifacts.
- Training code can use PyTorch, Hugging Face Trainer, Lightning, sklearn, TabPFN, PyCaret, XGBoost, shell scripts, remote jobs, or custom Python.
- pcq standardizes config loading, metric emission, artifact layout, validation, comparison, lineage, and final run evidence.
- pcq is not a trainer, model zoo, adapter matrix, or CQ-only client.

Primary agent commands:

```bash
pcq resolve --json
pcq inspect . --json
pcq validate . --strictness 2 --json
pcq run --path . --json
pcq run --path . --jsonl
pcq validate-run output --strictness 3 --json
pcq describe-run output --json
pcq compare-runs old_output new_output --json
pcq lineage output --json
pcq apply-plan experiment.plan.json --json
pcq mcp serve
```

MCP tools (with `pcq[mcp]` installed): `resolve_project`, `inspect_project`, `validate_project`, `validate_run`, `describe_run`, `compare_runs`, `lineage_chain`, `apply_plan`, `apply_planset`, `init_experiment`, `finalize_run`, `agent_install`, `agent_status`, `run_experiment`.

Metadata APIs (v4.4-4.6):

- `pcq.attribution(operator, author, committer)` — record who ran the experiment (agent identity, human author, git committer). Env: `CQ_ATTRIBUTION_*`.
- `pcq.worker_spec()` — auto-detect CPU/GPU/memory/OS of the executing machine. Env: `CQ_WORKER_*`.
- `pcq.fingerprint(X, y, modality, task_kind, domain)` — PII-safe statistical summary of input data (shape, dtype, hash, distribution stats). Env: `CQ_FINGERPRINT_*`.

## Inference Metrics (v4.7, recommended)

11 recommended keys for benchmarking and runtime observability. Emit via `pcq.log(...)` or `@key=value` lines on stdout. No validation gate — `metrics.json` remains free-key.
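A minimal emission sketch in Python. The `@key=value` stdout form is part of the contract as stated above; the exact `pcq.log` signature is an assumption here (metric keys as keyword arguments), so verify against the full agent guide before relying on it:

```python
# Sketch: emit recommended inference metrics from a benchmark script.
# ASSUMPTION: pcq.log takes metric keys as keyword arguments — check the
# pcq docs; only the @key=value stdout form is quoted verbatim above.
import pcq

pcq.log(
    latency_p50_ms=12.4,
    latency_p95_ms=31.8,
    throughput_qps=640.0,
    batch_size=8,
)

# Contract-level equivalent: any process (shell script, remote job) can
# emit metrics by printing @key=value lines to stdout.
print("@vram_peak_mb=5120")
print("@tokens_per_sec=1850.5")
```

The 11 recommended keys: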
| Key | Unit | Meaning |
|-----|------|---------|
| `latency_p50_ms` | ms | median request latency |
| `latency_p95_ms` | ms | 95th-percentile latency |
| `latency_p99_ms` | ms | 99th-percentile latency |
| `latency_mean_ms` | ms | mean request latency |
| `throughput_qps` | req/s | queries per second |
| `tokens_per_sec` | tok/s | token throughput |
| `time_to_first_token_ms` | ms | TTFT (streaming) |
| `memory_peak_mb` | MB | peak CPU/system RAM |
| `vram_peak_mb` | MB | peak GPU VRAM |
| `batch_size` | — | inference batch size |
| `sequence_length` | — | input sequence length |

## Failure Categories (v4.8)

`failure.category` — a regex heuristic applied to `failure.message`. Free strings are allowed when no pattern matches. The 16 categories: `config_error`, `missing_dependency`, `dataset_missing`, `dataset_shape`, `label_contract`, `loss_contract`, `metric_contract`, `oom`, `nan_loss`, `timeout`, `distributed_write_race`, `accuracy_below_threshold`, `user_interrupted`, `disk_full`, `model_load_failed`, `unknown_exception`. Agents use this field for retry/abort decisions. The full table with hints lives in docs/AGENT_OPERATING_GUIDE.md and in spec/SPEC.md under `## Failure Categories`.

Runtime surfaces:

- `--json` emits one final parseable JSON object.
- `--jsonl` emits live newline-delimited JSON events.
- `--events PATH` writes live events to a JSONL file while preserving the selected stdout mode.

Standard artifacts:

- `config.json`
- `metrics.json`
- `manifest.json`
- `run_summary.json`
- `run_record.json`
- `validation_report.json`

Case studies (production dogfoods):

- mnist-dogfood-2026-05-08, tabular-dogfood-2026-05-09, mcp-dogfood-2026-05-10, cq-worker-dogfood-2026-05-10.
- All four under https://github.com/playidea-lab/pcq/tree/main/docs/case-studies/
- Mirrored on the site at #case-studies.

Roadmap and direction:

- v4 Direction: docs/V4_DIRECTION.md
- Completion Roadmap: docs/PCQ_COMPLETION_ROADMAP.md
- Site section: #roadmap

Tool response samples: real captured responses for the four most-called tools live inline at https://playidea-lab.github.io/pcq/#tools-catalog (describe_run, compare_runs, validate_run, lineage_chain). Volatile fields are elided as "...". The same samples are referenced from agent-manifest.json under `tool_response_samples`.

Agent rule: prefer JSON/JSONL commands over scraping prose. Use `describe-run` and `compare-runs` for decision facts. pcq reports facts; the agent or service chooses policy.
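As one concrete instance of the agent rule, here is a sketch of reading decision facts from the machine-readable surface rather than scraping prose. It shells out to `pcq describe-run output --json`, which is documented above; the shape of the returned object is defined by the spec and is not assumed here beyond "one final parseable JSON object":

```python
# Sketch: consume pcq's --json surface instead of parsing human-oriented
# text. The command is documented; the JSON schema lives in spec/ and is
# not assumed here, so we only inspect top-level keys.
import json
import subprocess

result = subprocess.run(
    ["pcq", "describe-run", "output", "--json"],
    capture_output=True, text=True, check=True,
)
run_facts = json.loads(result.stdout)

# Hand the structured facts to whatever policy layer decides retry/abort:
# pcq reports facts, the agent chooses policy.
print(sorted(run_facts.keys()))
```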