# pcq

pcq is the contract for agent-run ML experiments — `cq.yaml` format, JSON contracts, MCP tool surface, strictness gates, and run-evidence schema. The contract specification lives under [`spec/`](https://github.com/playidea-lab/pcq/tree/main/spec). The Apache-2.0 Python package on PyPI (`uv add pcq`) is the reference implementation; the CQ Go service worker is a second implementation targeting the same contract.

Use pcq when an agent, CI job, service worker, notebook, or human needs to run ML code and collect reproducible evidence through a stable project contract.

Core identity:

- pcq does not operate the model.
- pcq operates the experiment boundary.
- The contract is the adapter.
- CQ is one managed consumer of the contract; pcq is useful without CQ.

Install:

```bash
uv add pcq

# Optional — expose pcq as MCP tools to Claude Code, Codex, custom agents:
uv add 'pcq[mcp]'
```

Canonical resources:

- Website: https://playidea-lab.github.io/pcq/
- Korean landing page: https://playidea-lab.github.io/pcq/index.ko.html
- Repository: https://github.com/playidea-lab/pcq
- PyPI: https://pypi.org/project/pcq/
- v4 direction: https://github.com/playidea-lab/pcq/blob/main/docs/V4_DIRECTION.md
- MCP integration: https://github.com/playidea-lab/pcq/blob/main/docs/MCP_INTEGRATION.md
- Full agent guide: https://playidea-lab.github.io/pcq/llms-full.txt
- Machine manifest: https://playidea-lab.github.io/pcq/agent-manifest.json

Core idea:

- `cq.yaml` declares the run command, config, metrics, inputs, and artifacts.
- Training code can use PyTorch, Hugging Face Trainer, Lightning, sklearn, TabPFN, PyCaret, XGBoost, shell scripts, remote jobs, or custom Python.
- pcq standardizes config loading, metric emission, artifact layout, validation, comparison, lineage, and final run evidence.
- pcq is not a trainer, model zoo, adapter matrix, or CQ-only client.

Primary agent commands:

```bash
pcq resolve --json
pcq inspect . --json
pcq validate . --strictness 2 --json
pcq run --path . --json
pcq run --path . --jsonl
pcq validate-run output --strictness 3 --json
pcq describe-run output --json
pcq compare-runs old_output new_output --json
pcq lineage output --json
pcq apply-plan experiment.plan.json --json
pcq mcp serve
```

MCP tools (with `pcq[mcp]` installed): `resolve_project`, `inspect_project`, `validate_project`, `validate_run`, `describe_run`, `compare_runs`, `lineage_chain`, `apply_plan`, `apply_planset`, `init_experiment`, `finalize_run`, `agent_install`, `agent_status`, `run_experiment`.

Metadata APIs (v4.4-4.6):

- `pcq.attribution(operator, author, committer)` — record who ran the experiment (agent identity, human author, git committer). Env: `CQ_ATTRIBUTION_*`.
- `pcq.worker_spec()` — auto-detect CPU/GPU/memory/OS of the executing machine. Env: `CQ_WORKER_*`.
- `pcq.fingerprint(X, y, modality, task_kind, domain)` — PII-safe statistical summary of input data (shape, dtype, hash, distribution stats). Env: `CQ_FINGERPRINT_*`.

## Inference Metrics (v4.7, recommended)

11 recommended keys for benchmarking and runtime observability. Emit via `pcq.log(...)` or `@key=value` lines on stdout. No validation gate — `metrics.json` remains free-key.
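A minimal emission sketch in Python. The `@key=value` stdout form is part of the contract as stated above; the exact `pcq.log` signature is an assumption here (metric keys as keyword arguments), so verify against the full agent guide before relying on it:

```python
# Sketch: emit recommended inference metrics from a benchmark script.
# ASSUMPTION: pcq.log takes metric keys as keyword arguments — check the
# pcq docs; only the @key=value stdout form is quoted verbatim above.
import pcq

pcq.log(
    latency_p50_ms=12.4,
    latency_p95_ms=31.8,
    throughput_qps=640.0,
    batch_size=8,
)

# Contract-level equivalent: any process (shell script, remote job) can
# emit metrics by printing @key=value lines to stdout.
print("@vram_peak_mb=5120")
print("@tokens_per_sec=1850.5")
```

The 11 recommended keys: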
| Key | Unit | Meaning |
|-----|------|---------|
| `latency_p50_ms` | ms | median request latency |
| `latency_p95_ms` | ms | 95th-percentile latency |
| `latency_p99_ms` | ms | 99th-percentile latency |
| `latency_mean_ms` | ms | mean request latency |
| `throughput_qps` | req/s | queries per second |
| `tokens_per_sec` | tok/s | token throughput |
| `time_to_first_token_ms` | ms | TTFT (streaming) |
| `memory_peak_mb` | MB | peak CPU/system RAM |
| `vram_peak_mb` | MB | peak GPU VRAM |
| `batch_size` | — | inference batch size |
| `sequence_length` | — | input sequence length |

## Failure Categories (v4.8)

`failure.category` — a regex heuristic applied to `failure.message`. Free strings are allowed when no pattern matches. The 16 categories: `config_error`, `missing_dependency`, `dataset_missing`, `dataset_shape`, `label_contract`, `loss_contract`, `metric_contract`, `oom`, `nan_loss`, `timeout`, `distributed_write_race`, `accuracy_below_threshold`, `user_interrupted`, `disk_full`, `model_load_failed`, `unknown_exception`. Agents use this field for retry/abort decisions. The full table with hints lives in docs/AGENT_OPERATING_GUIDE.md and in spec/SPEC.md under `## Failure Categories`.

Runtime surfaces:

- `--json` emits one final parseable JSON object.
- `--jsonl` emits live newline-delimited JSON events.
- `--events PATH` writes live events to a JSONL file while preserving the selected stdout mode.

Standard artifacts:

- `config.json`
- `metrics.json`
- `manifest.json`
- `run_summary.json`
- `run_record.json`
- `validation_report.json`

Case studies (production dogfoods):

- mnist-dogfood-2026-05-08, tabular-dogfood-2026-05-09, mcp-dogfood-2026-05-10, cq-worker-dogfood-2026-05-10.
- All four under https://github.com/playidea-lab/pcq/tree/main/docs/case-studies/
- Mirrored on the site at #case-studies.

Roadmap and direction:

- v4 Direction: docs/V4_DIRECTION.md
- Completion Roadmap: docs/PCQ_COMPLETION_ROADMAP.md
- Site section: #roadmap

Tool response samples: real captured responses for the four most-called tools live inline at https://playidea-lab.github.io/pcq/#tools-catalog (describe_run, compare_runs, validate_run, lineage_chain). Volatile fields are elided as "...". The same samples are referenced from agent-manifest.json under `tool_response_samples`.

Agent rule: prefer JSON/JSONL commands over scraping prose. Use `describe-run` and `compare-runs` for decision facts. pcq reports facts; the agent or service chooses policy.
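As one concrete instance of the agent rule, here is a sketch of reading decision facts from the machine-readable surface rather than scraping prose. It shells out to `pcq describe-run output --json`, which is documented above; the shape of the returned object is defined by the spec and is not assumed here beyond "one final parseable JSON object":

```python
# Sketch: consume pcq's --json surface instead of parsing human-oriented
# text. The command is documented; the JSON schema lives in spec/ and is
# not assumed here, so we only inspect top-level keys.
import json
import subprocess

result = subprocess.run(
    ["pcq", "describe-run", "output", "--json"],
    capture_output=True, text=True, check=True,
)
run_facts = json.loads(result.stdout)

# Hand the structured facts to whatever policy layer decides retry/abort:
# pcq reports facts, the agent chooses policy.
print(sorted(run_facts.keys()))
```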