Production Guide¶
Goal¶
Move from prototype to stable service behavior.
P0 safeguards¶
- Budget limits configured (
steps,runtime,tokens). - Stop reasons monitored.
- Trace persisted for every production run.
- Failure report retained for debugging.
Suggested deployment checklist¶
- Pin model + parser versions.
- Add regression taskset (small but representative).
- Run daily/PR regression with threshold gates.
- Add runtime dashboards from trace summaries.
Incident debugging playbook¶
- Locate failing
run_id. - Read
manifest.summary.stop_reason. - Find first
ok=falseevent inevents.jsonl. - Inspect corresponding step in
steps.jsonl. - Classify root cause:
- parser robustness
- tool input schema
- env capability/config
- model output drift
Guardrails worth adding early¶
- Strict parser mode for critical actions.
- Tool allowlist per agent.
- Max retries per action.
- Safety filters for shell/web tools.