# Tau-Bench Integration

## What is supported
QitOS supports Tau-Bench through a canonical adapter path:

- Adapter: `qitos/benchmark/tau_bench/adapter.py`
- Self-contained runtime: `qitos/benchmark/tau_bench/runtime.py` + `qitos/benchmark/tau_bench/port/*`
- Conversion: Tau task -> `qitos.core.task.Task`
- Eval runner: `examples/real/tau_bench_eval.py`

No external `tau_bench` Python package is required.
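To make the conversion step concrete, here is a minimal illustrative sketch of turning a raw Tau-Bench task record into a `Task` object. The field names (`task_id`, `instruction`, `metadata`) and the helper `tau_task_to_qitos_task` are assumptions for illustration; the real shapes live in `qitos/benchmark/tau_bench/adapter.py` and `qitos/core/task.py`.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for qitos.core.task.Task; the actual class may
# carry different fields. This sketch only shows the conversion pattern.
@dataclass
class Task:
    task_id: str
    instruction: str
    metadata: dict = field(default_factory=dict)

def tau_task_to_qitos_task(tau_task: dict, env: str, split: str, index: int) -> Task:
    """Convert one raw Tau-Bench task record into a QitOS-style Task (illustrative)."""
    return Task(
        task_id=f"tau_{env}_{split}_{index}",
        instruction=tau_task.get("instruction", ""),
        metadata={"env": env, "split": split, "index": index},
    )

task = tau_task_to_qitos_task({"instruction": "Return my order."}, "retail", "test", 0)
```

The key point is that the adapter normalizes benchmark-specific records into the one task type the QitOS kernel already understands, so downstream tooling needs no Tau-specific code.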
## Why this matters
You can evaluate agent scaffolds with the same QitOS kernel and observability stack:

- same `AgentModule` + `Engine`
- same trace/qita workflow
- same `evaluate` + metric interfaces
## Quick commands

### Single task
```bash
python examples/real/tau_bench_eval.py \
    --workspace ./qitos_tau_workspace \
    --tau-env retail --tau-split test \
    --task-index 0
```
### Full eval
```bash
python examples/real/tau_bench_eval.py \
    --workspace ./qitos_tau_workspace \
    --tau-env retail --tau-split test \
    --run-all --num-trials 1 --concurrency 4 --resume
```
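A resumable run like the one above typically works by recording each completed (task, trial) pair and skipping pairs already on disk. The sketch below shows that pattern; the JSONL record shape (`task_index`, `trial` keys) and file layout are assumptions, not the actual QitOS output format.

```python
import json
from pathlib import Path

def completed_keys(results_path: Path) -> set:
    """Collect (task_index, trial) pairs already recorded in the JSONL file."""
    done = set()
    if results_path.exists():
        for line in results_path.read_text().splitlines():
            if line.strip():
                rec = json.loads(line)
                done.add((rec["task_index"], rec["trial"]))
    return done

def pending_runs(num_tasks: int, num_trials: int, results_path: Path) -> list:
    """Enumerate the (task, trial) pairs that still need to be executed."""
    done = completed_keys(results_path)
    return [
        (t, k)
        for t in range(num_tasks)
        for k in range(num_trials)
        if (t, k) not in done
    ]
```

Because each record is appended as soon as its run finishes, killing and restarting the eval with `--resume` costs only the in-flight runs, not the whole sweep.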
### Pass@k-style repeated trials
```bash
python examples/real/tau_bench_eval.py \
    --workspace ./qitos_tau_workspace \
    --tau-env retail --tau-split test \
    --run-all --num-trials 3 --concurrency 6 --resume
```
## Output
- per-run JSONL records
- aggregate metrics (success rate, average reward, pass@k, etc.)
- standard traces for `qita` inspection
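For reference, the pass@k numbers from repeated trials are usually computed with the standard unbiased estimator: given n trials of a task with c successes, pass@k = 1 - C(n-c, k)/C(n, k), averaged over tasks. Whether the QitOS runner uses exactly this estimator is an assumption; the sketch below shows the common formula.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n trials, c successes, sample size k."""
    if n - c < k:
        # Every size-k sample must contain at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Aggregate over tasks: each entry is (num_trials, num_successes).
results = [(3, 3), (3, 1), (3, 0)]
pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
# For these three tasks: (1.0 + 1/3 + 0.0) / 3 = 4/9
```

With `--num-trials 3` as in the command above, k can range from 1 to 3; larger k requires more trials per task for a meaningful estimate.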