
Tau-Bench Integration

What is supported

QitOS supports Tau-Bench through a canonical adapter path:

  • Adapter: qitos/benchmark/tau_bench/adapter.py
  • Self-contained runtime: qitos/benchmark/tau_bench/runtime.py + qitos/benchmark/tau_bench/port/*
  • Conversion: Tau task -> qitos.core.task.Task
  • Eval runner: examples/real/tau_bench_eval.py

The runtime is vendored under qitos/benchmark/tau_bench/port/, so no external tau_bench Python package is required.

Why this matters

You can evaluate agent scaffolds on Tau-Bench with the same QitOS kernel and observability stack you use elsewhere:

  • same AgentModule + Engine
  • same trace/qita workflow
  • same evaluate + metric interfaces

Quick commands

Single task

python examples/real/tau_bench_eval.py \
  --workspace ./qitos_tau_workspace \
  --tau-env retail --tau-split test \
  --task-index 0

Full eval

python examples/real/tau_bench_eval.py \
  --workspace ./qitos_tau_workspace \
  --tau-env retail --tau-split test \
  --run-all --num-trials 1 --concurrency 4 --resume

Pass@k-style repeated trials

python examples/real/tau_bench_eval.py \
  --workspace ./qitos_tau_workspace \
  --tau-env retail --tau-split test \
  --run-all --num-trials 3 --concurrency 6 --resume
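
With `--num-trials 3`, each task yields three outcomes that can be folded into a pass@k figure. The sketch below uses the standard unbiased pass@k estimator (n trials, c successes, k draws); it is an illustration only, and the name `pass_at_k` is hypothetical. The eval runner may compute its own metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k trials succeeds.

    n: trials run for the task, c: trials that succeeded, k: draws considered.
    Hypothetical helper, not part of the QitOS API.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k trials failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# One success out of three trials, evaluated at k = 1, 2, 3.
estimates = [pass_at_k(3, 1, k) for k in (1, 2, 3)]
```

Averaging this value over all tasks gives a benchmark-level number; a task with zero successes contributes 0.0 at every k.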

Output

  • per-run JSONL records
  • aggregate metrics (success rate, average reward, pass@k, etc.)
  • standard traces for qita inspection
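
As a rough illustration of how per-run JSONL records could be reduced to aggregate metrics, the sketch below reads one JSON object per line and computes a success rate and average reward. The field names `task_id`, `success`, and `reward` are assumptions made for this example; check the records the runner actually writes before relying on them.

```python
import json
from collections import defaultdict
from typing import Iterable


def summarize_runs(lines: Iterable[str]) -> dict:
    """Aggregate JSONL run records into simple summary metrics.

    Assumes each record has `task_id`, `success`, and `reward` keys
    (hypothetical schema, not confirmed by the QitOS docs).
    """
    per_task = defaultdict(list)
    rewards = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        rewards.append(float(record["reward"]))
        per_task[record["task_id"]].append(bool(record["success"]))

    num_runs = len(rewards)
    if num_runs == 0:
        return {"num_runs": 0, "num_tasks": 0,
                "success_rate": 0.0, "avg_reward": 0.0}

    successes = sum(flag for flags in per_task.values() for flag in flags)
    return {
        "num_runs": num_runs,
        "num_tasks": len(per_task),
        "success_rate": successes / num_runs,
        "avg_reward": sum(rewards) / num_runs,
    }


# Inline sample records standing in for a real output file.
sample = [
    '{"task_id": 0, "success": true, "reward": 1.0}',
    '{"task_id": 0, "success": false, "reward": 0.0}',
    '{"task_id": 1, "success": true, "reward": 1.0}',
    '{"task_id": 1, "success": true, "reward": 1.0}',
]
summary = summarize_runs(sample)
```

To run it against a real file, pass an open file handle instead of the sample list, since iterating a text file yields one line at a time.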

Source Index