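Lab 2 - Upgrading from ReAct to PlanAct (30 minutes, with step-by-step code)
When to use
ReAct tends to act greedily on long tasks, picking whatever looks best at the current step. Introduce explicit planning when you need more stable behavior.
Learning objectives
- Upgrade the policy from immediate reaction to plan-then-execute, on the same kernel.
- Keep the comparison fair: the task set and the budget definition stay unchanged.
- Use traces to verify both the gains and the costs.
Part A: Define the upgrade hypothesis (5 minutes)
First, state the experimental hypothesis explicitly: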
hypothesis = {
    "baseline": "ReAct",
    "candidate": "PlanAct",
    "expected_gain": "fewer invalid loops on long-horizon tasks",
    "fixed_budget": {"max_steps": 10},
}
print(hypothesis)
This is not business code but experiment-protocol code. Save it together with the results.
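One way to keep the protocol next to the results is to write it into the run directory as JSON. The following is a minimal sketch: the helper name `save_hypothesis` and the file name `hypothesis.json` are assumptions made for illustration, not part of qitos.

```python
import json
import tempfile
from pathlib import Path


def save_hypothesis(hypothesis: dict, run_dir: str) -> Path:
    """Write the protocol dict to <run_dir>/hypothesis.json and return the path.

    The file name is an assumption chosen for this sketch.
    """
    out_dir = Path(run_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "hypothesis.json"
    out_path.write_text(
        json.dumps(hypothesis, indent=2, sort_keys=True), encoding="utf-8"
    )
    return out_path


hypothesis = {
    "baseline": "ReAct",
    "candidate": "PlanAct",
    "expected_gain": "fewer invalid loops on long-horizon tasks",
    "fixed_budget": {"max_steps": 10},
}

# Demo in a throwaway directory; in a real run, pass the run's output directory.
with tempfile.TemporaryDirectory() as tmp:
    path = save_hypothesis(hypothesis, tmp)
    restored = json.loads(path.read_text(encoding="utf-8"))
```

Storing the protocol as a file means anyone reading the results later can see which budget and which expected gain the run was judged against.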
Part B: Upgrade the state (5 minutes)
On top of the ReAct state, add the fields that plan execution requires:
from dataclasses import dataclass, field
from typing import List

from qitos import StateSchema


@dataclass
class PlanActState(StateSchema):
    plan_steps: List[str] = field(default_factory=list)
    cursor: int = 0
    scratchpad: List[str] = field(default_factory=list)
Optional helper fields: target_file, test_command.
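The optional fields could be added as defaulted dataclass fields, as in the sketch below. To stay runnable without qitos installed, it uses a stand-in base class `StateSchemaStub`; that stub, its `task` field, and the `Optional[str]` types with `None` defaults are all assumptions, since the source names the two fields but not their types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StateSchemaStub:
    """Stand-in for qitos.StateSchema (assumption, for illustration only)."""
    task: str = ""


@dataclass
class PlanActStateWithHelpers(StateSchemaStub):
    plan_steps: List[str] = field(default_factory=list)
    cursor: int = 0
    scratchpad: List[str] = field(default_factory=list)
    # Optional helpers named in the text; types and defaults are assumed.
    target_file: Optional[str] = None
    test_command: Optional[str] = None


demo_state = PlanActStateWithHelpers(task="fix failing test", test_command="pytest -q")
```

Giving every new field a default keeps existing construction code working when the state class grows.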
Part C: Implement the two-stage policy (10 minutes)
C1. The planner generates the plan
from qitos.kit.planning import parse_numbered_plan

PLAN_PROMPT = "Task: {task}\nReturn a numbered plan (3-5 steps)."


def build_plan(llm, task: str):
    raw = llm([
        {"role": "system", "content": "Return numbered plan only."},
        {"role": "user", "content": PLAN_PROMPT.format(task=task)},
    ])
    return parse_numbered_plan(str(raw))
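To make the planner's contract concrete, here is a rough sketch of what a numbered-plan parser might do. It is not the implementation of `parse_numbered_plan` in qitos; the function name `parse_numbered_plan_sketch` and the accepted formats ("1." and "1)") are assumptions for illustration.

```python
import re
from typing import List

# A line counts as a step if it starts with digits followed by "." or ")".
_STEP_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_numbered_plan_sketch(text: str) -> List[str]:
    """Return the step texts of numbered lines, ignoring everything else."""
    steps = []
    for line in text.splitlines():
        match = _STEP_RE.match(line)
        if match:
            steps.append(match.group(2))
    return steps


raw = (
    "Here is the plan:\n"
    "1. Read the failing test\n"
    "2) Patch the function\n"
    "3. Run the tests\n"
)
plan = parse_numbered_plan_sketch(raw)
no_plan = parse_numbered_plan_sketch("I cannot help with that.")
```

Returning an empty list when no numbered line is found gives the caller a simple way to detect a failed planning call.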
C2. The agent's decide/reduce logic
from qitos import Decision


class PlanActAgent(...):
    def decide(self, state: PlanActState, observation: dict):
        # (Re)plan when there is no plan yet or the current plan is exhausted.
        if not state.plan_steps or state.cursor >= len(state.plan_steps):
            plan = build_plan(self.llm, state.task)
            if not plan:
                return Decision.final("Failed to build a valid plan")
            state.plan_steps = plan
            state.cursor = 0
            return Decision.wait("plan_ready")
        return None  # let the Engine + LLM execute the current plan step

    def reduce(self, state: PlanActState, observation: dict, decision):
        # .get avoids a KeyError when the observation carries no action results.
        results = observation.get("action_results")
        if results and isinstance(results[0], dict):
            r = results[0]
            if r.get("status") == "success":
                state.cursor += 1  # step done, move to the next plan step
            if int(r.get("returncode", 1)) == 0:
                # A zero return code is treated as passing verification.
                state.final_result = "Verification passed"
                state.cursor = len(state.plan_steps)
        return state
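The cursor logic in `reduce` can be checked without an engine or an LLM. The sketch below reproduces it on a stand-in state, assuming the `status` check and the `returncode` check are independent of each other. `DemoState` and `reduce_step` are hypothetical names, and the observations are hand-written examples rather than real engine output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemoState:
    """Stand-in for PlanActState (assumption, so the sketch runs without qitos)."""
    plan_steps: List[str] = field(default_factory=list)
    cursor: int = 0
    final_result: Optional[str] = None


def reduce_step(state: DemoState, observation: dict) -> DemoState:
    results = observation.get("action_results") or []
    if results and isinstance(results[0], dict):
        r = results[0]
        if r.get("status") == "success":
            state.cursor += 1
        if int(r.get("returncode", 1)) == 0:
            state.final_result = "Verification passed"
            state.cursor = len(state.plan_steps)
    return state


state = DemoState(plan_steps=["read", "patch", "test"])

reduce_step(state, {"action_results": [{"status": "success"}]})
after_success = state.cursor  # advanced by one step

reduce_step(state, {"action_results": [{"status": "error"}]})
after_error = state.cursor  # unchanged: a failed step is retried

reduce_step(state, {"action_results": [{"status": "success", "returncode": 0}]})
after_verify = state.cursor  # jumps to the end of the plan
```

Note that a failed step leaves the cursor where it was, so the same plan step is attempted again and still consumes budget. This is one place where PlanAct can waste steps.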
Part D: Run and compare (10 minutes)
D1. Run
D2. Comparison statistics snippet
import json
from pathlib import Path


def read_summary(run_dir: str):
    m = json.loads(Path(run_dir, "manifest.json").read_text(encoding="utf-8"))
    return m.get("summary", {})


react = read_summary("runs/react_run")
planact = read_summary("runs/planact_run")
print("react steps:", react.get("steps"), "stop:", react.get("stop_reason"))
print("planact steps:", planact.get("steps"), "stop:", planact.get("stop_reason"))
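At a minimum, you should be able to answer:
- Did PlanAct reduce the number of idle (no-progress) steps?
- Do the new problems come from plan quality rather than execution quality?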
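The first question needs a working definition of an idle step. One possible definition is a step that repeats the previous step's tool call exactly; the sketch below counts those. The trace layout (a list of dicts with `tool` and `args` keys) is hypothetical and does not describe the qitos trace format, and the two traces are made-up toy inputs, not experimental results.

```python
import json
from typing import Dict, List


def count_idle_steps(trace: List[Dict]) -> int:
    """Count steps whose (tool, args) exactly repeat the previous step's."""
    idle = 0
    prev = None
    for step in trace:
        # Serialize args with sorted keys so equal dicts compare equal.
        key = (step.get("tool"), json.dumps(step.get("args"), sort_keys=True, default=str))
        if prev is not None and key == prev:
            idle += 1
        prev = key
    return idle


# Toy traces for illustration only.
react_trace = [
    {"tool": "read_file", "args": {"path": "a.py"}},
    {"tool": "read_file", "args": {"path": "a.py"}},
    {"tool": "read_file", "args": {"path": "a.py"}},
    {"tool": "run_tests", "args": {}},
]
planact_trace = [
    {"tool": "read_file", "args": {"path": "a.py"}},
    {"tool": "edit_file", "args": {"path": "a.py"}},
    {"tool": "run_tests", "args": {}},
]

react_idle = count_idle_steps(react_trace)
planact_idle = count_idle_steps(planact_trace)
```

Exact repetition is a conservative measure: it misses loops that alternate between two actions, so treat the count as a lower bound on wasted steps.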