Reflexion Example Walkthrough (Actor-Critic With Grounded Self-Reflection)¶
What this agent is¶
examples/patterns/reflexion.py implements a Reflexion-style actor-critic loop:
- the agent drafts an answer
- the agent critiques the draft against external evidence (source text)
- the agent revises, repeating for a bounded number of cycles
This example intentionally uses a strict JSON protocol for reflection payloads.
Core idea¶
Reflexion is not “add another prompt”. It is:
- forcing structured critique (missing/superfluous/grounding)
- grounding critique in external evidence (citations)
- turning critique into a revision policy
In QitOS terms:
- the “critic loop” can be implemented as:
  - a separate Critic module (Engine-level), or
  - a policy encoded in AgentModule.decide(...) (this example)
This example chooses the second option to keep it self-contained and explicit.
Method-by-method design¶
State: keep evidence, draft, and reflections¶
Design principle:
- If reflection is real, you must store the critique artifacts.
What the example does:
- stores page_html, page_text as external evidence
- stores draft_answer and reflections (each is a JSON object)
- stores max_reflections to bound the loop
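A minimal sketch of such a state container, assuming a plain dataclass. The field names follow the list above; the class name ReflexionState, the task field, the default of 2, and the cycles_remain helper are hypothetical and not taken from reflexion.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ReflexionState:
    """Hypothetical state container; only the evidence/draft/reflection fields come from the walkthrough."""

    task: str                                   # assumed: the question being answered
    page_html: Optional[str] = None             # raw evidence from http_get
    page_text: Optional[str] = None             # extracted evidence from extract_web_text
    draft_answer: Optional[Dict[str, Any]] = None
    reflections: List[Dict[str, Any]] = field(default_factory=list)
    max_reflections: int = 2                    # assumed default; bounds the loop

    def cycles_remain(self) -> bool:
        # One stored reflection per completed critique cycle.
        return len(self.reflections) < self.max_reflections
```

Keeping every critique in reflections, rather than only the latest one, is what makes the loop auditable after the run.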
decide: a staged pipeline¶
Design principle:
- Reflexion needs a deterministic staging gate:
  - fetch -> extract -> reflect/revise -> finalize
What the example does:
- If no HTML: Decision.act(http_get)
- If no text: Decision.act(extract_web_text)
- Else: call _reflect(...) (LLM) to produce structured JSON
- If needs_revision and cycles remain: Decision.wait(...) to loop again
- Else: Decision.final(...) with answer + citations
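The staging gate above can be sketched in plain Python. This is an illustration under assumptions, not the example's code: the Decision dataclass below is a hypothetical stand-in for the framework type, the state is a plain dict, the url and revised_answer keys are invented for the sketch, and the critique is recorded inside decide only to keep the snippet short.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Decision:
    """Hypothetical stand-in for the framework's Decision type."""

    kind: str            # "act", "wait", or "final"
    payload: Any = None


def decide(state: dict, reflect: Callable[[dict], dict]) -> Decision:
    # Stage 1: no HTML yet, so fetch it.
    if not state.get("page_html"):
        return Decision("act", {"tool": "http_get", "url": state["url"]})
    # Stage 2: HTML but no text, so extract it.
    if not state.get("page_text"):
        return Decision("act", {"tool": "extract_web_text", "html": state["page_html"]})
    # Stage 3: critique the draft against the evidence.
    critique = reflect(state)
    state["reflections"].append(critique)
    state["draft_answer"] = critique.get("revised_answer", state.get("draft_answer"))
    cycles_remain = len(state["reflections"]) < state["max_reflections"]
    if critique.get("needs_revision") and cycles_remain:
        return Decision("wait", {"reason": "revise"})
    # Stage 4: either the critic is satisfied or the budget is spent.
    return Decision(
        "final",
        {"answer": state["draft_answer"], "citations": critique.get("citations", [])},
    )
```

Because each branch depends only on which fields are populated, the same state always yields the same stage, which is what makes the gate deterministic.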
_reflect: strict output protocol as a research control¶
Design principle:
- If the output is not structured, you cannot evaluate it consistently.
What the example does:
- system instruction: “Return valid JSON only.”
- requires citations with exact supporting quote
- requires “missing” and “superfluous” lists to avoid vague critiques
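One way to enforce such a protocol is to reject any model output that does not parse into the expected shape. The sketch below assumes a payload with the keys missing, superfluous, citations, and needs_revision named in this walkthrough; the revised_answer key, the per-citation quote key, and the function name parse_reflection are hypothetical.

```python
import json

# Expected top-level keys and their JSON types (revised_answer is an assumed key).
REQUIRED_KEYS = {
    "missing": list,
    "superfluous": list,
    "citations": list,
    "needs_revision": bool,
    "revised_answer": str,
}


def parse_reflection(raw: str) -> dict:
    """Parse a reflection payload, raising ValueError on any protocol violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reflection is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reflection must be a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} must be of type {expected_type.__name__}")
    for citation in data["citations"]:
        # Each citation must carry a non-empty supporting quote.
        quote = citation.get("quote") if isinstance(citation, dict) else None
        if not isinstance(quote, str) or not quote.strip():
            raise ValueError("each citation needs a non-empty 'quote' string")
    return data
```

Failing loudly here keeps malformed critiques out of the stored reflections, so later evaluation compares like with like.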
reduce: only moves tool outputs into evidence fields¶
Design principle:
- Keep reduce as a state transition; don’t hide policy here.
What the example does:
- assigns page_html from http_get
- assigns page_text from extract_web_text
- policy logic (revision cycles) stays in decide
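A policy-free reduce can be as small as a lookup from tool name to evidence field. The sketch below assumes a dict state and a (tool_name, output) calling convention; that signature is hypothetical and may differ from the actual AgentModule interface.

```python
# Which evidence field each tool output lands in (names from the walkthrough).
FIELD_FOR_TOOL = {
    "http_get": "page_html",
    "extract_web_text": "page_text",
}


def reduce(state: dict, tool_name: str, output: str) -> dict:
    """Return a new state with the tool output stored; no revision policy lives here."""
    target = FIELD_FOR_TOOL.get(tool_name)
    if target is None:
        # Unknown tools leave the state untouched.
        return state
    return {**state, target: output}
```

Returning a fresh dict rather than mutating in place makes each transition easy to log and replay.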
What to modify to make it SOTA-ish¶
- Add a verifier:
  - run a second pass to validate that citations appear in the source text
- Add stop criteria:
  - stop if critique says needs_revision=false twice in a row
- Externalize critic to Engine:
  - use Engine(critics=[...]) to decouple reflection from actor policy
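The first two suggestions can be sketched without any framework support. The helpers below are hypothetical: they assume each citation is a dict with a quote key, match quotes after collapsing whitespace and lowercasing (a deliberate relaxation of strict exact matching), and treat "twice in a row" as a configurable patience of 2.

```python
from typing import Dict, List


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case so line breaks do not cause misses.
    return " ".join(text.split()).lower()


def unsupported_citations(citations: List[Dict], source_text: str) -> List[Dict]:
    """Return the citations whose quote cannot be found in the source text."""
    haystack = _normalize(source_text)
    bad = []
    for citation in citations:
        quote = _normalize(citation.get("quote", ""))
        # An empty quote supports nothing, so it counts as unsupported.
        if not quote or quote not in haystack:
            bad.append(citation)
    return bad


def should_stop(reflections: List[Dict], patience: int = 2) -> bool:
    """Stop once the last `patience` critiques all report needs_revision == False."""
    recent = reflections[-patience:]
    return len(recent) == patience and all(
        r.get("needs_revision") is False for r in recent
    )
```

A non-empty result from unsupported_citations could be fed back as a forced revision, while should_stop would sit alongside the max_reflections bound rather than replace it.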