Skip to content

QitOS

Walkthrough - Reflexion

Reflexion Example Walkthrough (Actor-Critic With Grounded Self-Reflection)¶

What this agent is¶

examples/patterns/reflexion.py implements a Reflexion-style actor-critic loop:

the agent drafts an answer
the agent critiques the draft against external evidence (source text)
the agent revises, repeating for a bounded number of cycles

This example intentionally uses a strict JSON protocol for reflection payloads.

Core idea¶

Reflexion is not “add another prompt”. It is:

forcing structured critique (missing/superfluous/grounding)
grounding critique in external evidence (citations)
turning critique into a revision policy

In QitOS terms:

the “critic loop” can be implemented as:
a separate Critic module (Engine-level), or
a policy encoded in AgentModule.decide(...) (this example)

This example chooses the second option to keep it self-contained and explicit.

Method-by-method design¶

State: keep evidence, draft, and reflections¶

Design principle:

If reflection is real, you must store the critique artifacts.

What the example does:

stores page_html, page_text as external evidence
stores draft_answer and reflections (each is a JSON object)
stores max_reflections to bound the loop

`decide`: a staged pipeline¶

Design principle:

Reflexion needs a deterministic staging gate:
fetch -> extract -> reflect/revise -> finalize

What the example does:

If no HTML: Decision.act(http_get)
If no text: Decision.act(extract_web_text)
Else: call _reflect(...) (LLM) to produce structured JSON
If needs_revision and cycles remain: Decision.wait(...) to loop again
Else: Decision.final(...) with answer + citations

`_reflect`: strict output protocol as a research control¶

Design principle:

If the output is not structured, you cannot evaluate it consistently.

What the example does:

system instruction: “Return valid JSON only.”
requires citations with exact supporting quote
requires “missing” and “superfluous” lists to avoid vague critiques

`reduce`: only moves tool outputs into evidence fields¶

Design principle:

Keep reduce as a state transition; don’t hide policy here.

What the example does:

assigns page_html from http_get
assigns page_text from extract_web_text
policy logic (revision cycles) stays in decide

What to modify to make it SOTA-ish¶

Add a verifier:
run a second pass to validate that citations appear in the source text
Add stop criteria:
stop if critique says needs_revision=false twice in a row
Externalize critic to Engine:
use Engine(critics=[...]) to decouple reflection from actor policy

Source Index¶