Decisions and evaluation

What is a decision?

A decision is one agent turn: what went in, what came out, and optional context. Histeeria stores it, displays it in Monitoring, and feeds it to the evaluation engine.

Field	Purpose
`input`	Prompt, messages, or structured context
`output`	Agent response or action summary
`agent_id`	Logical agent identifier (string you choose)
`session_id`	Conversation or workflow run ID
`domain`	Evaluation context (e.g. `customer_support`, `legal`)
`metadata`	Custom key-value data (tags, user ID, flags)
`input_tokens` / `output_tokens`	Optional token counts
`sdk_version`	Set automatically by the SDK
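
As a rough illustration, the fields above could be assembled into a payload like the one below. This is a sketch rather than the SDK's actual types: the `Decision` dataclass and its `to_payload` helper are hypothetical names, and it assumes the wire format is a flat JSON object keyed by the field names in the table, with `input`, `output`, and `agent_id` treated as required.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Optional


@dataclass
class Decision:
    """Hypothetical container mirroring the fields in the table above."""
    input: Any
    output: Any
    agent_id: str
    session_id: Optional[str] = None
    domain: Optional[str] = None
    metadata: dict = field(default_factory=dict)
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None

    def to_payload(self) -> dict:
        # Drop unset optional fields; sdk_version is left out because the SDK sets it.
        return {k: v for k, v in asdict(self).items() if v is not None}


decision = Decision(
    input="Where is my order?",
    output="It shipped yesterday.",
    agent_id="support-bot",
    session_id="conv-1",
    domain="customer_support",
    metadata={"tier": "pro"},
)
payload = decision.to_payload()
```

Leaving the token counts unset keeps them out of the payload entirely, which matches their optional status in the table.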

After ingest, the decision status progresses: queued → evaluated.

Ingest flow

1. Your code calls `observe()` or sends `POST /v1/ingest`.
2. The API validates the API key and workspace.
3. The decision is persisted and queued for evaluation.
4. The SDK returns immediately (async delivery).
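
For readers calling the REST endpoint directly, the first step might look roughly like the sketch below. Only the `POST /v1/ingest` path comes from this page. The host `https://api.example.com`, the Bearer-token `Authorization` header, and the `build_ingest_request` helper are assumptions made for illustration, so check the Ingest API page for the real contract. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: placeholder host and Bearer-token auth; see the Ingest API page.
BASE_URL = "https://api.example.com"


def build_ingest_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/ingest request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_ingest_request(
    {
        "agent_id": "support-bot",
        "input": "Where is my order?",
        "output": "It shipped yesterday.",
    },
    api_key="YOUR_API_KEY",
)
# To actually send it: urllib.request.urlopen(req, timeout=5)
```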

See Ingest API for the REST contract.

Evaluation flow

1. Warmup: the engine waits until the agent has enough decisions (`EVAL_WARMUP_MIN_DECISIONS`).
2. Batch evaluation: the judge model scores decisions in batches.
3. Aggregation: the engine computes per-agent dimension averages and an overall grade.
4. Incidents: low scores or streaks of low scores trigger inbox items.
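
The four stages can be sketched in a few lines of Python. This is a toy model, not the engine's implementation: the threshold value, `BATCH_SIZE`, `INCIDENT_THRESHOLD`, the dimension names, and the `judge` stand-in are all hypothetical, and taking the overall score as the mean of the dimension averages is an assumption.

```python
from statistics import mean

EVAL_WARMUP_MIN_DECISIONS = 5  # assumed value; the real threshold is configured elsewhere
BATCH_SIZE = 2                 # hypothetical batch size
INCIDENT_THRESHOLD = 4.0       # hypothetical cutoff for a "low" score


def judge(batch):
    """Stand-in for the judge model: returns the scores already attached to each decision."""
    return [d["scores"] for d in batch]


def evaluate(decisions):
    # Warmup: skip evaluation until the agent has enough decisions.
    if len(decisions) < EVAL_WARMUP_MIN_DECISIONS:
        return None
    # Batch evaluation: score decisions in fixed-size batches.
    scored = []
    for i in range(0, len(decisions), BATCH_SIZE):
        scored.extend(judge(decisions[i:i + BATCH_SIZE]))
    # Aggregation: per-dimension averages, then an overall score.
    dimensions = {name: mean(s[name] for s in scored) for name in scored[0]}
    overall = mean(dimensions.values())
    # Incidents: flag decisions whose own mean score falls below the cutoff.
    incidents = [i for i, s in enumerate(scored) if mean(s.values()) < INCIDENT_THRESHOLD]
    return {"dimensions": dimensions, "overall": overall, "incidents": incidents}


decisions = [{"scores": {"accuracy": 8.0, "tone": 6.0}}] * 4
decisions.append({"scores": {"accuracy": 3.0, "tone": 1.0}})
result = evaluate(decisions)
```

With only four decisions, `evaluate` returns `None` because the agent is still in warmup. The fifth decision crosses the assumed threshold and is also the one flagged as an incident.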

Evaluation uses your agent profile (role, description) as context so scores reflect what “good” means for that agent.

Sub-agents and tracing

Multi-step agents can use tracing to send intermediate steps under one session. Each step becomes a decision; the trace can mark a final output and resolution status.

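# `h` is an initialized Histeeria client; `llm`, `messages`, and `review_prompt` come from your own code.
with h.trace(agent_id="research-bot", session_id="run-42") as trace:
    draft = llm(messages)
    # Record the intermediate step; it is stored as its own decision.
    trace.step("draft", input=messages, output=draft)
    final = llm(draft + review_prompt)
    # Mark the final output and the resolution status of the trace.
    trace.complete(final_output=final, resolved=True)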

See Tracing.

Grades

Overall scores map to letter grades in the UI:

Grade	Overall score
A	≥ 8.5
B	≥ 7.0
C	≥ 5.5
D	≥ 4.0
F	< 4.0
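
The thresholds in the table translate directly into a lookup. The function below is a minimal sketch for reproducing the mapping in your own reports. `grade_for` is a hypothetical helper, not part of the SDK.

```python
def grade_for(score: float) -> str:
    """Map an overall score to a letter grade using the thresholds in the table above."""
    # Thresholds are checked from highest to lowest; the first match wins.
    for letter, minimum in (("A", 8.5), ("B", 7.0), ("C", 5.5), ("D", 4.0)):
        if score >= minimum:
            return letter
    return "F"
```

Each lower bound is inclusive, so a score of exactly 7.0 is a B and anything below 4.0 is an F.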
