Reports give you structured summaries of agent judgment over time. They are designed for review meetings, release sign-offs, and compliance checkpoints.

What’s in a report

Reports typically aggregate:
  • Overall judgment score and grade trend
  • Per-dimension averages and changes
  • Incident count and safe-completion indicators
  • Decision volume over the period
The exact report cadence and sections depend on your plan and on your evaluation engine settings (EVAL_REPORT_EVERY on the API side).
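
As a rough illustration of the fields listed above, one report could be modeled in Python as follows. This is a sketch only: the class name `ReportSummary`, every field name, and the `incident_rate` helper are assumptions made for illustration, not the platform's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class ReportSummary:
    """Hypothetical shape of one report; all names here are illustrative."""
    overall_score: float          # overall judgment score for the period
    grade: str                    # grade for the period, e.g. "B"
    dimension_averages: dict[str, float] = field(default_factory=dict)
    incident_count: int = 0       # incidents recorded in the period
    safe_completions: int = 0     # assumed form of the safe-completion indicator
    decision_volume: int = 0      # decisions evaluated in the period

    def incident_rate(self) -> float:
        """Incidents per decision; 0.0 when the period has no decisions."""
        if self.decision_volume == 0:
            return 0.0
        return self.incident_count / self.decision_volume
```

Keeping decision volume next to the incident count makes it easier to read incidents as a rate rather than a raw number when traffic differs between periods.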

When reports generate

Reports are produced by the evaluation pipeline after enough data accumulates. New agents may not have reports until warmup and batch evaluation complete.

How to use reports

1. Baseline
   Export or review the first report generated after traffic has stabilized. This is your baseline.

2. Compare releases
   After prompt or model changes, compare the per-dimension deltas in the next report against the baseline.

3. Share
   Combine reports with Public profiles when sharing results with external stakeholders.
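
The release comparison step can be sketched in a few lines of Python. This assumes you have copied the per-dimension averages from two reports into plain dictionaries by hand; the dimension names, the scores, and the -2.0 regression threshold are all made up for illustration and do not come from the product.

```python
def dimension_deltas(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Return current minus baseline for each dimension present in both reports."""
    shared = baseline.keys() & current.keys()
    return {name: round(current[name] - baseline[name], 2) for name in sorted(shared)}


def regressions(deltas: dict[str, float], threshold: float = -2.0) -> list[str]:
    """List the dimensions whose score dropped by at least the (assumed) threshold."""
    return [name for name, delta in deltas.items() if delta <= threshold]


# Illustrative numbers only, not real report data.
baseline = {"safety": 91.0, "accuracy": 84.5, "escalation": 78.0}
after_release = {"safety": 92.5, "accuracy": 80.0, "escalation": 78.0}

deltas = dimension_deltas(baseline, after_release)
print(deltas)
print(regressions(deltas))
```

Only dimensions that appear in both reports are compared, so a dimension added or removed between releases is skipped rather than reported as a change.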