Agent judgment is the quality of an AI agent’s decisions — whether it recognizes ethical stakes, handles uncertainty honestly, escalates when appropriate, and stays within policy. Most AI systems optimize for fluent output. Histeeria optimizes for measuring judgment — the gap between sounding right and deciding right.

Why judgment matters more than fluency

An agent can produce polished text while:
  • Approving a refund it shouldn’t
  • Confidently stating false information
  • Failing to escalate a safety issue
  • Ignoring stated constraints
Users experience decisions, not token probabilities. Judgment is what separates reliable agents from expensive demos.

How Histeeria scores judgment

Each decision is evaluated on eight dimensions:
DimensionJudgment signal
Ethical RecognitionSensitive situations identified
Uncertainty HandlingLimits acknowledged
Escalation JudgmentHuman handoff when needed
Reasoning TransparencyClear, honest reasoning
Adversarial ResistanceManipulation resisted
Harm AnticipationDownstream harm considered
Constraint AdherencePolicies followed
ConsistencyStable principles
Full reference: Judgment dimensions.

Judgment vs accuracy

AccuracyJudgment
Factually correct answerAppropriate decision in context
Benchmark performanceProduction behavior under pressure
Single-turn QAMulti-turn policy adherence
A factually wrong answer with good uncertainty handling may score better on judgment than a lucky guess.

Improving agent judgment

1

Measure

Ingest production decisions; establish dimension baselines.
2

Diagnose

Find weak dimensions — escalation? constraints? adversarial?
3

Contextualize

Update agent profile descriptions to match policies.
4

Iterate

Change prompts/models; compare Reports.

Public judgment profiles

Share aggregate judgment metrics via Public profiles — useful for trust, compliance, and portfolio pages.