🛡️ AI Agent Safety Dashboard

Control Panel
📊 Dashboard
🔴 Live Sessions
📋 Agent Registry
📝 Audit Log
⚙️ Settings

⚡ Intervention — Alert

Alert Context: Signal, Agent, ARN, Session, Tokens, Duration

⚡ Session Intervention — agent

Session Context: Agent, ARN, Session, Status, Tokens, Duration

⚙️ Safety Thresholds
Configure alert thresholds for cost and evaluation signals. Changes apply on next sync. Observability thresholds are controlled by CloudWatch alarm configurations.

💰 Cost Thresholds

ℹ️ How it works
Each agent gets an AWS Budget that tracks monthly spend. The Default Budget Amount sets the dollar limit — changing it updates all existing budgets immediately. Warning and Critical % control when the dashboard shows alert badges based on how much of the budget is used.
Default Budget Amount ($): Monthly budget for all agents (updates AWS Budgets on save)
Budget Warning %: Triggers a warning when budget usage exceeds this value
Budget Critical %: Triggers a critical alert when budget usage exceeds this value
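A minimal sketch of what these three settings could map to on the AWS side, assuming boto3 and a hypothetical per-agent budget naming scheme (the dashboard's actual resource names and wiring may differ):

```python
import boto3

budgets = boto3.client("budgets")

def budget_name(agent_name: str) -> str:
    # Hypothetical naming scheme for the per-agent AWS Budgets.
    return f"agent-{agent_name}-monthly"

def apply_default_budget(account_id: str, agent_name: str, amount_usd: float) -> None:
    """Saving Default Budget Amount updates each agent's existing budget in place."""
    budgets.update_budget(
        AccountId=account_id,
        NewBudget={
            "BudgetName": budget_name(agent_name),
            "BudgetLimit": {"Amount": str(amount_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
    )

def cost_badge(account_id: str, agent_name: str,
               warning_pct: float, critical_pct: float) -> str:
    """Warning/Critical % compare actual monthly spend against the budget limit."""
    budget = budgets.describe_budget(
        AccountId=account_id, BudgetName=budget_name(agent_name))["Budget"]
    spent = float(budget["CalculatedSpend"]["ActualSpend"]["Amount"])
    limit = float(budget["BudgetLimit"]["Amount"])
    used_pct = 100 * spent / limit
    if used_pct > critical_pct:
        return "critical"
    if used_pct > warning_pct:
        return "warning"
    return "ok"
```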

🧪 Evaluation Thresholds

Configure when the evaluation alarm fires for each agent. One CloudWatch alarm is created per agent that monitors bad responses across 7 built-in evaluators.
ℹ️ How it works
Each agent has one CloudWatch alarm that monitors bad responses across 7 built-in evaluators. The alarm sums bad counts from 5 of them (Harmfulness, Correctness, Goal Success, Tool Selection, Tool Parameters); CloudWatch evaluates every 15 minutes by aggregating bad counts over that window.
The alarm fires when the total bad count reaches the lowest non-zero threshold you set below. For example, Harmfulness = 1 and Correctness = 3 means the alarm fires at ≥ 1 total bad. Set a value to 0 to exclude that evaluator from the alarm.
The remaining 2 evaluators (Helpfulness and Faithfulness) are tracked and shown in the dashboard evaluator scores but don't contribute to the CloudWatch alarm. Saving updates all agent alarms immediately.
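As a rough illustration (not the dashboard's actual implementation), the per-agent alarm described above could be built with a CloudWatch metric-math expression; the namespace, metric names, dimensions, and alarm name below are assumptions:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# The 5 evaluators that feed the alarm; Helpfulness and Faithfulness are
# tracked for the dashboard only and are deliberately left out.
ALARM_EVALUATORS = ["Harmfulness", "Correctness", "GoalSuccess",
                    "ToolSelection", "ToolParameters"]

def update_evaluation_alarm(agent_name: str, thresholds: dict) -> None:
    """Rebuild the single per-agent evaluation alarm from per-evaluator thresholds.

    The structure (sum bad counts over 15-minute windows, fire at the lowest
    non-zero threshold) follows the description above.
    """
    included = {e: thresholds.get(e, 0) for e in ALARM_EVALUATORS
                if thresholds.get(e, 0) > 0}
    if not included:
        return  # every evaluator set to 0: nothing to alarm on

    # e.g. Harmfulness=1 and Correctness=3 -> the alarm fires at >= 1 total bad
    alarm_threshold = min(included.values())

    queries = []
    for i, evaluator in enumerate(included):
        queries.append({
            "Id": f"m{i}",
            "ReturnData": False,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AgentEvaluations",        # assumed namespace
                    "MetricName": f"{evaluator}BadCount",   # assumed metric name
                    "Dimensions": [{"Name": "AgentName", "Value": agent_name}],
                },
                "Period": 900,   # 15-minute aggregation window
                "Stat": "Sum",
            },
        })
    # Metric-math expression summing the per-evaluator bad counts.
    queries.append({
        "Id": "total_bad",
        "Expression": " + ".join(q["Id"] for q in queries),
        "ReturnData": True,
    })

    cloudwatch.put_metric_alarm(
        AlarmName=f"{agent_name}-evaluation-bad-responses",  # assumed alarm name
        Metrics=queries,
        EvaluationPeriods=1,
        Threshold=alarm_threshold,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="notBreaching",
    )
```

With the worked example above, update_evaluation_alarm("billing-agent", {"Harmfulness": 1, "Correctness": 3}) (hypothetical agent name) would produce an alarm that fires once a single bad response is recorded in a window.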
Per-Evaluator Bad Count Thresholds
Maximum bad responses allowed before the alarm fires. Set to 0 to exclude from alarm. Minimum active value: 1.
Harmfulness: Max harmful responses. Most sensitive — even 1 harmful response is typically critical. Min: 1
Correctness: Max incorrect or partially correct responses. Counts both "Incorrect" and "Partially Correct" labels. Min: 1
Goal Success Rate: Max goal failures (agent failed to achieve the user's goal). Min: 1
Helpfulness: Max unhelpful responses. Dashboard display only — does not affect CloudWatch alarm
Faithfulness: Max unfaithful responses (hallucinations). Dashboard display only — does not affect CloudWatch alarm
Tool Selection Accuracy: Max times agent picked the wrong tool for the task. Min: 1
Tool Parameter Accuracy: Max times agent passed wrong parameters to a tool. Min: 1
Dashboard Display Thresholds
Controls severity badges in dashboard tables. Uses bad percentage (bad ÷ total × 100), not raw counts. These do not affect CloudWatch alarms.
Dashboard Warning %: Show ⚠️ warning badge when bad % exceeds this value (default: 20%)
Dashboard Critical %: Show 🔴 critical badge when bad % exceeds this value (default: 50%)
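For instance, a minimal sketch of the percentage-based badge logic (the dashboard's actual rendering code may differ):

```python
def dashboard_badge(bad: int, total: int,
                    warning_pct: float = 20.0, critical_pct: float = 50.0) -> str:
    """Severity badge for dashboard tables; independent of the CloudWatch alarm."""
    if total == 0:
        return "ok"
    bad_pct = bad / total * 100          # bad ÷ total × 100
    if bad_pct > critical_pct:
        return "critical"                # 🔴
    if bad_pct > warning_pct:
        return "warning"                 # ⚠️
    return "ok"

# With the defaults: 3 bad of 10 (30%) shows ⚠️, 6 of 10 (60%) shows 🔴.
```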

📡 Observability Thresholds

ℹ️ How it works
Each agent gets 4 static-threshold alarms (latency, errors, tokens, invocations). An alarm fires when its metric exceeds the threshold in at least Datapoints to Alarm of the last Evaluation Periods 5-minute windows. Saving updates all agent alarms immediately.
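As a rough sketch, assuming boto3 and hypothetical namespace, metric, and alarm names, saving could translate to something like the following for each agent:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def update_observability_alarms(agent_name: str, thresholds: dict,
                                evaluation_periods: int,
                                datapoints_to_alarm: int) -> None:
    """Recreate the 4 static-threshold alarms for one agent.

    The window logic (Datapoints to Alarm out of Evaluation Periods
    5-minute periods) follows the description above.
    """
    assert datapoints_to_alarm <= evaluation_periods
    # (metric key, CloudWatch statistic) pairs for the four alarms.
    metrics = [("Latency", "Average"), ("Errors", "Sum"),
               ("TokenUsage", "Sum"), ("Invocations", "Sum")]
    for metric_name, stat in metrics:
        cloudwatch.put_metric_alarm(
            AlarmName=f"{agent_name}-{metric_name.lower()}",   # assumed naming
            Namespace="AgentObservability",                    # assumed namespace
            MetricName=metric_name,                            # assumed metric name
            Dimensions=[{"Name": "AgentName", "Value": agent_name}],
            Statistic=stat,
            Period=300,                               # 5-minute windows
            EvaluationPeriods=evaluation_periods,     # e.g. 3
            DatapointsToAlarm=datapoints_to_alarm,    # e.g. 2
            Threshold=thresholds[metric_name],
            ComparisonOperator="GreaterThanThreshold",
            TreatMissingData="notBreaching",
        )
```

With the defaults listed below, this would be called with thresholds {"Latency": 10000, "Errors": 5, "TokenUsage": 100000, "Invocations": 200}, evaluation_periods=3, and datapoints_to_alarm=2.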
Per-Metric Thresholds
Latency (ms): Alarm when average response time exceeds this (default: 10000ms = 10s)
Error Count: Alarm when errors per period exceed this (default: 5)
Token Usage: Alarm when tokens per period exceed this (default: 100000)
Invocation Count: Alarm when invocations per period exceed this (default: 200)
Evaluation Window
Evaluation Periods: Number of 5-min periods to evaluate (default: 3)
Datapoints to Alarm: How many periods must breach before the alarm fires (default: 2). Must be ≤ Evaluation Periods