aide review
Daily observability digest. Read-only — never mutates bandit state, events log, or anything else.
Answers the daily-driver question:
Are any agents repeating the same kind of failure across multiple dispatches, and is the MEDS penalty machinery actually penalizing those patterns?
Usage
```sh
# Default: last 7 days, top 5
aide review

# Look back further, show more rows
aide review --days 30 --top 10

# Machine-readable output
aide review --format json
```
| Flag | Default | Description |
|---|---|---|
| `--days N` | 7 | Only consider events from the last N days |
| `--top K` | 5 | Show top K recurring-failure clusters and top K most-penalized arms |
| `--format` | text | `text` or `json` |
What it reads
- `~/.aide/events.jsonl`: dispatch lifecycle events
- `~/.aide/bandit.json`: per-agent LinUCB state (arm pull counts)
- `~/.aide/failure_patterns.json` (via `sidecar::failure_penalty`): MEDS centroids
It calls sidecar::cluster_failures() once to re-cluster failures inside the window. That step depends on local ollama + nomic-embed-text. When ollama is unavailable, the recurring-failures section is empty and the rest of the report still renders (you still see the failure count summary).
Output sections
RECURRING FAILURES
Clusters of failures with cluster_size >= 2. Sorted by cluster size descending, capped at --top.
| Column | Meaning |
|---|---|
| `agent` | Agent that failed (clusters are per-agent: the same task on a different agent is a separate cluster) |
| `category` | Dominant 12-class task category from the cluster's representative task |
| `cluster_size` | Number of failures grouped under this cluster (cosine similarity > 0.75 in embedding space) |
| `example` | Representative task snippet for the cluster |
| `first_seen` / `last_seen` | Relative time of the earliest / latest matching failed event for that agent in the window |
PENALTY APPLIED
For each recurring cluster: the current MEDS penalty for the representative task and that agent.
| Column | Meaning |
|---|---|
| `penalty` | Signed value: `-failure_penalty(example, agent)`. More negative = bigger drag on bandit score |
| `penalty_applied?` | Bucket label: heavy, light, or none (review!) |
| `bandit_reward_trend` | The 3 most recent reward values for that (agent, category) bucket inside the window, plus the delta r3 - r1 |
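
As a rough illustration of the `bandit_reward_trend` column, the sketch below takes the three most recent rewards and computes the delta r3 - r1. The function name `reward_trend`, the oldest-to-newest ordering, and the handling of fewer than three rewards are assumptions for illustration, not taken from aide's source.

```python
from typing import Optional


def reward_trend(rewards: list[float]) -> tuple[list[float], Optional[float]]:
    """Return the last three rewards (oldest first) and the delta r3 - r1.

    `rewards` is assumed to be ordered oldest to newest and already
    filtered to one (agent, category) bucket inside the window.
    """
    recent = rewards[-3:]
    if len(recent) < 3:
        # Assumption: with fewer than three rewards there is no delta to report.
        return recent, None
    return recent, recent[2] - recent[0]
```

A negative delta means rewards for that bucket are falling across the three most recent dispatches.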
Penalty thresholds
failure_penalty(task, agent) returns a non-negative value in [0, 1]. We classify the raw value:
| Raw | Bucket | Meaning |
|---|---|---|
| >= 0.30 | heavy | The MEDS centroid for past failures of this kind is strongly biting |
| >= 0.05 and < 0.30 | light | Some penalty but not material |
| < 0.05 | none (review!) | The pattern is recurring but the bandit isn't down-weighting it; concerning, investigate |
The penalty column displays the negated raw value (-0.42, for example) for visual consistency with reward deltas.
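
The thresholds above can be sketched as a small classifier. This is a minimal illustration, not aide's actual implementation: the names `penalty_bucket` and `displayed_penalty` are hypothetical, and the two-decimal formatting is an assumption based on the `-0.42` example.

```python
HEAVY_THRESHOLD = 0.30  # raw penalty at or above this is "heavy"
LIGHT_THRESHOLD = 0.05  # raw penalty at or above this (but below heavy) is "light"


def penalty_bucket(raw: float) -> str:
    """Map a raw failure penalty in [0, 1] to its bucket label."""
    if raw >= HEAVY_THRESHOLD:
        return "heavy"
    if raw >= LIGHT_THRESHOLD:
        return "light"
    return "none (review!)"


def displayed_penalty(raw: float) -> str:
    """Format the raw penalty as the signed value shown in the penalty column.

    Assumption: two decimal places, matching the -0.42 example.
    """
    return f"{-raw:.2f}"
```

Checking the higher threshold first keeps the buckets mutually exclusive, so a raw value of exactly 0.30 is heavy and exactly 0.05 is light.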
TOP-PENALIZED
Across all recurring clusters, sorted by current_penalty ascending (most negative first). Includes reward_arm_n, the bandit arm's pull count for that agent — high n with heavy penalty means the bandit has converged to "this agent should not get this category".
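
The ordering described above can be sketched as follows. The dictionary layout and the function name `top_penalized` are assumptions for illustration; only the `current_penalty` and `reward_arm_n` field names come from this page.

```python
def top_penalized(clusters: list[dict], top: int = 5) -> list[dict]:
    """Sort clusters by signed penalty, most negative first, capped at `top`.

    Each cluster dict is assumed to carry a signed `current_penalty`
    (zero or negative) and `reward_arm_n`, the arm's pull count.
    """
    return sorted(clusters, key=lambda c: c["current_penalty"])[:top]


# Hypothetical rows; the agent names are made up for the example.
rows = [
    {"agent": "agent-a", "current_penalty": -0.07, "reward_arm_n": 12},
    {"agent": "agent-b", "current_penalty": -0.42, "reward_arm_n": 85},
    {"agent": "agent-c", "current_penalty": -0.01, "reward_arm_n": 3},
]
ranked = top_penalized(rows, top=2)
```

Here `ranked` holds the `agent-b` row followed by the `agent-a` row. A row like `agent-b`, with a heavy penalty and a high pull count, is the converged case described above.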
SUMMARY
Headline counts for the window: total finished events, failed events + success rate, recurring patterns, recurring patterns with no measurable penalty (the most actionable line), and total tokens spent.
Daily cron
Wire up via your scheduler of choice:
```sh
0 9 * * * aide review > ~/aide-review-$(date +\%F).txt
```
TODO: this can also be wired into one of the `[trigger] on = "cron:..."` agents (see `aide list` and the Triggers guide) so an agent reads the digest and emails it. That is not in this PR, which keeps `aide review` purely read-only.
Constraints
- Pure read. No mutations to `~/.aide/bandit.json`, `~/.aide/events.jsonl`, `~/.aide/failure_patterns.json`, or anything else.
- Window cut-off uses event timestamps (`ts` field, RFC3339).
- `--format json` mirrors the text fields under a strict schema; safe to pipe into `jq` or downstream tooling.
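
To make the window cut-off concrete, here is a minimal sketch of filtering JSONL events by their `ts` field. It is not aide's implementation: the function name `events_in_window`, the inclusive cut-off, and the assumption that every event carries a timezone-aware `ts` are all illustrative.

```python
import json
from datetime import datetime, timedelta, timezone


def events_in_window(lines, days=7, now=None):
    """Keep events whose RFC3339 `ts` falls within the last `days` days.

    `lines` is any iterable of JSONL strings, such as an open file.
    Assumption: the cut-off is inclusive and every `ts` carries a UTC
    offset or a trailing "Z".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        ts = datetime.fromisoformat(event["ts"].replace("Z", "+00:00"))
        if ts >= cutoff:
            kept.append(event)
    return kept
```

Passing `now` explicitly makes the filter reproducible, which is convenient when checking a digest against a fixed log.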
See also
- `aide policy-update`: runs the MEDS clustering that produces the centroids `aide review` checks
- `aide ab analyze`: A/B comparison of bandit vs round-robin routing
- Smart Routing guide: how the bandit and penalty work together