aide review

Daily observability digest. Read-only: it never mutates bandit state, the events log, or any other file.

Answers the daily-driver question:

Are any agents repeating the same kind of failure across multiple dispatches, and is the MEDS penalty machinery actually penalizing those patterns?

Usage

# Default: last 7 days, top 5
aide review

# Look back further, show more rows
aide review --days 30 --top 10

# Machine-readable output
aide review --format json

| Flag | Default | Description |
| --- | --- | --- |
| `--days N` | 7 | Only consider events from the last N days |
| `--top K` | 5 | Show top K recurring-failure clusters and top K most-penalized arms |
| `--format` | `text` | `text` or `json` |

What it reads

~/.aide/events.jsonl — dispatch lifecycle events
~/.aide/bandit.json — per-agent LinUCB state (arm pull counts)
~/.aide/failure_patterns.json (via sidecar::failure_penalty) — MEDS centroids

It calls sidecar::cluster_failures() once to re-cluster the failures inside the window. That step requires a local ollama instance with the nomic-embed-text model. When ollama is unavailable, the recurring-failures section is empty, but the rest of the report still renders, including the failure count summary.

Output sections

`RECURRING FAILURES`

Clusters of failures with cluster_size >= 2. Sorted by cluster size descending, capped at --top.

| Column | Meaning |
| --- | --- |
| `agent` | Agent that failed (clusters are per-agent; the same task on a different agent is a separate cluster) |
| `category` | Dominant 12-class task category from the cluster's representative task |
| `cluster_size` | Number of failures grouped under this cluster (cosine similarity > 0.75 in embedding space) |
| `example` | Representative task snippet for the cluster |
| `first_seen` / `last_seen` | Relative time of the earliest / latest matching failed event for that agent in the window |
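
To make the `cluster_size` criterion concrete, here is a minimal sketch of the similarity test alone. It is not the implementation of sidecar::cluster_failures(), whose grouping strategy is not described here; only the 0.75 threshold comes from the table above. The function names `cosine_similarity` and `same_cluster` are hypothetical, and the vectors stand in for embeddings.

```python
import math

# Threshold taken from the cluster_size description above.
SIMILARITY_THRESHOLD = 0.75


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def same_cluster(a, b, threshold=SIMILARITY_THRESHOLD):
    """True when two failure embeddings are close enough to be grouped."""
    return cosine_similarity(a, b) > threshold
```

Two failures whose embeddings point in nearly the same direction pass the test; orthogonal embeddings do not.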

`PENALTY APPLIED`

For each recurring cluster: the current MEDS penalty for the representative task and that agent.

| Column | Meaning |
| --- | --- |
| `penalty` | Signed value: `-failure_penalty(example, agent)`. More negative = bigger drag on bandit score |
| `penalty_applied?` | Bucket label: `heavy`, `light`, or `none (review!)` |
| `bandit_reward_trend` | The 3 most recent reward values for that `(agent, category)` bucket inside the window, plus the delta `r3 - r1` |
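
The `bandit_reward_trend` column can be illustrated with a short sketch. It assumes rewards arrive ordered oldest to newest, so that `r1` is the oldest of the three and `r3` the newest, and it returns no delta when fewer than three rewards exist. Both are assumptions, as is the hypothetical name `reward_trend`.

```python
def reward_trend(rewards):
    """Return the 3 most recent rewards and the delta r3 - r1.

    `rewards` is assumed to be ordered oldest to newest.
    """
    recent = list(rewards[-3:])
    if len(recent) < 3:
        # Assumption: the delta is undefined without three data points.
        return recent, None
    return recent, recent[-1] - recent[0]
```

A negative delta means the most recent reward is lower than the one two dispatches earlier.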

Penalty thresholds

failure_penalty(task, agent) returns a non-negative value in [0, 1]. We classify the raw value:

| Raw value | Bucket | Meaning |
| --- | --- | --- |
| `>= 0.30` | `heavy` | The MEDS centroid for past failures of this kind is strongly biting |
| `>= 0.05` and `< 0.30` | `light` | Some penalty, but not material |
| `< 0.05` | `none (review!)` | The pattern is recurring but the bandit isn't down-weighting it. This is concerning and worth investigating |

The raw value is negated for display in the penalty column (e.g. -0.42) for visual consistency with reward deltas.
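
The thresholds and sign convention above can be sketched as follows. This is an illustration rather than the tool's source: the function names `penalty_bucket` and `signed_penalty` are hypothetical, and rejecting values outside [0, 1] is an assumption based on the stated range of failure_penalty.

```python
HEAVY_THRESHOLD = 0.30  # from the thresholds table
LIGHT_THRESHOLD = 0.05  # from the thresholds table


def penalty_bucket(raw):
    """Map a raw failure_penalty value in [0, 1] to its bucket label."""
    if not 0.0 <= raw <= 1.0:
        # Assumption: out-of-range input is treated as an error.
        raise ValueError(f"raw penalty out of range: {raw}")
    if raw >= HEAVY_THRESHOLD:
        return "heavy"
    if raw >= LIGHT_THRESHOLD:
        return "light"
    return "none (review!)"


def signed_penalty(raw):
    """Negate the raw value for display (0.42 becomes -0.42)."""
    return 0.0 - raw
```

Checking the thresholds from highest to lowest keeps the buckets mutually exclusive, so a value of exactly 0.30 is `heavy` and a value of exactly 0.05 is `light`.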

`TOP-PENALIZED`

Lists every recurring cluster, sorted by current_penalty ascending (most negative first). Each row includes reward_arm_n, the bandit arm's pull count for that agent. A high n combined with a heavy penalty means the bandit has converged on "this agent should not get this category".
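
The ordering described above amounts to an ascending sort on the signed penalty, capped at `--top`. A minimal sketch, assuming each row is a dict carrying the `current_penalty` and `reward_arm_n` fields named above (the dict shape, the agent names, and the function name `top_penalized` are hypothetical):

```python
def top_penalized(rows, top=5):
    """Sort rows by signed penalty, most negative first, and keep `top`."""
    return sorted(rows, key=lambda row: row["current_penalty"])[:top]


# Hypothetical rows for illustration only.
rows = [
    {"agent": "agent-a", "current_penalty": -0.10, "reward_arm_n": 4},
    {"agent": "agent-b", "current_penalty": -0.42, "reward_arm_n": 31},
    {"agent": "agent-c", "current_penalty": 0.0, "reward_arm_n": 2},
]
ranked = top_penalized(rows, top=2)
```

Here `ranked` holds the `agent-b` row followed by the `agent-a` row; the unpenalized `agent-c` row falls outside the cap.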

`SUMMARY`

Headline counts for the window: total finished events, failed events + success rate, recurring patterns, recurring patterns with no measurable penalty (the most actionable line), and total tokens spent.

Daily cron

Wire up via your scheduler of choice:

0 9 * * * aide review > ~/aide-review-$(date +\%F).txt

TODO: this can also be wired into one of the [trigger] on = "cron:..." agents (see aide list and the Triggers guide) so an agent reads the digest and emails it. Not in this PR — keeps aide review purely read-only.

Constraints

Pure read. No mutations to ~/.aide/bandit.json, ~/.aide/events.jsonl, ~/.aide/failure_patterns.json, or anything else.
Window cut-off uses event timestamps (ts field, RFC3339).
--format json mirrors the text fields under a strict schema; safe to pipe into jq or downstream tooling.
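
The window cut-off on the `ts` field can be sketched as below. Only the field name `ts` and its RFC3339 format come from this page; the function names, the inclusive comparison at the cut-off, and the handling of a trailing `Z` (needed because `datetime.fromisoformat` rejects it before Python 3.11) are assumptions of this sketch.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse an RFC3339 timestamp into a timezone-aware datetime."""
    if ts[-1] in "Zz":
        # Older Python versions do not accept the 'Z' suffix directly.
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def events_in_window(lines, days, now=None):
    """Keep JSONL events whose ts falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Assumption: an event exactly at the cut-off is included.
        if parse_ts(event["ts"]) >= cutoff:
            kept.append(event)
    return kept
```

Passing an explicit `now` makes the filter deterministic, which is convenient when testing against a fixed sample of events.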
