aide review

Daily observability digest. Read-only — never mutates bandit state, events log, or anything else.

Answers the daily-driver question:

Are any agents repeating the same kind of failure across multiple dispatches, and is the MEDS penalty machinery actually penalizing those patterns?

Usage

# Default: last 7 days, top 5
aide review

# Look back further, show more rows
aide review --days 30 --top 10

# Machine-readable output
aide review --format json

| Flag | Default | Description |
| --- | --- | --- |
| --days N | 7 | Only consider events from the last N days |
| --top K | 5 | Show top K recurring-failure clusters and top K most-penalized arms |
| --format | text | text or json |

What it reads

  • ~/.aide/events.jsonl — dispatch lifecycle events
  • ~/.aide/bandit.json — per-agent LinUCB state (arm pull counts)
  • ~/.aide/failure_patterns.json (via sidecar::failure_penalty) — MEDS centroids

It calls sidecar::cluster_failures() once to re-cluster the failures inside the window. That step needs a local ollama instance with the nomic-embed-text model. When ollama is unavailable, the recurring-failures section is empty, but the rest of the report still renders, including the failure count in the summary.

Output sections

RECURRING FAILURES

Clusters of failures with cluster_size >= 2. Sorted by cluster size descending, capped at --top.

| Column | Meaning |
| --- | --- |
| agent | Agent that failed (clusters are per-agent, so the same task on a different agent is a separate cluster) |
| category | Dominant 12-class task category from the cluster's representative task |
| cluster_size | Number of failures grouped under this cluster (cosine similarity > 0.75 in embedding space) |
| example | Representative task snippet for the cluster |
| first_seen / last_seen | Relative time of the earliest / latest matching failed event for that agent in the window |
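
To make the cluster_size rule concrete, here is a minimal Python sketch of per-agent grouping by cosine similarity. It is not the implementation behind sidecar::cluster_failures(): the greedy single-pass strategy, the function names, and the toy 2-D embeddings are all assumptions made for illustration. Only the 0.75 threshold, the per-agent separation, and the cluster_size >= 2 filter come from the description above.

```python
import math

SIMILARITY_THRESHOLD = 0.75  # from the cluster_size column description


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_cluster(failures):
    """Greedy single-pass clustering (an assumed strategy, not aide's own).

    `failures` is a list of (agent, embedding) pairs. A failure joins the
    first cluster of the same agent whose representative embedding is more
    than SIMILARITY_THRESHOLD similar; otherwise it starts a new cluster.
    """
    clusters = []
    for agent, embedding in failures:
        for cluster in clusters:
            same_agent = cluster["agent"] == agent
            if same_agent and cosine_similarity(
                cluster["representative"], embedding
            ) > SIMILARITY_THRESHOLD:
                cluster["size"] += 1
                break
        else:
            # No matching cluster for this agent: start a new one.
            clusters.append(
                {"agent": agent, "representative": embedding, "size": 1}
            )
    return clusters


def recurring(clusters, top):
    """Keep clusters with size >= 2, largest first, capped at `top`."""
    kept = [c for c in clusters if c["size"] >= 2]
    kept.sort(key=lambda c: c["size"], reverse=True)
    return kept[:top]
```

Note that an identical embedding on a different agent never merges into an existing cluster, which mirrors the per-agent behaviour described for the agent column.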

PENALTY APPLIED

For each recurring cluster: the current MEDS penalty for the representative task and that agent.

| Column | Meaning |
| --- | --- |
| penalty | Signed value: -failure_penalty(example, agent). More negative = bigger drag on bandit score |
| penalty_applied? | Bucket label: heavy, light, or none (review!) |
| bandit_reward_trend | The 3 most recent reward values for that (agent, category) bucket inside the window, plus the delta r3 - r1 |
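
The bandit_reward_trend column can be read as the following small Python sketch. The function name and the handling of buckets with fewer than three rewards are assumptions; the source only states that the column shows the three most recent rewards and the delta r3 - r1.

```python
def reward_trend(rewards):
    """Return (recent, delta) for a list of rewards ordered oldest to newest.

    `recent` holds up to the 3 most recent rewards [r1, r2, r3].
    `delta` is r3 - r1, or None when fewer than 3 rewards exist
    (that fallback is an assumption, not documented behaviour).
    """
    recent = rewards[-3:]
    if len(recent) < 3:
        return recent, None
    return recent, recent[2] - recent[0]
```

A negative delta means the most recent reward is lower than the one two observations earlier, which is the direction you would expect if a recurring failure is dragging on that (agent, category) bucket.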

Penalty thresholds

failure_penalty(task, agent) returns a value in [0, 1]. The report classifies the raw value into one of three buckets:

| Raw | Bucket | Meaning |
| --- | --- | --- |
| >= 0.30 | heavy | The MEDS centroid for past failures of this kind is strongly biting |
| >= 0.05 and < 0.30 | light | Some penalty but not material |
| < 0.05 | none (review!) | The pattern is recurring but the bandit isn't down-weighting it. This is concerning and worth investigating |

The penalty column displays the negated raw value (for example, -0.42) for visual consistency with reward deltas.
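
As a sketch of the bucketing above, the thresholds translate into a few comparisons. The function and constant names are hypothetical, and the two-decimal display format is assumed from the -0.42 example; only the threshold values and bucket labels come from the table.

```python
HEAVY_THRESHOLD = 0.30  # raw >= 0.30 -> heavy
LIGHT_THRESHOLD = 0.05  # raw >= 0.05 and < 0.30 -> light


def penalty_bucket(raw):
    """Map a raw failure_penalty value in [0, 1] to its bucket label."""
    if not 0.0 <= raw <= 1.0:
        raise ValueError(f"raw penalty out of range: {raw}")
    if raw >= HEAVY_THRESHOLD:
        return "heavy"
    if raw >= LIGHT_THRESHOLD:
        return "light"
    return "none (review!)"


def display_penalty(raw):
    """Format the raw value as the signed number shown in the penalty column."""
    return f"{-raw:.2f}"
```

Both boundaries are inclusive on the lower side, so a raw value of exactly 0.05 counts as light and exactly 0.30 counts as heavy.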

TOP-PENALIZED

Lists all recurring clusters, sorted by current_penalty ascending (most negative first). Each row includes reward_arm_n, the bandit arm's pull count for that agent. A high n combined with a heavy penalty means the bandit has converged on "this agent should not get this category".
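
The ordering can be sketched in Python as below. The row shape (a dict carrying agent, current_penalty, and reward_arm_n) is an assumption for illustration and may not match the real report structure.

```python
def top_penalized(rows, top):
    """Sort rows by signed current_penalty, most negative first, capped at `top`.

    Each row is assumed to be a dict with at least a "current_penalty" key
    holding the signed (zero or negative) penalty value.
    """
    return sorted(rows, key=lambda row: row["current_penalty"])[:top]
```

Because the penalty is stored signed, a plain ascending sort is enough to put the heaviest penalties at the top.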

SUMMARY

Headline counts for the window: total finished events, failed events + success rate, recurring patterns, recurring patterns with no measurable penalty (the most actionable line), and total tokens spent.

Daily cron

Wire up via your scheduler of choice:

0 9 * * * aide review > ~/aide-review-$(date +\%F).txt

TODO: this can also be wired into one of the [trigger] on = "cron:..." agents (see aide list and the Triggers guide) so an agent reads the digest and emails it. Not in this PR — keeps aide review purely read-only.

Constraints

  • Pure read. No mutations to ~/.aide/bandit.json, ~/.aide/events.jsonl, ~/.aide/failure_patterns.json, or anything else.
  • Window cut-off uses event timestamps (ts field, RFC3339).
  • --format json mirrors the text fields under a strict schema; safe to pipe into jq or downstream tooling.
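
The window cut-off can be illustrated with a short Python sketch that filters JSONL lines by their ts field. Only the ts field name and its RFC3339 format come from the constraints above; the function names, the inclusive comparison at the cut-off, and passing `now` explicitly are assumptions. The parser handles the common RFC3339 forms (a trailing "Z" or a numeric offset) and is not a full RFC3339 implementation.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse a common RFC3339 timestamp into an aware datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def events_in_window(lines, days, now):
    """Return the parsed events whose ts falls within the last `days` days.

    `lines` is an iterable of JSONL strings; blank lines are skipped.
    `now` must be timezone-aware so it can be compared with parsed values.
    """
    cutoff = now - timedelta(days=days)
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if parse_ts(event["ts"]) >= cutoff:
            kept.append(event)
    return kept
```

Passing `now` as an argument instead of reading the clock inside the function keeps the filter deterministic, which makes it easier to test against a fixed events file.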

See also