aide ab
A/B comparison harness for routing strategies (ICSE issue #95).
Compares two arms over finished events in ~/.aide/events.jsonl:
- Treatment: bandit (LinUCB + MEDS failure penalty; set by aide dispatch --auto).
- Control: round-robin (flat cycling over candidates).
Each dispatched event carries an optional routing field; the matching finished event copies it via issue lookup.
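The copy-by-issue-lookup step might look roughly like the sketch below. The field names `type`, `issue`, and `routing` as JSON keys, the helper name `attach_routing`, and the assumption that a dispatched event precedes its finished event in the file are all illustrative guesses, not the harness's documented schema.

```python
import json

def attach_routing(lines):
    """Return finished events, each tagged with the routing of its dispatch.

    Assumed (hypothetical) schema: each line is a JSON object with a "type"
    ("dispatched" or "finished") and an "issue" key; dispatched events may
    carry an optional "routing" field.
    """
    routing_by_issue = {}
    finished = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") == "dispatched" and "routing" in event:
            # Remember which arm handled this issue.
            routing_by_issue[event["issue"]] = event["routing"]
        elif event.get("type") == "finished":
            tagged = dict(event)
            # Keep an existing routing value; otherwise look it up by issue.
            tagged["routing"] = event.get(
                "routing", routing_by_issue.get(event.get("issue"))
            )
            finished.append(tagged)
    return finished
```

Finished events with no matching dispatch end up with `routing` set to `None` in this sketch; how the real harness treats such events is not stated here.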
Subcommands
aide ab analyze [--events PATH]
Partition finished events by arm and print a per-metric table plus deltas. Warns if either arm has fewer than 5 events; target is 25+ per arm.
ARM          N   SUCCESS_RATE  AVG_TOKENS  AVG_TIME_MS  AVG_COMPRESSION  AVG_REWARD
──────────────────────────────────────────────────────────────────────────────────────
round-robin  50  0.620         33102.0     40221.0      0.0146           0.730
bandit       50  0.840         18011.0     28019.0      0.0140           0.851
deltas (bandit − round-robin; rel = (b-r)/|r|):
success_rate  abs=      +0.2200  rel= +35.48%
avg_tokens    abs=  -15091.0000  rel= -45.59%
...
Default events path: ~/.aide/events.jsonl.
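The delta rows follow directly from the per-arm averages. As a worked check of the formula rel = (b-r)/|r|, the sketch below recomputes the two deltas shown in the sample output. The function name `deltas` and the NaN fallback for a zero baseline are assumptions, not the harness's code.

```python
def deltas(bandit, round_robin):
    """Return (absolute, relative) delta, with rel = (b - r) / |r|."""
    abs_delta = bandit - round_robin
    # Assumption: a zero baseline has no meaningful relative delta.
    if round_robin == 0:
        return abs_delta, float("nan")
    return abs_delta, abs_delta / abs(round_robin)

# Averages taken from the sample table above.
abs_sr, rel_sr = deltas(0.840, 0.620)
abs_tok, rel_tok = deltas(18011.0, 33102.0)

print(f"success_rate abs={abs_sr:+.4f} rel={rel_sr:+.2%}")   # abs=+0.2200 rel=+35.48%
print(f"avg_tokens   abs={abs_tok:+.4f} rel={rel_tok:+.2%}")  # abs=-15091.0000 rel=-45.59%
```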
aide ab export [--events PATH] [--out PATH.csv]
Emit a CSV with one row per finished event. Columns:
timestamp,arm,agent,task_category,success,cloud_tokens,local_cpu_ms,compression_ratio,reward,duration_ms
Default output: ./ab-export.csv. task_category is keyword-inferred from the issue + task snippet.
aide ab simulate --n N [--seed N]
Generate N synthetic dispatched+finished event pairs per arm, write to a fresh tempfile (NEVER touches production), and run analyze on it. Useful for validating the analysis pipeline before real experiment data lands.
aide ab simulate --n 50 --seed 42
The simulator's outcome distributions are hardcoded so that bandit beats round-robin on success rate and tokens, matching the hypothesis under test. Its output therefore validates the pipeline, not the hypothesis.
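A seeded generator of this kind could be sketched as follows. The success probabilities, the JSON keys (`type`, `issue`, `routing`, `success`), and the function name `simulate` are hypothetical, chosen only to illustrate the shape of the approach; the real simulator's parameters and event schema are not documented here.

```python
import json
import random
import tempfile

# Hypothetical per-arm success probabilities, picked only so that
# bandit beats round-robin; not the real simulator's values.
SUCCESS_PROB = {"round-robin": 0.6, "bandit": 0.8}

def simulate(n, seed=None):
    """Write n dispatched+finished pairs per arm to a fresh temp file.

    Returns the path of the temp file; nothing else on disk is touched.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as fh:
        for arm, p in SUCCESS_PROB.items():
            for i in range(n):
                issue = f"sim-{arm}-{i}"
                dispatched = {"type": "dispatched", "issue": issue, "routing": arm}
                finished = {
                    "type": "finished",
                    "issue": issue,
                    "routing": arm,
                    "success": rng.random() < p,
                }
                fh.write(json.dumps(dispatched) + "\n")
                fh.write(json.dumps(finished) + "\n")
        return fh.name
```

With the same seed, two runs produce identical file contents, which is what makes a fixed `--seed` useful for regression-checking the analysis step.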
Statistical approach
The harness reports raw absolute and relative deltas only; it runs no t-test and computes no bootstrap CI. Rationale: at the target of n=25 per arm, classical null-hypothesis significance testing (NHST) is underpowered for small effect sizes anyway, and the deltas plus the per-event CSV export let downstream notebooks (R / pandas) run whatever test the paper demands.
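As one example of a downstream test, the sketch below computes a percentile bootstrap confidence interval for the difference in success rate between arms, using only the standard library. It relies on the `arm` and `success` columns from the export. The encoding of `success` as `1` or `true`, the function names, and the choice of a percentile bootstrap are assumptions; the harness itself does none of this.

```python
import csv
import random

def load_success_by_arm(path):
    """Group the export's `success` column by `arm`.

    Assumption: a success is serialized as "1" or "true" (case-insensitive).
    """
    by_arm = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            ok = row["success"].strip().lower() in {"1", "true"}
            by_arm.setdefault(row["arm"], []).append(1.0 if ok else 0.0)
    return by_arm

def bootstrap_diff_ci(treatment, control, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each arm independently, with replacement.
        t = rng.choices(treatment, k=len(treatment))
        c = rng.choices(control, k=len(control))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Example usage against the default export path:
# by_arm = load_success_by_arm("ab-export.csv")
# lo, hi = bootstrap_diff_ci(by_arm["bandit"], by_arm["round-robin"])
# print(f"95% CI for success-rate delta: [{lo:+.3f}, {hi:+.3f}]")
```

At n=25 per arm the interval will typically be wide, which is consistent with the rationale above for not baking a test into the harness.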
See also
- aide policy-update: how the bandit learns
- aide dispatch: the --auto flag tags events as routing=bandit