Restart Policy

Some agents are long-running daemons rather than one-shot tasks. When such an agent crashes (non-zero exit, panic, or runner error), the aide daemon's supervisor can auto-restart it according to a [daemon] block in the Aidefile.

Aidefile schema

[daemon]
restart = "on-failure"      # always | on-failure | never
max_restarts = 10           # supervisor gives up after this many consecutive restarts
backoff = "exponential"     # exponential | linear | fixed

All fields are optional. If the [daemon] block is omitted entirely, agents keep their existing single-shot behaviour (restart = "never"), so all pre-existing Aidefiles continue to work unchanged.
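The defaulting rule can be sketched as follows. This is a minimal illustration rather than aide's actual implementation: it assumes the Aidefile has already been parsed into a plain dict, the helper name effective_restart_mode is hypothetical, and rejecting unknown values is an assumption about validation.

```python
def effective_restart_mode(aidefile: dict) -> str:
    """Return the restart mode implied by a parsed Aidefile (hypothetical helper)."""
    daemon = aidefile.get("daemon")
    if daemon is None:
        # No [daemon] block at all: legacy single-shot behaviour.
        return "never"
    # [daemon] is present but restart is omitted: the documented default is on-failure.
    mode = daemon.get("restart", "on-failure")
    if mode not in ("always", "on-failure", "never"):
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"unknown restart mode: {mode!r}")
    return mode
```

For example, an empty dict resolves to "never", while a dict holding an empty [daemon] table resolves to "on-failure".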

Restart modes

Mode        When the supervisor restarts
always      After every run, both success and failure. Use for true daemons.
on-failure  Only when the run failed (non-zero exit / runner Err / budget exhausted). Default when [daemon] is present.
never       Never restart. Default for legacy Aidefiles without [daemon].
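The three modes reduce to a small decision function. The sketch below is illustrative only; should_restart is a hypothetical name, and the run outcome is assumed to have been collapsed into a single boolean by the caller.

```python
def should_restart(mode: str, run_succeeded: bool) -> bool:
    """Decide whether a finished run should be restarted (illustrative sketch)."""
    if mode == "always":
        # Restart after every run, regardless of outcome.
        return True
    if mode == "on-failure":
        # Restart only when the run failed.
        return not run_succeeded
    if mode == "never":
        return False
    raise ValueError(f"unknown restart mode: {mode!r}")
```

Note that this covers only the mode check; the max_restarts budget is a separate gate applied on top of it.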

max_restarts

A safety net against infinite crash loops. The supervisor counts consecutive restart attempts per agent and stops once max_restarts is reached. The counter resets to zero after any successful run, so a flaky agent that eventually succeeds gets a fresh budget.
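A minimal sketch of the consecutive-restart counter is shown below. The class name RestartBudget and its methods are hypothetical, and it assumes that "reached" means exactly max_restarts restarts are allowed before the supervisor gives up.

```python
class RestartBudget:
    """Tracks consecutive restart attempts for one agent (illustrative sketch)."""

    def __init__(self, max_restarts: int) -> None:
        self.max_restarts = max_restarts
        self.consecutive = 0

    def record_success(self) -> None:
        # Any successful run gives the agent a fresh budget.
        self.consecutive = 0

    def try_restart(self) -> bool:
        """Consume one restart; return False once the budget is exhausted."""
        if self.consecutive >= self.max_restarts:
            return False
        self.consecutive += 1
        return True


budget = RestartBudget(max_restarts=2)
first, second, third = budget.try_restart(), budget.try_restart(), budget.try_restart()
budget.record_success()          # a successful run resets the counter
after_reset = budget.try_restart()
```

Here first and second are True, third is False because the budget of 2 is spent, and after_reset is True again because the successful run cleared the counter.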

Backoff strategies

The supervisor sleeps between restarts to avoid hot-looping. Sleep duration depends on the strategy and the attempt number (0-indexed):

Strategy     Sleep sequence (seconds)
exponential  1, 2, 4, 8, 16, 32, 64, 128, 256, 300, 300, …
linear       1, 2, 3, 4, 5, …
fixed        1, 1, 1, 1, …

All strategies are capped at 300 seconds (5 minutes) per individual sleep, so even pathological attempt counts stay bounded.
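The sleep sequences above are consistent with the following formulas, inferred from the listed values rather than taken from aide's source: 2^attempt for exponential, attempt + 1 for linear, and a constant 1 for fixed, each clamped to 300. The function name backoff_seconds is hypothetical.

```python
MAX_SLEEP_SECONDS = 300  # documented per-sleep cap (5 minutes)


def backoff_seconds(strategy: str, attempt: int) -> int:
    """Sleep duration before restart number `attempt` (0-indexed); illustrative sketch."""
    if strategy == "exponential":
        raw = 2 ** attempt        # 1, 2, 4, 8, ...
    elif strategy == "linear":
        raw = attempt + 1         # 1, 2, 3, 4, ...
    elif strategy == "fixed":
        raw = 1                   # always 1 second
    else:
        raise ValueError(f"unknown backoff strategy: {strategy!r}")
    # Every strategy is clamped to the same cap.
    return min(raw, MAX_SLEEP_SECONDS)


exponential = [backoff_seconds("exponential", n) for n in range(11)]
# exponential == [1, 2, 4, 8, 16, 32, 64, 128, 256, 300, 300]
```

The cap first takes effect at attempt 9 for exponential (2^9 = 512) and at attempt 300 for linear; fixed never reaches it.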

Example: monitoring agent that should never stop

[persona]
name = "uptime-monitor"

[trigger]
on = "cron:*/5 * * * *"     # every 5 minutes

[daemon]
restart = "always"          # respawn on success and failure
max_restarts = 100
backoff = "exponential"

Use cases

  • Monitoring agents (restart = "always") that should never stop
  • Stream processors that crash intermittently (restart = "on-failure")
  • Any agent with [trigger] on = "cron:..." that you want kept alive without an external watchdog

Notes

  • The restart counter is in-memory only: it lives inside the running daemon process and is reset whenever the daemon is stopped or started (aide down / aide up). This is intentional; persistent state for crash budgets is overkill for v2.
  • Restart policy is independent of trigger type. The cron trigger schedules the first run; the supervisor then decides whether to keep it alive.