Health & Dependencies
aide continuously monitors two things for every running agent:
- Liveness — the daemon writes a heartbeat file every tick.
- Declared dependencies — external services your agent needs (HTTP APIs, TCP ports, processes, files, shell commands).
Run aide ps to see the combined view.
Heartbeats
When the daemon is up, each scanned agent gets a record at
~/.aide/heartbeats/<agent>.json:
{
"pid": 1234,
"last_alive": "2026-04-18T12:00:00Z",
"status": "running",
"cycle": 42,
"deps": [{"name": "database", "ok": true}]
}
aide ps flags an agent as DEAD if last_alive is older than 10 minutes —
useful for catching stuck or crashed daemons.
Declaring dependencies
Add a [health] section to your Aidefile. Every field is optional;
omitting [health] is equivalent to interval = "10m" with no dependencies.
[health]
interval = "10m" # how often to probe (default 10m)
alert = "email:[email protected]" # optional — alert wiring TBD
Then add one [[health.dependencies]] block per check.
http — HTTP GET
[[health.dependencies]]
name = "api_session"
type = "http"
url = "https://api.example.com/health"
expect_status = 200 # optional; otherwise any 2xx passes
# expect_json = '"ok": true' # optional substring match on response body
tcp — port reachability
[[health.dependencies]]
name = "database"
type = "tcp"
host = "localhost"
port = 5432
command — shell exit code
Runs through sh -c; exit code 0 = healthy.
[[health.dependencies]]
name = "gpu_server"
type = "command"
run = "ssh gpu-server echo ok"
timeout = 10 # seconds; default 10
process — pgrep pattern
[[health.dependencies]]
name = "claude_proc"
type = "process"
pattern = "claude" # passed to `pgrep -f`
file_age — recent file modification
Useful for "is something writing logs?" or "did the model checkpoint update today?"
[[health.dependencies]]
name = "training_log"
type = "file_age"
path = "~/.aide/logs/agent.log"
max_age = "1h" # passes if mtime within last hour
Inspecting status
$ aide ps
NAME STATUS LAST ALIVE CYCLE DEPS
────────────────────────────────────────────────────────────────────────────────
reviewer running 12s ago 204 2/2 ok
gpu-trainer running 8s ago 11 1/2 fail:gpu_server
crawler DEAD 42m ago 88 —
Alerts
health.alert is reserved for future use — the daemon currently logs failures
via tracing::warn! and persists them in the heartbeat file's deps array.
Wiring email/webhook sinks is tracked in issue #102 follow-ups.