Health & Dependencies

aide continuously monitors two things for every running agent:

  1. Liveness — the daemon writes a heartbeat file every tick.
  2. Declared dependencies — external services your agent needs (HTTP APIs, TCP ports, processes, files, shell commands).

Run aide ps to see the combined view.

Heartbeats

When the daemon is up, each scanned agent gets a record at ~/.aide/heartbeats/<agent>.json:

{
  "pid": 1234,
  "last_alive": "2026-04-18T12:00:00Z",
  "status": "running",
  "cycle": 42,
  "deps": [{"name": "database", "ok": true}]
}

aide ps flags an agent as DEAD if last_alive is older than 10 minutes — useful for catching stuck or crashed daemons.

Declaring dependencies

Add a [health] section to your Aidefile. Every field is optional; omitting [health] is equivalent to interval = "10m" with no dependencies.

[health]
interval = "10m"                       # how often to probe (default 10m)
alert    = "email:[email protected]"     # optional — alert wiring TBD

Then add one [[health.dependencies]] block per check.

http — HTTP GET

[[health.dependencies]]
name = "api_session"
type = "http"
url = "https://api.example.com/health"
expect_status = 200            # optional; otherwise any 2xx passes
# expect_json = '"ok": true'   # optional substring match on response body

tcp — port reachability

[[health.dependencies]]
name = "database"
type = "tcp"
host = "localhost"
port = 5432

command — shell exit code

Runs through sh -c; exit code 0 = healthy.

[[health.dependencies]]
name = "gpu_server"
type = "command"
run = "ssh gpu-server echo ok"
timeout = 10                   # seconds; default 10

process — pgrep pattern

[[health.dependencies]]
name = "claude_proc"
type = "process"
pattern = "claude"             # passed to `pgrep -f`

file_age — recent file modification

Useful for "is something writing logs?" or "did the model checkpoint update today?"

[[health.dependencies]]
name = "training_log"
type = "file_age"
path = "~/.aide/logs/agent.log"
max_age = "1h"                 # passes if mtime within last hour

Inspecting status

$ aide ps
NAME                 STATUS     LAST ALIVE             CYCLE    DEPS
────────────────────────────────────────────────────────────────────────────────
reviewer             running    12s ago                204      2/2 ok
gpu-trainer          running    8s ago                 11       1/2 fail:gpu_server
crawler              DEAD       42m ago                88       —

Alerts

health.alert is reserved for future use — the daemon currently logs failures via tracing::warn! and persists them in the heartbeat file's deps array. Wiring email/webhook sinks is tracked in issue #102 follow-ups.