Where to find it
In the dashboard, open your app and click Health in the sidebar. The page polls every 10 seconds while open and reads directly from the agent's metrics SQLite file; there is no extra round-trip to PostgreSQL.

The three signals that matter
Ingest rate
Payloads accepted per second over the last minute. On a single instance a healthy agent ingests up to ~13,400 payloads/s.
Drain rate
Rows written to PostgreSQL per second. Drain should keep pace with ingest; a persistent gap means the buffer is filling.
Buffer depth
Pending rows sitting in the SQLite WAL buffer. Grows when drain lags ingest; the agent rejects new payloads once it hits 100,000.
Reading the charts
- Ingest vs. drain — the two lines should track each other. If drain flattens while ingest keeps climbing, PostgreSQL is the bottleneck (check the postgresql-sizing guide).
- Buffer depth — a steady sawtooth is normal: ingest fills, drain empties in 5,000-row batches. A monotonically rising line is the early warning for back-pressure.
- Drain batch latency — time to COPY one batch. Sustained values above a second usually mean PostgreSQL is IO-bound or connection-starved.
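To tell the healthy sawtooth apart from the back-pressure pattern programmatically, one approach is to check whether buffer-depth samples ever drop back down across a window. A rough sketch under that assumption (the sampling window and helper are illustrative, not agent behaviour):

```python
def looks_like_backpressure(depth_samples: list[int]) -> bool:
    """True when buffer depth rises monotonically across the window.

    A healthy agent shows a sawtooth: depth falls each time a
    5,000-row drain batch completes, so strictly increasing samples
    across a whole window are an early warning sign.
    """
    if len(depth_samples) < 2:
        return False  # not enough data to judge a trend
    return all(b > a for a, b in zip(depth_samples, depth_samples[1:]))

# sawtooth (a batch drained mid-window) -> healthy
# monotonic climb -> back-pressure
```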
Status banners
Agent offline
The health file hasn't been touched in the last 30 seconds. The agent process likely crashed or was never started. Check supervisord / systemd logs for the nightowl:agent worker.
Buffer near capacity
Buffer depth is above 80,000 rows. Ingest will start rejecting payloads at 100,000. Scale drain workers (see NIGHTOWL_DRAIN_WORKERS) or add a PostgreSQL replica.
Drain stalled
Drain rate has been zero for longer than one minute while ingest is non-zero. Almost always a PostgreSQL connection issue — verify credentials, PgBouncer, and max_connections.
No recent payloads
Ingest rate has been zero for 5+ minutes. Your application may not be sending. Confirm NIGHTWATCH_TOKEN is set to the token from the NightOwl dashboard and that the Nightwatch package is installed and booted.

What to do when it's unhealthy
- Persistent drain lag → increase drain workers: NIGHTOWL_DRAIN_WORKERS=4 (one worker per PostgreSQL core is a reasonable ceiling).
- Connection churn → put PgBouncer in front of PostgreSQL. The bundled docker-compose.yml ships a ready configuration on port 6432.
- Single-instance ceiling → run multiple agent instances behind SO_REUSEPORT. See Multiple Instances.
- PostgreSQL saturation → check pg_stat_activity, tune synchronous_commit = off, or upgrade disk. See PostgreSQL sizing.
Raw metrics for external monitoring
The agent also writes a JSON metrics file at storage/nightowl/agent-buffer.sqlite.drain-metrics.json. Point Prometheus, Datadog, or a cron-scraped shell script at it if you want alerts that live outside the dashboard.
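A cron-scraped check might read that file and re-apply the banner thresholds from this page. A minimal sketch; the JSON field names (updated_at, buffer_depth, drain_rate, ingest_rate) are assumptions — inspect the file on your install for the actual schema:

```python
import json
import time

# Assumed path from this page; adjust to your install.
METRICS = "storage/nightowl/agent-buffer.sqlite.drain-metrics.json"

def check(path: str = METRICS) -> list[str]:
    """Return alert strings; empty list when everything looks healthy.

    Field names below are assumptions about the metrics file schema.
    """
    with open(path) as f:
        m = json.load(f)
    alerts = []
    if time.time() - m["updated_at"] > 30:             # agent offline
        alerts.append("agent offline: metrics file stale")
    if m["buffer_depth"] > 80_000:                     # buffer near capacity
        alerts.append(f"buffer near capacity: {m['buffer_depth']} rows")
    if m["drain_rate"] == 0 and m["ingest_rate"] > 0:  # drain stalled
        alerts.append("drain stalled: check PostgreSQL connections")
    return alerts
```

Exit non-zero (or page) when the returned list is non-empty, and the same thresholds the dashboard banners use now fire from outside it.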