
Fleet health and status heatmap

The fleet health card gives you a bird's-eye view of your entire fleet's availability. It surfaces which agents are online, idle, or offline, calculates fleet-wide uptime, and provides a status heatmap showing how agent statuses are distributed over time.

Fleet health summary

The summary section shows the key fleet-level metrics at a glance:

| Metric | Description |
| --- | --- |
| Online / Idle / Offline | Current count of agents in each status category |
| Fleet uptime % | Weighted average uptime across all agents in the selected range |
| Total transitions | Number of status changes across all agents in the selected range |
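To make "weighted average uptime" concrete, here is a minimal Python sketch. The page does not say what the weights are or whether idle time counts as uptime, so this assumes each agent is weighted by how long it was tracked in the selected range. The names `AgentUptime`, `online_seconds`, `observed_seconds`, and `fleet_uptime_pct` are hypothetical and are not part of the product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentUptime:
    name: str
    online_seconds: float    # time counted as "up" in the range (assumption)
    observed_seconds: float  # time the agent was tracked in the range (the weight)


def fleet_uptime_pct(agents: list[AgentUptime]) -> float:
    """Average of per-agent uptime, weighted by each agent's observed time.

    Weighting by observed time reduces to total up time / total observed time.
    """
    total_observed = sum(a.observed_seconds for a in agents)
    if total_observed == 0:
        return 0.0  # no data in the range
    total_online = sum(a.online_seconds for a in agents)
    return 100.0 * total_online / total_observed
```

Under this assumption, an agent tracked for a full day at 100% uptime and an agent tracked for half a day at 50% uptime give a fleet value of about 83.3%, not the unweighted mean of 75%.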

Offline agents

If any agents are currently offline, they are listed in a dedicated section. Each entry shows the agent name, the reason it went offline (heartbeat timeout, manual status change, or error detection), the HTTP status code of the last failed heartbeat, and how long the agent has been offline.

The "offline since" duration helps you prioritize which agents to investigate first. An agent offline for minutes may be a transient issue; one offline for hours likely needs manual intervention.
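The triage rule of thumb above could be sketched as follows. This is illustrative only: the `OfflineAgent` fields mirror what each entry shows, but the field names, the reason strings, and the one-hour threshold are assumptions rather than documented values.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class OfflineAgent:
    name: str
    reason: str                      # e.g. "heartbeat_timeout", "manual", "error" (hypothetical labels)
    last_http_status: Optional[int]  # status code of the last failed heartbeat, if any
    offline_since: datetime


def triage(agents: list[OfflineAgent], now: datetime,
           threshold: timedelta = timedelta(hours=1)) -> list[tuple[str, timedelta, str]]:
    """Return (name, offline duration, suggested action), longest outage first."""
    # Earliest offline_since means the longest outage, so sort ascending.
    ranked = sorted(agents, key=lambda a: a.offline_since)
    result = []
    for agent in ranked:
        duration = now - agent.offline_since
        action = "investigate" if duration >= threshold else "watch"
        result.append((agent.name, duration, action))
    return result
```

The threshold is a parameter because the point at which an outage stops looking transient depends on your heartbeat interval and on-call policy.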

Lowest uptime agents

Below the offline agents, a ranked list shows the agents with the lowest uptime percentage in the selected range. This highlights chronically unreliable agents that may need configuration changes, endpoint fixes, or replacement.
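If you want to reproduce a similar ranking from your own data, a short sketch is enough. It assumes you already have a per-agent uptime percentage for the range. The function name `lowest_uptime` and the default of five entries are hypothetical, since the page does not say how many agents the list shows.

```python
from __future__ import annotations

import heapq


def lowest_uptime(uptime_pct_by_agent: dict[str, float],
                  n: int = 5) -> list[tuple[str, float]]:
    """Return the n agents with the lowest uptime percentage, worst first."""
    return heapq.nsmallest(n, uptime_pct_by_agent.items(), key=lambda item: item[1])
```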

Network status heatmap

The status heatmap visualizes agent status distribution over time. Each row is an agent, each column is a time bucket. Cells are colored by the agent's dominant status during that period: green for online, amber for idle, red for offline. This makes it easy to spot fleet-wide outages (vertical red bands) or agent-specific reliability issues (horizontal red streaks).
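One way to read "dominant status" is the status that covers the most time within a bucket. The sketch below works under that assumption, since the page does not define the rule or how ties are broken. The interval format, function names, and the "none" placeholder for buckets without data are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional

# Color mapping as described for the heatmap cells.
COLORS = {"online": "green", "idle": "amber", "offline": "red"}

Interval = tuple[float, float, str]  # (start, end, status), timestamps in seconds


def dominant_status(intervals: list[Interval],
                    bucket_start: float, bucket_end: float) -> Optional[str]:
    """Return the status covering the most time inside the bucket, or None."""
    totals: dict[str, float] = defaultdict(float)
    for start, end, status in intervals:
        overlap = min(end, bucket_end) - max(start, bucket_start)
        if overlap > 0:
            totals[status] += overlap
    if not totals:
        return None
    # On an exact tie, the status seen first wins (an arbitrary choice here).
    return max(totals, key=totals.get)


def heatmap_row(intervals: list[Interval], range_start: float,
                range_end: float, bucket_seconds: float) -> list[str]:
    """Build one agent's row: one color per time bucket."""
    row = []
    t = range_start
    while t < range_end:
        status = dominant_status(intervals, t, min(t + bucket_seconds, range_end))
        row.append(COLORS.get(status, "none"))
        t += bucket_seconds
    return row
```

For example, an agent that is online for the first 40 minutes of an hour and offline for the rest gets a green cell for that hour, because online covers more of the bucket.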

The heatmap time granularity adjusts based on the selected range. For a 24-hour range you get hourly buckets; for a 7-day range you get 4-hour buckets; for a 30-day range you get daily buckets.
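The mapping from range to bucket size can be sketched as a small lookup. Only the three ranges named above come from this page. How the product treats other ranges is not stated, so the cutoffs between them are an assumption, and `bucket_size` is a hypothetical name.

```python
from datetime import timedelta


def bucket_size(selected_range: timedelta) -> timedelta:
    """Pick the heatmap bucket width for a selected time range."""
    if selected_range <= timedelta(hours=24):
        return timedelta(hours=1)   # 24-hour range: hourly buckets
    if selected_range <= timedelta(days=7):
        return timedelta(hours=4)   # 7-day range: 4-hour buckets
    return timedelta(days=1)        # 30-day range: daily buckets
```

With these sizes the three documented ranges produce 24, 42, and 30 columns respectively, so the heatmap stays a comparable width whichever range you select.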

Export options

Copy to clipboard: copies the fleet health summary and offline agent list as formatted text for pasting into incident reports or chat messages.

CSV export: downloads the full agent health data as a CSV file, including uptime percentages, transition counts, and current status for every agent.
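Once downloaded, the CSV can be post-processed with a few lines of Python. The column names used here (`agent`, `uptime_pct`) are assumptions for illustration. Check the header row of your actual export and adjust them accordingly.

```python
from __future__ import annotations

import csv
import io


def agents_below(csv_text: str, min_uptime_pct: float) -> list[str]:
    """Return names of agents whose uptime is below the given percentage.

    Assumes hypothetical columns named "agent" and "uptime_pct".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["agent"] for row in reader
            if float(row["uptime_pct"]) < min_uptime_pct]
```

To use it, read the exported file into a string (for example with `pathlib.Path(path).read_text()`) and pass it in along with the uptime floor you care about.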

Next

Drill into per-agent uptime timelines and outage details. See Agent uptime.