gromitor's wedge is simple: the value of cadvisor + heapster + prometheus + grafana, with none of the deployment. here's exactly how each piece works — and why it stays lightweight.
the gromitor agent is a single small process. on docker it mounts the docker socket read-only; on kubernetes it runs as a daemonset, one pod per node, reading the kubelet's stats api. either way it self-registers with your account on first boot using your ingest token — no config files, no service to expose.
it reads only resource counters — cpu %, memory usage, network i/o, disk i/o — never the contents of your containers. metrics are batched and pushed over https every few seconds, so host overhead stays under ~2% cpu and there's nothing inbound to firewall.
docker run -d --name gromitor-agent \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-e GROMITOR_TOKEN=grm_•••• \
ghcr.io/ogbuilds/gromitor-agent:1.4.2
# → agent registered · 8 containers streaming
this is the part open-source tools make hard. with gromitor there's no metrics pipeline to design, no time-series database to run, no dashboard to build. you copy a single docker run line or kubectl apply, and within seconds your containers appear on the dashboard.
because ingestion, storage, dashboards, and alerting are all hosted, there's nothing to upgrade or patch on your side. the agent updates independently; the platform is always current.
the dashboard lists every monitored container with its live cpu % and memory usage, refreshing on a tight loop. each metric carries a sparkline of the last 15–30 minutes, so you read the trend at a glance instead of guessing from a single reading that is replaced on every refresh.
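to make the sparkline idea concrete, here is a minimal python sketch of a rolling window plus a text sparkline. it is an illustration only, not gromitor's implementation: the names `RollingWindow` and `sparkline`, the (timestamp, value) sample shape, and the 15-minute default are all assumptions.

```python
from collections import deque


class RollingWindow:
    """Hypothetical helper: keeps only samples from the last `window_seconds`."""

    def __init__(self, window_seconds: float = 15 * 60):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: float, value: float) -> None:
        self._samples.append((timestamp, value))
        # Drop samples that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def values(self) -> list:
        return [value for _, value in self._samples]


def sparkline(values, bars="▁▂▃▄▅▆▇█"):
    """Render values as block characters scaled between their min and max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # a flat series would otherwise divide by zero
    top = len(bars) - 1
    return "".join(bars[int((v - lo) / span * top)] for v in values)
```

feeding each new cpu reading into `add` and rendering `sparkline(window.values())` gives a trend line that always covers the same recent stretch of time, however long the container has been running.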
click any container for a granular detail view: ring gauges for cpu and memory, area charts over the recent window, and current network and disk i/o — plus where the container runs and which agent reports it.
set a plain rule — a metric, a condition, a value, and a duration like "cpu over 80% for 5 minutes" — scoped to one container or across everything. gromitor evaluates it continuously against the live stream.
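a rule like "cpu over 80% for 5 minutes" can be sketched in a few lines of python. this is a hedged illustration of the idea, not gromitor's evaluator: the `Rule` fields, the "over"/"under" condition names, and the (timestamp, value) sample shape are assumptions made for the example.

```python
import operator
from dataclasses import dataclass

# Assumed set of conditions, for illustration only.
CONDITIONS = {"over": operator.gt, "under": operator.lt}


@dataclass
class Rule:
    metric: str      # e.g. "cpu"
    condition: str   # "over" or "under"
    value: float     # threshold, e.g. 80.0
    duration: float  # seconds the condition must hold, e.g. 300


def is_breached(rule: Rule, samples) -> bool:
    """samples: list of (timestamp, value) pairs for rule.metric, oldest first.

    True when the most recent unbroken run of matching samples
    spans at least rule.duration seconds.
    """
    compare = CONDITIONS[rule.condition]
    run_start = None
    for timestamp, value in samples:
        if compare(value, rule.value):
            if run_start is None:
                run_start = timestamp  # condition just started holding
        else:
            run_start = None  # any dip resets the clock
    if run_start is None:
        return False
    latest = samples[-1][0]
    return latest - run_start >= rule.duration
```

the duration is what keeps a brief spike from counting: in this sketch a single reading back under the threshold restarts the five-minute clock.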
when a rule breaches you get an in-app notification and an email; when it recovers you get a matching resolved notice, so a noisy hour doesn't bury the one alert that mattered. it's the alerting you'd otherwise wire up by hand, already done.
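one way to get exactly one alert and one matching resolved notice is to notify only when the state changes. the python sketch below assumes that approach; the `AlertState` class and the "firing" / "resolved" event names are hypothetical, not gromitor's internals.

```python
class AlertState:
    """Tracks one rule on one container and reports only state changes."""

    def __init__(self):
        self.firing = False

    def update(self, breached: bool):
        """Return "firing", "resolved", or None when nothing changed."""
        if breached and not self.firing:
            self.firing = True
            return "firing"
        if not breached and self.firing:
            self.firing = False
            return "resolved"
        return None


state = AlertState()
events = [state.update(b) for b in [False, True, True, True, False, False]]
# events == [None, "firing", None, None, "resolved", None]
```

however many evaluations happen while the rule stays breached, only the first one produces a notification, and the recovery produces one more.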