setting up cpu and memory alerts for containerized applications

how-to · 9 june 2026 · 5 min read

setting up cpu and memory alerts for containerized applications means choosing thresholds that catch real problems early without generating noise from normal operational variation. with gromitor, you set per-container thresholds in the dashboard — no alertmanager yaml, no PagerDuty routing rules — and get notified by email or in-app when a container breaches them. the key is picking the right thresholds, which depends on whether your container's resource usage is expected to be spiky or steady.

why container alerting is different from vm alerting

virtual machines tend to have relatively stable resource baselines. containers are designed to be ephemeral, dense, and variable. a container running a node.js API might idle at 0.5% cpu and spike to 80% during a request burst, then return to baseline — all in a few seconds. a container running a daily batch job will be at 0% for 23 hours and 100% for one hour. alerting strategies that work for VMs (alert at 70% cpu) produce constant noise in container environments.

memory is especially important for containers because the limit is hard. when a container exceeds its memory limit, the kernel's OOM killer terminates a process inside it, which usually takes the whole container down. there is usually no swap to absorb the overflow and no graceful degradation. an alert at 80% of the memory limit is your early warning to act before the OOM killer does.
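as a rough illustration of the 80% early warning, here is a minimal sketch that expresses memory usage as a percentage of the container's limit. the function names and the 512 MiB limit are hypothetical, chosen only for the example; this is not gromitor's implementation.

```python
MIB = 1024 * 1024


def memory_percent_of_limit(usage_bytes: int, limit_bytes: int) -> float:
    """Return memory usage as a percentage of the container's memory limit."""
    if limit_bytes <= 0:
        raise ValueError("container needs a positive memory limit")
    return 100.0 * usage_bytes / limit_bytes


def should_warn(usage_bytes: int, limit_bytes: int, warn_at_percent: float = 80.0) -> bool:
    """True once usage reaches the warning level (80% of limit by default)."""
    return memory_percent_of_limit(usage_bytes, limit_bytes) >= warn_at_percent


# Hypothetical container with a 512 MiB limit.
print(should_warn(400 * MIB, 512 * MIB))  # below 80% of the limit
print(should_warn(410 * MIB, 512 * MIB))  # just above 80% of the limit
```

the point of measuring against the limit rather than the host is that the OOM killer acts on the container's own limit, however much memory the host has free.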

threshold strategy by workload type

the right thresholds depend on what the container does. for stateless web services, alert on sustained cpu above 70–80% for 5+ minutes (short spikes are normal) and memory above 80% of limit at any point. for stateful workloads like databases and caches, alert on cpu above 60% sustained (these workloads shouldn't be compute-bound) and memory above 70% of limit (memory growth in databases often precedes crashes).

for batch jobs and workers, cpu alerting is often not useful — high cpu is expected and normal. focus alerting on memory (does it return to baseline after the job?) and job completion signals at the application level.

stateless APIs: cpu alert at 70–80% sustained for 5 minutes, memory at 80% of limit
databases and caches: cpu alert at 60% sustained, memory at 70% of limit
batch workers: skip cpu alerts; alert when memory does not return to baseline after the job
sidecar containers (log shippers, proxies): alert on any sustained cpu above 20%, since sidecars shouldn't be noisy
queue consumers: alert on memory growth over time, which can mean the consumer is not draining the queue
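the starting points above can be kept as a small lookup table so they are easy to review and adjust. this is a sketch only: the class name, field names, and workload keys are hypothetical, and where the list gives no number the field is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartingThresholds:
    cpu_percent: Optional[float]              # None: no cpu threshold given in the text
    cpu_sustained_minutes: Optional[int]      # None: duration not stated in the text
    memory_percent_of_limit: Optional[float]  # None: use a baseline or trend check instead
    note: str = ""


STARTING_POINTS = {
    "stateless_api": StartingThresholds(70.0, 5, 80.0, "70 is the low end of the 70-80% range"),
    "database_or_cache": StartingThresholds(60.0, None, 70.0),
    "batch_worker": StartingThresholds(None, None, None, "alert if memory does not return to baseline after the job"),
    "sidecar": StartingThresholds(20.0, None, None),
    "queue_consumer": StartingThresholds(None, None, None, "alert on memory growth over time"),
}


def cpu_alert_applies(workload: str) -> bool:
    """True when the table suggests a cpu threshold for this workload type."""
    return STARTING_POINTS[workload].cpu_percent is not None


print(cpu_alert_applies("stateless_api"))
print(cpu_alert_applies("batch_worker"))
```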

setting alerts in gromitor

in the gromitor dashboard, navigate to any container and click the alerts tab. you set a metric (cpu or memory), a threshold value (as a percentage or absolute bytes), a sustained duration (how long the threshold must be exceeded before the alert fires), and a delivery method (in-app, email, or both). the sustained duration is important — it's what separates a real alert from a transient spike.
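to make the sustained duration concrete, here is a minimal sketch of how such a check could work. the `AlertRule` fields mirror the four settings described above, but the class, the function, and the 10-second sample interval are assumptions for illustration, not gromitor's actual data model or evaluation logic.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AlertRule:
    metric: str                 # "cpu" or "memory"
    threshold: float            # percentage in this sketch
    sustained_seconds: int      # how long the threshold must be exceeded
    delivery: Tuple[str, ...]   # e.g. ("in-app", "email")


def breaches_sustained(samples: Iterable[Tuple[int, float]], rule: AlertRule) -> bool:
    """samples are (timestamp_seconds, value) pairs in ascending time order.

    Returns True once the value has stayed above the threshold continuously
    for at least rule.sustained_seconds.
    """
    breach_started = None
    for timestamp, value in samples:
        if value > rule.threshold:
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started >= rule.sustained_seconds:
                return True
        else:
            breach_started = None  # any dip below the threshold resets the clock
    return False


rule = AlertRule(metric="cpu", threshold=90.0, sustained_seconds=300, delivery=("email",))
spike = [(0, 10.0), (10, 95.0), (20, 95.0), (30, 95.0), (40, 10.0)]  # 30-second burst
runaway = [(t, 95.0) for t in range(0, 601, 10)]                     # 10 minutes at 95%

print(breaches_sustained(spike, rule))    # False
print(breaches_sustained(runaway, rule))  # True
```

resetting on any dip is the strictest reading of "sustained"; a real system might tolerate brief dips, which is a design choice this sketch does not attempt.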

alerts are per-container by default, but you can also set them by container name pattern. if you have ten containers named `worker-1` through `worker-10`, you can create one alert rule that covers all of them. when any worker breaches the threshold, the alert tells you which specific container triggered it.
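one rule covering many containers can be pictured as a name filter followed by a per-container check. the sketch below assumes shell-style glob patterns via Python's `fnmatch` module; gromitor's actual pattern syntax is not specified here, and the function name and sample readings are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def breaching_containers(pattern: str, cpu_by_container: Dict[str, float], threshold: float) -> List[str]:
    """Return the names matching the pattern whose cpu reading exceeds the threshold."""
    return sorted(
        name
        for name, cpu in cpu_by_container.items()
        if fnmatchcase(name, pattern) and cpu > threshold
    )


readings = {"worker-1": 40.0, "worker-7": 91.0, "worker-10": 35.0, "web-1": 99.0}

# One rule for every worker; the result names the specific container that breached.
print(breaching_containers("worker-*", readings, threshold=80.0))  # ['worker-7']
```

note that `web-1` is above the threshold but is ignored because its name does not match the rule's pattern.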

tuning thresholds to reduce noise

the biggest alerting mistake is setting thresholds on day one and never revisiting them. spend the first two weeks watching your containers' actual behavior in the gromitor dashboard without any alerts set. note the typical range for cpu and memory during peak and off-peak hours. set your initial thresholds well above that typical range (say, 2x the peak baseline, kept below 100% of the container's limit) and adjust downward as you get a feel for what's anomalous vs. normal.
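the "2x the peak baseline" rule of thumb can be worked out from observed samples. this sketch makes two assumptions that are not in the text: it takes the 95th percentile of the samples as the peak baseline, and it caps the result at 95% so a busy container never gets an unreachable threshold. both numbers and the function names are illustrative.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def initial_threshold(samples: Sequence[float], multiplier: float = 2.0,
                      ceiling: float = 95.0, pct: float = 95.0) -> float:
    """Multiply the observed peak baseline, then cap it at the ceiling."""
    peak_baseline = percentile(samples, pct)   # assumption: p95 stands in for "peak baseline"
    return min(peak_baseline * multiplier, ceiling)


quiet_service = [5, 6, 7, 8, 30, 32, 31, 9, 6, 5]   # cpu % samples, peak around 32
busy_service = [60] * 10                            # already near 60% all the time

print(initial_threshold(quiet_service))  # 64.0
print(initial_threshold(busy_service))   # 95.0 (2x would be 120, so the cap applies)
```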

alert fatigue is a real operational risk. an alert that fires too often becomes background noise. the gromitor dashboard helps with this because you can look at the historical trend for any container and see whether a threshold would have produced too many false positives over the past 24 hours.
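checking a candidate threshold against history amounts to replaying past samples and counting how often the alert would have fired. the sketch below does this for one day of per-minute samples; the function name and the synthetic history are hypothetical, and it counts each continuous breach once.

```python
from typing import Iterable, Tuple


def count_alerts(samples: Iterable[Tuple[int, float]], threshold: float, sustained_seconds: int) -> int:
    """Count how many times an alert would have fired over the samples.

    samples are (timestamp_seconds, value) pairs in ascending time order.
    Each continuous breach lasting at least sustained_seconds counts once.
    """
    fired = 0
    breach_started = None
    already_fired = False
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp
                already_fired = False
            if not already_fired and timestamp - breach_started >= sustained_seconds:
                fired += 1
                already_fired = True
        else:
            breach_started = None
    return fired


# Synthetic 24 hours: a 3-minute burst and a 21-minute incident, both at 90% cpu.
history = [
    (minute * 60, 90.0 if 100 <= minute <= 102 or 500 <= minute <= 520 else 30.0)
    for minute in range(1440)
]

for sustained in (0, 300):
    print(sustained, count_alerts(history, threshold=80.0, sustained_seconds=sustained))
```

with no sustained duration both episodes fire; with a 5-minute duration only the longer incident does, which is the kind of comparison that helps pick a quieter rule.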

integrating alerts into your workflow

in-app alerts are useful when you're actively watching the gromitor dashboard. email alerts are better for off-hours coverage: you get a notification in your inbox when something crosses a threshold overnight, and you can assess severity in the morning. for teams that need immediate paging, gromitor's roadmap includes webhook delivery, which would enable integration with PagerDuty, OpsGenie, and Slack.

for related guidance, see "container memory alerts across multiple cloud environments" for multi-cloud alert configuration, and "how to monitor docker cpu usage in real-time" for the cpu monitoring fundamentals in more depth.

faq

should i set cpu alerts as percentage of host or percentage of container limit?

percentage of container limit is more meaningful for catching misconfigured workloads: a container at 100% of its cpu limit is throttled regardless of what the host is doing. percentage of host is useful for capacity planning: knowing that your containers are collectively using 80% of host cpu helps with provisioning decisions. gromitor shows both views.

how do i avoid waking up on call for a 30-second cpu spike during a cron job?

use a sustained duration on your alerts. gromitor's sustained duration setting requires the threshold to be exceeded for a rolling window (e.g. 5 minutes) before the alert fires. a 30-second spike won't trigger a 5-minute sustained alert, but a runaway process that has been at 95% cpu for 10 minutes will.

can i alert on cpu throttling specifically, not just cpu usage?

not currently. gromitor alerts on cpu utilization percentage. cpu throttling (where the kernel rate-limits the container because it exceeded its cpu quota) is a related but distinct metric: it comes from cgroup-level counters of throttled periods and throttled time rather than from utilization, and gromitor does not alert on it.

what if i have a container whose memory usage legitimately grows over time?

some containers (JVM applications with generational GC, for example) have memory that grows until GC kicks in and drops it back down, a sawtooth pattern. for these, alert on memory trend rather than absolute value: if memory has been growing for 30+ minutes without a GC drop, that's worth an alert. gromitor's trend-based alerting handles this case better than a simple threshold.
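the "growing for 30+ minutes without a GC drop" idea from the last answer can be sketched as a check on time since the last significant drop. this is an illustration of the concept, not gromitor's trend-based alerting; the function name and the 10% drop fraction used to recognize a GC drop are assumptions.

```python
from typing import Sequence, Tuple


def growing_without_drop(samples: Sequence[Tuple[float, float]],
                         window_minutes: float = 30.0,
                         drop_fraction: float = 0.10) -> bool:
    """samples are (minute, memory_value) pairs in ascending time order.

    A "drop" is any sample more than drop_fraction below the previous one.
    Returns True when memory is higher than at the last drop (or the first
    sample) and at least window_minutes have passed since then.
    """
    if len(samples) < 2:
        return False
    reset_minute, reset_value = samples[0]
    previous_value = reset_value
    for minute, value in samples[1:]:
        if value < previous_value * (1.0 - drop_fraction):
            reset_minute, reset_value = minute, value  # looks like a GC drop
        previous_value = value
    last_minute, last_value = samples[-1]
    return (last_minute - reset_minute) >= window_minutes and last_value > reset_value


# Healthy sawtooth: climbs for 10 minutes, then drops back to the baseline.
sawtooth = [(m, 100 + 10 * (m % 10)) for m in range(60)]
# Steady climb with no drop for an hour.
leak = [(m, 100 + 2 * m) for m in range(60)]

print(growing_without_drop(sawtooth))  # False
print(growing_without_drop(leak))      # True
```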
