CORE CONCEPTS

Right-sizing

How Kubeadapt analyzes CPU and memory usage patterns to generate container rightsizing recommendations.


Overview

In Kubernetes, you pay for what you request, not what you use. Pods that request 2 CPU cores but only use 200m still reserve the full 2 cores on the node. That unused capacity cannot be scheduled to other workloads. Most teams set high requests during initial deployment and never revisit them, so a large portion of requested resources goes unused.

Right-sizing is the process of adjusting those requests and limits to match actual usage, reducing cost while keeping workloads stable.


Core Principle: Request-Centric Optimization

Kubeadapt optimizes requests as the primary cost lever. Limits play a supporting role: they protect against anomalies; they do not control cost.

Why Requests Matter More Than Limits

Requests determine what you pay for. Cloud providers charge for the capacity reserved by requests, whether the pod actually uses it or not. Over-provisioned requests waste money. Under-provisioned requests cause scheduling failures and evictions.

Limits exist to cap runaway processes, not to enforce cost savings. Setting limits too tight causes CPU throttling and OOM kills even when the node has idle capacity.

When requests accurately reflect usage, the gap between request and limit shrinks naturally, and aggressive limits become unnecessary.

CPU vs Memory

Property                            | CPU                                         | Memory
------------------------------------|---------------------------------------------|---------------------------------
Type                                | Compressible                                | Non-compressible
Exceeding limit                     | Throttled (slower, not fatal)               | OOM killed (pod restart)
Optimization approach               | More aggressive (throttling is recoverable) | Conservative (OOM is disruptive)
Default percentile (production)     | P95                                         | P99
Default percentile (non-production) | P50                                         | P50

How Right-sizing Works

Kubeadapt's right-sizing pipeline runs in three stages:

[Diagram: three-stage right-sizing pipeline]

Stage 1: Data Collection

The in-cluster agent samples every pod every 60 seconds and sends the following to the ingestion API:

Resource metrics:

  • CPU usage (millicores) and throttling events
  • Memory usage (bytes) and OOM kill events
  • Current requests and limits

Scheduling context:

  • Pod phase (Running, Pending, Failed)
  • Restart count
  • HPA status (if applicable)

Metrics are stored in ClickHouse for time-series analysis. The analyzer reads from ClickHouse using a configurable lookback window (default: 7 days).


Stage 2: Pattern Analysis

The analyzer examines two independent dimensions of workload behavior:

Dimension        | What it measures                               | Output
-----------------|------------------------------------------------|------------------------------
Growth trend     | Is usage increasing day over day?              | Growth buffer (1.0x - 1.20x)
Usage volatility | How spiky is usage across the lookback period? | Spike buffer (1.0x - 1.35x)

Both dimensions are computed independently. A workload can have high growth with low volatility, or stable averages with frequent spikes.

Growth Trend

The analyzer fits a linear regression on daily average usage over the lookback period. The slope is normalized to a percentage of the mean to keep it scale-agnostic across CPU (millicores) and memory (bytes).

    Daily % Change = (slope / mean_value) x 100

The percentage maps to a growth buffer via linear interpolation:

Daily % Change | Classification      | Growth Buffer
---------------|---------------------|-----------------
<= 0%          | Stable or declining | 1.0x (no buffer)
0% - 0.5%      | Light growth        | 1.0x - 1.10x
0.5% - 1.0%    | Steady growth       | 1.10x - 1.20x
>= 1%          | Rapid growth        | 1.20x (capped)

Formula: growth_buffer = 1.0 + clamp(daily_pct_change, 0, 1.0) x 0.20

Growing workloads get headroom so recommendations stay valid as usage increases. Stable workloads can be sized more tightly.
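As a sketch of the trend computation (illustrative Python; `growth_buffer` and its argument are hypothetical names, not Kubeadapt's API):

```python
import statistics

def growth_buffer(daily_averages):
    """Illustrative: fit a least-squares slope over daily average usage,
    normalize it to a % of the mean, and map to a 1.0x-1.20x buffer."""
    xs = range(len(daily_averages))
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(daily_averages)
    # Least-squares slope: cov(x, y) / var(x)
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, daily_averages))
             / sum((x - mean_x) ** 2 for x in xs))
    daily_pct_change = slope / mean_y * 100
    # growth_buffer = 1.0 + clamp(daily_pct_change, 0, 1.0) x 0.20
    return 1.0 + min(max(daily_pct_change, 0.0), 1.0) * 0.20
```

A flat series yields 1.0x; a series growing 0.75% per day yields 1.15x, matching the example later in this section.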

Usage Volatility

The analyzer computes the Coefficient of Variation (CV) across all data points in the lookback period:

    CV = Standard Deviation / Mean

This is not a per-hour calculation. It uses the full set of raw data points (e.g., 7 days x 24 hours x 60 one-minute samples = 10,080 samples) to measure overall usage predictability.

The CV maps to a spike buffer via linear interpolation:

CV Value | Spike Buffer   | Behavior
---------|----------------|----------------------
0.0      | 1.0x           | Perfectly stable
0.2      | 1.07x          | Minor variation
0.4      | 1.14x          | Moderate spikes
0.6      | 1.21x          | Frequent spikes
0.8      | 1.28x          | Highly unpredictable
1.0+     | 1.35x (capped) | Extremely variable

Formula: spike_buffer = 1.0 + min(cv, 1.0) x 0.35

Low-CV workloads can be sized close to the percentile. High-CV workloads need a larger buffer above the percentile to absorb unpredictable spikes.
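The volatility side can be sketched the same way (illustrative; `spike_buffer` is a hypothetical name, not Kubeadapt's API):

```python
import statistics

def spike_buffer(samples):
    """Illustrative: coefficient of variation over all raw samples,
    mapped to a 1.0x-1.35x buffer via spike_buffer = 1.0 + min(cv, 1.0) x 0.35."""
    cv = statistics.pstdev(samples) / statistics.mean(samples)
    return 1.0 + min(cv, 1.0) * 0.35
```

A perfectly flat series maps to 1.0x; a series with CV 0.2 maps to 1.07x, matching the table above.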

How the Two Dimensions Differ

[Diagram: side-by-side comparison of the two buffer inputs]
  Spike buffer (CV):     ~10,080 raw data points (stddev / avg -> CV)
  Growth buffer (trend): 7 daily aggregated points (linear regression -> slope)
  • Spike buffer uses all individual metrics to measure within-period predictability
  • Growth buffer aggregates to daily averages, then fits a trend line

Stage 3: Recommendation Generation

Request Formula

The core formula applies to both CPU and memory:

    Request = Base Percentile x Growth Buffer x Spike Buffer

Where:

  • Base Percentile comes from the cluster's analysis policy (P95 CPU / P99 memory for production, P50 for non-production)
  • Growth Buffer is 1.0x - 1.20x based on daily usage trend
  • Spike Buffer is 1.0x - 1.35x based on CV

Example:

    Workload: api-gateway (production cluster)

    Base values from lookback period:
      CPU P95:    120m
      Memory P99: 850Mi

    Growth trend:
      Daily % change: 0.75%
      Growth buffer:  1.0 + (0.75 x 0.20) = 1.15x

    Usage volatility:
      CV:           0.28
      Spike buffer: 1.0 + (0.28 x 0.35) = 1.10x

    Final requests:
      CPU:    120m x 1.15 x 1.10 = 152m (rounded to 155m)
      Memory: 850Mi x 1.15 x 1.10 = 1076Mi (~1.1Gi)
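Under the stated buffer formulas, the full request calculation reduces to a few lines (illustrative Python, not Kubeadapt's code):

```python
def recommend_request(base_percentile, daily_pct_change, cv):
    """Illustrative end-to-end calculation:
    Request = Base Percentile x Growth Buffer x Spike Buffer."""
    growth = 1.0 + min(max(daily_pct_change, 0.0), 1.0) * 0.20  # 1.0x-1.20x
    spike = 1.0 + min(cv, 1.0) * 0.35                           # 1.0x-1.35x
    return base_percentile * growth * spike

# api-gateway example (the text shows the spike buffer rounded to 1.10x):
cpu_request = recommend_request(120, 0.75, 0.28)  # ~152m before rounding
```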

CPU Limit Formula

    CPU Limit = CPU Request x CPULimitBuffer

The limit buffer comes from the cluster's analysis policy (1.2x for both production and non-production profiles).

Note

If the max observed CPU exceeds the calculated limit by more than 5x, Kubeadapt sets the CPU limit to 0 (unbounded). This applies to bursty workloads like log forwarders, batch processors, and CI runners. The request is still right-sized normally, so scheduling and cost allocation stay accurate.
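A sketch of this limit logic, including the unbounded-CPU escape hatch (hypothetical function, illustrative only):

```python
def cpu_limit(cpu_request_m, max_observed_m, limit_buffer=1.2):
    """Illustrative CPU limit rule: request x buffer, or unbounded (0)
    when the observed max exceeds the calculated limit by more than 5x."""
    limit = cpu_request_m * limit_buffer
    if max_observed_m > limit * 5:
        return 0  # remove the limit: bursty workload, avoid throttling
    return limit
```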

Memory Limit Formula

Memory limits use a different base than memory requests. OOM kills are fatal (pod restart, potential data loss), so the limit is calculated from the maximum observed memory (P100), not the percentile:

    Memory Limit = P100 (max observed) x Growth Buffer x Spike Buffer x MemoryLimitBuffer

Warning

Memory limits are always based on P100 (max observed), not the policy percentile. The limit must cover the highest memory usage ever recorded. If P100 data is not available, the system falls back to the percentile value.

The memory limit buffer is 1.3x for both production and non-production profiles.
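A minimal sketch of the memory limit rule, including the percentile fallback (names are illustrative):

```python
def memory_limit(p100_max, percentile_value, growth_buffer, spike_buffer,
                 limit_buffer=1.3):
    """Illustrative memory limit rule: base on P100 (max observed),
    falling back to the policy percentile when P100 is unavailable."""
    base = p100_max if p100_max is not None else percentile_value
    return base * growth_buffer * spike_buffer * limit_buffer
```

With 320Mi max observed and buffers of 1.10x and 1.07x, this gives roughly 490Mi.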


Environment-Based Recommendations

Kubeadapt generates different recommendations depending on the cluster's profile. The same deployment may receive different resource values in production vs. non-production.

Cluster Profile Selection

When registering a cluster in Kubeadapt, you select a profile that sets the analysis policy:

Profile        | Use Case                   | Approach
---------------|----------------------------|--------------------------------------------------------
Production     | Mission-critical workloads | Conservative: higher percentiles, tighter limits
Non-Production | Dev, test, staging         | Aggressive: lower percentiles, maximize savings
Custom         | User-defined               | Override thresholds per recommendation type via the UI

Note

Each recommendation type (workload right-sizing, idle workload detection, etc.) is powered by a separate analyzer. Custom profiles let you override percentiles, limit buffers, and savings thresholds independently for each analyzer.

Policy Values

Field                   | Production | Non-Production
------------------------|------------|---------------
cpu_percentile          | P95        | P50
memory_percentile       | P99        | P50
cpu_limit_buffer        | 1.2x       | 1.2x
memory_limit_buffer     | 1.3x       | 1.3x
min_monthly_savings_usd | $4.00      | $5.00

Both profiles use the same limit buffers. The cost difference comes from the base percentile: P95/P99 captures nearly all usage peaks, while P50 sizes to median usage for maximum savings.
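For illustration, the two built-in policies could be modeled as a plain mapping (hypothetical structure mirroring the table above, not Kubeadapt's actual schema):

```python
# Hypothetical in-code view of the two built-in analysis policies;
# field names follow the policy table, not Kubeadapt's real config format.
POLICIES = {
    "production": {
        "cpu_percentile": 95,
        "memory_percentile": 99,
        "cpu_limit_buffer": 1.2,
        "memory_limit_buffer": 1.3,
        "min_monthly_savings_usd": 4.0,
    },
    "non-production": {
        "cpu_percentile": 50,
        "memory_percentile": 50,
        "cpu_limit_buffer": 1.2,
        "memory_limit_buffer": 1.3,
        "min_monthly_savings_usd": 5.0,
    },
}
```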

Production Recommendations

Production clusters optimize for reliable savings. Requests are based on P95 CPU / P99 memory usage plus growth and spike buffers, covering nearly all observed usage patterns. The 1.2x/1.3x limit buffers add a controlled burst margin for anomalies.

Example:

    Workload: api-gateway (production)
    CPU P95: 750m, daily growth 0.8%, CV 0.3

    Growth buffer: 1.16x
    Spike buffer:  1.105x

    CPU Request: 750m x 1.16 x 1.105 = 962m (rounded to 1000m)
    CPU Limit:   1000m x 1.2 = 1200m

The resulting QoS class is Burstable (requests < limits). See QoS and Eviction Behavior for how this affects pod priority during node pressure.

Non-Production Recommendations

Non-production clusters target aggressive cost reduction. Occasional throttling, eviction, or brief downtime is acceptable in dev/test environments. Requests are based on P50 (median) usage, which cuts the request footprint significantly since these workloads are idle most of the time.

Example:

    Workload: api-gateway (development)
    CPU P50: 20m, daily growth 0.5%, CV 0.2

    Growth buffer: 1.10x
    Spike buffer:  1.07x

    CPU Request: 20m x 1.10 x 1.07 = 24m
    CPU Limit:   24m x 1.2 = 29m

    Memory P50: 180Mi, P100 (max): 320Mi
    Memory Request: 180Mi x 1.10 x 1.07 = 212Mi
    Memory Limit:   320Mi x 1.10 x 1.07 x 1.3 = 490Mi

P50-based requests allow much denser node packing. Since dev workloads run intermittently and rarely all spike at once, higher overcommit ratios are safe.

Environment Comparison

Aspect                  | Production              | Non-Production
------------------------|-------------------------|---------------------
Base percentile         | P95 (CPU), P99 (Memory) | P50 (CPU and Memory)
CPU limit buffer        | 1.2x                    | 1.2x
Memory limit buffer     | 1.3x                    | 1.3x
Typical node overcommit | 110-130%                | 300%+
QoS target              | Burstable (controlled)  | Burstable
Risk tolerance          | Low                     | High

Why Aggressive Limits Cause Problems

Setting limits too tight causes issues even when the node has idle capacity.

CPU throttling example:

A container with a 300m CPU limit on a 4-core node (4000m capacity) can use at most 30ms of CPU per 100ms CFS period. If it needs 50ms of compute, it waits. The node might have 3700m of idle CPU, but the container cannot access it. The result is increased request latency even though the node is underutilized.
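The throttling arithmetic follows from how millicores translate into a CFS quota. A minimal sketch, assuming the kernel's default 100 ms scheduling period:

```python
def cfs_quota_us(limit_millicores, period_us=100_000):
    """Illustrative: convert a CPU limit in millicores to the CFS quota
    (microseconds of CPU time allowed per scheduling period)."""
    # 1000m = 1 core = the full period; integer math avoids float error
    return limit_millicores * period_us // 1000

# A 300m limit on the default 100 ms period allows 30 ms of CPU time:
quota = cfs_quota_us(300)  # 30000 us
```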

Memory OOM kill example:

A container with a 2Gi memory limit that normally uses 1.8Gi will be killed immediately if a traffic spike pushes usage to 2.1Gi. The pod restarts, connections drop, and in-flight requests fail.

Danger

Limits do not affect scheduling or billing. Only requests determine how much capacity is reserved and paid for on the node. Setting aggressive limits does not reduce cost. It only causes throttling and OOM kills.

Kubeadapt's approach:

  1. Right-size requests (this is where cost savings come from)
  2. Set limits as anomaly protection only (1.2x CPU above request, 1.3x memory above max observed usage)
  3. For bursty workloads, remove CPU limits entirely to prevent unnecessary throttling

Special Cases

Multi-Container Pods

Pods with sidecars (logging, metrics, service mesh proxies) are analyzed per container. Each container gets its own recommendation based on its usage profile.

    Example pod:
      app:              1000m -> 600m CPU (usage-based reduction)
      logging-sidecar:   100m -> 100m CPU (minimal, keep as-is)
      metrics-sidecar:    50m ->  50m CPU (minimal, keep as-is)
      Total pod:        1150m -> 750m CPU (35% reduction)

StatefulSets

All pods in a StatefulSet share the same resource spec. You cannot set different requests per pod index. Kubeadapt sizes all pods based on the highest-usage pod (typically the primary in a database).

This means read replicas may be slightly over-provisioned. Splitting into separate StatefulSets per role is possible but adds operational complexity for marginal savings.


Recommendation Lifecycle

Recommendations follow a state machine from creation to resolution.

States

State        | Description                                                  | Trigger
-------------|--------------------------------------------------------------|------------------------------
Pending      | Active recommendation, continuously updated by the analyzer  | Generated automatically
Acknowledged | User confirmed review                                        | "Mark as Acknowledged" in UI
Dismissed    | User opted out of recommendations for this workload          | "Dismiss" in UI
Archived     | Historical record of a previously applied recommendation     | Automatic after cooldown

State Transitions

[Diagram: recommendation state transitions]

Cooldown Period

After a recommendation is applied, the analyzer waits 7 days before generating a new one for the same workload. This gives the workload time to stabilize under the new configuration.

Re-generation Rules

Previous State | Behavior                                                    | Reason
---------------|-------------------------------------------------------------|---------------------------------------------
Pending        | Always update                                               | Keep recommendation current with latest data
Acknowledged   | Wait for cooldown, then regenerate if savings threshold met | Respect stabilization period
Archived       | Never modify                                                | Immutable history record
Dismissed      | Block all                                                   | User explicitly opted out
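These rules amount to a small gating function. A sketch (state names from the table; the function itself is illustrative, not Kubeadapt's code):

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(days=7)  # stabilization period described above

def may_regenerate(state, applied_at, now, est_savings_usd, min_savings_usd):
    """Illustrative gate implementing the re-generation rules."""
    if state == "Pending":
        return True                       # always keep current
    if state in ("Archived", "Dismissed"):
        return False                      # immutable / opted out
    if state == "Acknowledged":
        cooled_down = applied_at is None or now - applied_at >= COOLDOWN
        return cooled_down and est_savings_usd >= min_savings_usd
    return False
```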

Savings Threshold

Recommendations are only generated when the estimated monthly savings meet the policy minimum:

Profile        | Minimum Monthly Savings
---------------|------------------------
Production     | $4.00
Non-Production | $5.00

This prevents recommendation noise for workloads with trivial savings potential.


Vertical Right-sizing and In-Place Updates

Kubeadapt currently provides vertical right-sizing recommendations for manual application via kubectl patch or kubectl set resources.

Kubernetes 1.27 introduced in-place pod resource adjustment as alpha. Kubernetes 1.33 promoted it to beta, requiring the InPlacePodVerticalScaling feature gate. Kubernetes 1.35 (December 2025) promoted it to GA (stable), enabled by default with no feature gate required.

In-place resize allows changing CPU and memory requests and limits on running pods without a restart in most cases. CPU changes typically apply immediately; memory limit increases may require a container restart depending on the runtime.

Kubeadapt plans to support automated in-place right-sizing. Until then, applying recommendations requires a rolling update or manual kubectl patch. See the Right-sizing Guide for the step-by-step process.

Tracking: KEP-1287: In-place Update of Pod Resources


QoS and Eviction Behavior

Kubernetes assigns a Quality of Service class to each pod based on how requests and limits are configured. This affects eviction priority when a node is under resource pressure.

QoS Class  | Condition                            | Eviction Priority
-----------|--------------------------------------|--------------------------
Guaranteed | requests = limits for all containers | Lowest (most protected)
Burstable  | requests < limits                    | Medium
BestEffort | No requests or limits set            | Highest (evicted first)

Kubelet eviction order during node pressure:

  1. BestEffort pods (evicted first)
  2. Burstable pods where usage exceeds requests
  3. Burstable pods where usage is below requests
  4. Guaranteed pods (evicted only if system services need resources)

Kubeadapt's recommendations produce Burstable QoS (requests < limits by 1.2x-1.3x). Under normal operation, these pods sit in group 3 and are evicted only after BestEffort pods and Burstable pods exceeding their requests have been drained.

Setting requests equal to limits (Guaranteed QoS) gives slightly better eviction priority, but the pod has zero burst capacity. If it briefly needs 10% more memory, it gets OOM killed.
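A simplified classifier for the table above (illustrative; the real kubelet rules additionally require both CPU and memory requests to equal limits in every container for Guaranteed):

```python
def qos_class(containers):
    """Simplified QoS classifier; each container is a dict with
    'requests' and 'limits' maps of resource name -> quantity string."""
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"   # nothing set anywhere
    if all(c.get("requests") and c.get("requests") == c.get("limits")
           for c in containers):
        return "Guaranteed"   # requests match limits everywhere
    return "Burstable"        # anything in between
```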


Summary

The recommendation formula:

    Request      = Base Percentile x Growth Buffer x Spike Buffer
    CPU Limit    = Request x 1.2
    Memory Limit = P100 (max observed) x Growth Buffer x Spike Buffer x 1.3

Cost optimization happens at the request level, not limits. Production uses P95 CPU / P99 memory with tight limit buffers for reliability. Non-production uses P50 with the same buffers for aggressive savings. Growth buffer (0-20%) and spike buffer (0-35%) add headroom based on actual workload behavior. Memory limits are based on P100 (max observed) because OOM is fatal. CPU limits go unbounded when the max/limit ratio exceeds 5x.

