
Core Concepts

Right-sizing

How Kubeadapt analyzes CPU and memory usage patterns to generate container right-sizing recommendations.


Overview

In Kubernetes, you pay for what you request, not what you use. Pods that request 2 CPU cores but only use 200m still reserve the full 2 cores on the node. That unused capacity cannot be scheduled to other workloads. Most teams set high requests during initial deployment and never revisit them, so a large portion of requested resources goes unused.

Right-sizing is the process of adjusting those requests and limits to match actual usage, reducing cost while keeping workloads stable.


Core Principle: Request-Centric Optimization

Kubeadapt optimizes requests as the primary cost lever. Limits play a supporting role: they protect against anomalies rather than control cost.

Why Requests Matter More Than Limits

Requests determine what you pay for. They reserve node capacity whether or not the pod uses it, and node capacity is what the cloud provider bills. Over-provisioned requests waste money. Under-provisioned requests let the scheduler overpack nodes, which leads to CPU contention and evictions.

Limits exist to cap runaway processes, not to enforce cost savings. Setting limits too tight causes CPU throttling and OOM kills even when the node has idle capacity.

When requests accurately reflect usage, the gap between request and limit shrinks naturally, and aggressive limits become unnecessary.

CPU vs Memory

Property | CPU | Memory
Type | Compressible | Non-compressible
Exceeding limit | Throttled (slower, not fatal) | OOM killed (pod restart)
Optimization approach | More aggressive (throttling is recoverable) | Conservative (OOM is disruptive)
Default percentile (production) | P95 | P99
Default percentile (non-production) | P50 | P50

How Right-sizing Works

Kubeadapt right-sizes in three stages: it observes your pods, looks for usage patterns, and then issues a recommendation.


Stage 1: What Kubeadapt observes

For every pod, every 60 seconds, Kubeadapt collects:

Resource metrics:

  • CPU usage (millicores) and throttling events
  • Memory usage (bytes) and OOM kill events
  • Current requests and limits

Scheduling context:

  • Pod phase (Running, Pending, Failed)
  • Restart count
  • HPA status (if applicable)

Kubeadapt analyzes this data over a configurable lookback window (default: 7 days) when generating recommendations.
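
As a rough illustration of how a base percentile could be derived from the 60-second samples in the lookback window, the sketch below uses the nearest-rank method. The function name, the sample values, and the choice of nearest-rank are assumptions made for this example; this page does not state how Kubeadapt computes percentiles internally.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical CPU usage in millicores, one value per 60-second scrape.
cpu_millicores = [100] * 10 + [110] * 8 + [120, 300]

p50 = nearest_rank_percentile(cpu_millicores, 50)
p95 = nearest_rank_percentile(cpu_millicores, 95)
p100 = nearest_rank_percentile(cpu_millicores, 100)
print(p50, p95, p100)
```

In this made-up series a single 300m spike sets P100 but leaves P95 at 120m, which is why the choice of percentile has such a large effect on the recommended request.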


Stage 2: Pattern Analysis

Kubeadapt examines two independent dimensions of workload behavior:

Dimension | What it measures | Output
Growth trend | Is usage increasing day over day? | Growth buffer (1.0x - 1.20x)
Usage volatility | How spiky is usage across the lookback period? | Spike buffer (1.0x - 1.35x)

Both dimensions are computed independently. A workload can have high growth with low volatility, or stable averages with frequent spikes.

Growth Trend

Kubeadapt measures how fast your workload's usage is growing day over day. The result is expressed as a percentage change per day, which works the same way for CPU (millicores) and memory (bytes).

plaintext
Daily % Change = how fast usage is increasing

The percentage maps to a growth buffer via linear interpolation:

Daily % Change | Classification | Growth Buffer
<= 0% | Stable or declining | 1.0x (no buffer)
0% - 0.5% | Light growth | 1.0x - 1.10x
0.5% - 1.0% | Steady growth | 1.10x - 1.20x
>= 1% | Rapid growth | 1.20x (capped)

Formula: growth_buffer = 1.0 + clamp(daily_pct_change, 0, 1.0) × 0.20

Growing workloads get headroom so recommendations stay valid as usage increases. Stable workloads can be sized more tightly.
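
The growth buffer formula above translates directly into a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the input is assumed to be the daily change expressed in percent (so 0.75 means 0.75% per day).

```python
def growth_buffer(daily_pct_change: float) -> float:
    """Map a daily usage change (in percent) to a growth buffer between 1.0 and 1.20."""
    # Negative or zero growth gets no buffer; growth of 1% per day or more is capped.
    clamped = max(0.0, min(daily_pct_change, 1.0))
    return 1.0 + clamped * 0.20


for pct in (-0.3, 0.0, 0.5, 0.75, 1.0, 2.5):
    print(f"{pct:+.2f}%/day -> {growth_buffer(pct):.2f}x")
```

The clamp is what produces the flat ends of the table: anything at or below 0% stays at 1.0x, and anything at or above 1% stays at 1.20x.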

Usage Volatility

Kubeadapt measures how spiky your workload's usage is across the entire lookback period. The calculation uses every data point, not just hourly or daily averages, and expresses the result as a coefficient of variation (CV):

plaintext
Volatility = how much usage swings around its average

The volatility maps to a spike buffer via linear interpolation:

Volatility | Spike Buffer | Behavior
Very low | 1.0x | Perfectly stable
Low | 1.07x | Minor variation
Moderate | 1.14x | Moderate spikes
High | 1.21x | Frequent spikes
Very high | 1.28x | Highly unpredictable
Extreme | 1.35x (capped) | Extremely variable

Stable workloads can be sized close to the percentile. Spiky workloads need a larger buffer to absorb unpredictable peaks.
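
The sketch below shows one way the volatility-to-buffer mapping could work. It assumes volatility is the coefficient of variation (standard deviation divided by mean) and that the buffer is 1.0 + CV x 0.35 with CV capped at 1.0, which is what the worked examples on this page imply. The function names, the use of the population standard deviation, and the cap at CV = 1.0 are assumptions, not confirmed Kubeadapt internals.

```python
import statistics


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean (0.0 if the mean is zero)."""
    mean = statistics.fmean(samples)
    if mean == 0:
        return 0.0
    return statistics.pstdev(samples) / mean


def spike_buffer(cv: float) -> float:
    """Map a coefficient of variation to a spike buffer between 1.0 and 1.35."""
    clamped = max(0.0, min(cv, 1.0))
    return 1.0 + clamped * 0.35


steady = [100, 100, 100, 100]   # hypothetical usage samples, no variation
spiky = [50, 150, 50, 150]      # hypothetical usage samples, mean 100, std dev 50

print(f"{spike_buffer(coefficient_of_variation(steady)):.3f}x")
print(f"{spike_buffer(coefficient_of_variation(spiky)):.3f}x")
```

Under these assumptions the six rows of the table correspond to CV values of 0, 0.2, 0.4, 0.6, 0.8, and 1.0.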

How the Two Dimensions Differ

  • Spike buffer measures within-period predictability (how much usage swings around its average).
  • Growth buffer measures the day-over-day trend (whether usage is increasing).

Because the request formula multiplies the two buffers together, Kubeadapt sizes for both at once.


Stage 3: Recommendation Generation

Request Formula

The core formula applies to both CPU and memory:

plaintext
Request = Base Percentile x Growth Buffer x Spike Buffer

Where:

  • Base Percentile comes from the cluster's analysis policy (P95 CPU / P99 memory for production, P50 for non-production)
  • Growth Buffer is 1.0x - 1.20x based on daily usage trend
  • Spike Buffer is 1.0x - 1.35x based on the coefficient of variation (CV) of usage

Example:

plaintext
Workload: api-gateway (production cluster)

Base values from lookback period:
  CPU P95: 120m
  Memory P99: 850Mi

Growth trend:
  Daily % change: 0.75%
  Growth buffer: 1.0 + (0.75 x 0.20) = 1.15x

Usage volatility:
  CV: 0.28
  Spike buffer: 1.0 + (0.28 x 0.35) = 1.098, rounded to 1.10x

Final requests:
  CPU: 120m x 1.15 x 1.10 = 152m (rounded to 155m)
  Memory: 850Mi x 1.15 x 1.10 = 1075Mi (~1.05Gi)
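
The arithmetic in the example can be checked with a short script. This is only a sketch of the request formula: the function name is hypothetical, and the final rounding step (152m to 155m) is left out because this page does not say which rounding rule is applied.

```python
def recommended_request(base_percentile: float, growth_buffer: float, spike_buffer: float) -> float:
    """Request = Base Percentile x Growth Buffer x Spike Buffer (before any rounding)."""
    return base_percentile * growth_buffer * spike_buffer


# Values taken from the api-gateway example.
cpu_request_m = recommended_request(120, 1.15, 1.10)      # about 151.8 millicores
memory_request_mi = recommended_request(850, 1.15, 1.10)  # about 1075.25 MiB

print(f"CPU request: {cpu_request_m:.1f}m")
print(f"Memory request: {memory_request_mi:.2f}Mi")
```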

CPU Limit Formula

plaintext
CPU Limit = CPU Request x CPULimitBuffer

The limit buffer comes from the cluster's analysis policy (1.2x for both production and non-production profiles).

Note

If the max observed CPU exceeds the calculated limit by more than 5x, Kubeadapt sets the CPU limit to 0 (unbounded). This applies to bursty workloads like log forwarders, batch processors, and CI runners. The request is still right-sized normally, so scheduling and cost allocation stay accurate.
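
A minimal sketch of the CPU limit rule, including the unbounded case described in the note. The function and parameter names are hypothetical, and "exceeds the calculated limit by more than 5x" is interpreted here as max observed CPU divided by the calculated limit being greater than 5.

```python
UNBOUNDED = 0.0  # a limit of 0 means "no CPU limit", as described in the note above


def cpu_limit_millicores(request_m: float, max_observed_m: float,
                         limit_buffer: float = 1.2, burst_ratio: float = 5.0) -> float:
    """CPU Limit = CPU Request x CPULimitBuffer, or unbounded for very bursty workloads."""
    limit = request_m * limit_buffer
    if max_observed_m > limit * burst_ratio:
        return UNBOUNDED
    return limit


print(cpu_limit_millicores(1000, 1100))  # steady service: limit stays near 1200m
print(cpu_limit_millicores(50, 400))     # bursty job: 400m > 5 x 60m, so unbounded
```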

Memory Limit Formula

Memory limits use a different base than memory requests. OOM kills are fatal (pod restart, potential data loss), so the limit is calculated from the maximum observed memory (P100), not the percentile:

plaintext
Memory Limit = P100 (max observed) x Growth Buffer x Spike Buffer x MemoryLimitBuffer

Warning

Memory limits are always based on P100 (max observed), not the policy percentile. The limit must cover the highest memory usage observed. If P100 data is not available, the system falls back to the percentile value.

The memory limit buffer is 1.3x for both production and non-production profiles.
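
The memory limit rule, including the fallback to the percentile when no P100 value is available, could be sketched as follows. The function and parameter names are hypothetical, and representing "P100 not available" as None is an assumption made for this example.

```python
from typing import Optional


def memory_limit_mi(p100_mi: Optional[float], percentile_mi: float,
                    growth_buffer: float, spike_buffer: float,
                    limit_buffer: float = 1.3) -> float:
    """Memory Limit = P100 x Growth Buffer x Spike Buffer x MemoryLimitBuffer.

    Falls back to the policy percentile when no maximum was observed.
    """
    base = p100_mi if p100_mi is not None else percentile_mi
    return base * growth_buffer * spike_buffer * limit_buffer


with_max = memory_limit_mi(320, 180, 1.10, 1.07)      # based on the 320Mi maximum
without_max = memory_limit_mi(None, 180, 1.10, 1.07)  # falls back to the 180Mi percentile

print(f"{with_max:.1f}Mi, {without_max:.1f}Mi")
```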


Environment-Based Recommendations

Kubeadapt generates different recommendations depending on the cluster's profile. The same deployment may receive different resource values in production vs. non-production.

Cluster Profile Selection

When registering a cluster in Kubeadapt, you select a profile that sets the analysis policy:

Profile | Use Case | Approach
Production | Mission-critical workloads | Conservative: higher percentiles for reliability
Non-Production | Dev, test, staging | Aggressive: lower percentiles, maximize savings
Custom | User-defined | Override thresholds per recommendation type via the UI

Note

Each recommendation type (workload right-sizing, idle workload detection, etc.) is tuned independently. Custom profiles let you override percentiles, limit buffers, and savings thresholds per recommendation type.

Policy Values

Field | Production | Non-Production
cpu_percentile | P95 | P50
memory_percentile | P99 | P50
cpu_limit_buffer | 1.2x | 1.2x
memory_limit_buffer | 1.3x | 1.3x
min_monthly_savings_usd | $4.0 | $5.0

Both profiles use the same limit buffers. The cost difference comes from the base percentile: P95/P99 captures nearly all usage peaks, while P50 sizes to median usage for maximum savings.
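
For readers who want to reason about the two built-in profiles in code, the sketch below represents the policy table as a small data structure. The field names follow the table; the class name, the dictionary, and the helper function are hypothetical and do not reflect an actual Kubeadapt API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AnalysisPolicy:
    cpu_percentile: int
    memory_percentile: int
    cpu_limit_buffer: float
    memory_limit_buffer: float
    min_monthly_savings_usd: float


# Values copied from the policy table.
POLICIES = {
    "production": AnalysisPolicy(95, 99, 1.2, 1.3, 4.0),
    "non-production": AnalysisPolicy(50, 50, 1.2, 1.3, 5.0),
}


def differing_fields(a: AnalysisPolicy, b: AnalysisPolicy) -> list:
    """Return the names of the fields whose values differ between two policies."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


print(differing_fields(POLICIES["production"], POLICIES["non-production"]))
```

The output lists only the percentiles and the savings threshold, which matches the point that the limit buffers are identical across profiles.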

Production Recommendations

Production clusters optimize for reliable savings. Requests are based on P95 CPU / P99 memory usage plus growth and spike buffers, covering nearly all observed usage patterns. The 1.2x/1.3x limit buffers add a controlled burst margin for anomalies.

Example:

plaintext
Workload: api-gateway (production)

CPU P95: 750m, daily growth 0.8%, CV 0.3

Growth buffer: 1.16x
Spike buffer: 1.105x

CPU Request: 750m x 1.16 x 1.105 = 961m (rounded to 1000m)
CPU Limit: 1000m x 1.2 = 1200m

The resulting QoS class is Burstable (requests < limits). See QoS and Eviction Behavior for how this affects pod priority during node pressure.

Non-Production Recommendations

Non-production clusters target aggressive cost reduction. Occasional throttling, eviction, or brief downtime is acceptable in dev/test environments. Requests are based on P50 (median) usage, which cuts the request footprint significantly since these workloads are idle most of the time.

Example:

plaintext
Workload: api-gateway (development)

CPU P50: 20m, daily growth 0.5%, CV 0.2

Growth buffer: 1.10x
Spike buffer: 1.07x

CPU Request: 20m x 1.10 x 1.07 = 24m
CPU Limit: 24m x 1.2 = 29m

Memory P50: 180Mi, P100 (max): 320Mi
Memory Request: 180Mi x 1.10 x 1.07 = 212Mi
Memory Limit: 320Mi x 1.10 x 1.07 x 1.3 = 490Mi

P50-based requests allow much denser node packing. Since dev workloads run intermittently and rarely all spike at once, higher overcommit ratios are safe.

Environment Comparison

Aspect | Production | Non-Production
Base percentile | P95 (CPU), P99 (Memory) | P50 (CPU and Memory)
CPU limit buffer | 1.2x | 1.2x
Memory limit buffer | 1.3x | 1.3x
Typical node overcommit | 110-130% | 300%+
QoS target | Burstable (controlled) | Burstable
Risk tolerance | Low | High

Why Aggressive Limits Cause Problems

Setting limits too tight causes issues even when the node has idle capacity.

CPU throttling example:

A container with a 300m CPU limit on a 4-core node (4000m capacity) can use at most 30ms of CPU per 100ms CFS period. If it needs 50ms of compute, it waits. The node might have 3700m of idle CPU, but the container cannot access it. The result is increased request latency even though the node is underutilized.
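
To make the latency cost concrete, the sketch below estimates how long a burst of CPU work takes in wall-clock time under a CFS quota. It is a simplified model with stated assumptions: a single-threaded workload, a burst that starts at the beginning of a period, and the default 100ms CFS period. The function name is hypothetical.

```python
import math


def wall_clock_ms(work_ms: float, limit_millicores: float, period_ms: float = 100.0) -> float:
    """Estimate wall-clock time to finish work_ms of CPU time under a CPU limit."""
    quota_ms = limit_millicores * period_ms / 1000  # CPU time allowed per period
    if work_ms <= quota_ms:
        return work_ms  # fits inside one period's quota, no throttling
    # Every period except the last is used up to the quota, then the task is throttled
    # until the next period starts.
    full_periods = math.ceil(work_ms / quota_ms) - 1
    remainder_ms = work_ms - full_periods * quota_ms
    return full_periods * period_ms + remainder_ms


print(wall_clock_ms(50, 300))   # 30ms run, 70ms throttled, 20ms run
print(wall_clock_ms(50, 1000))  # a 1000m limit does not throttle this burst
```

Under this model, 50ms of compute takes 120ms with a 300m limit, more than twice as long as it would unthrottled, regardless of how idle the node is.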

Memory OOM kill example:

A container with a 2Gi memory limit that normally uses 1.8Gi will be OOM killed as soon as a traffic spike pushes usage past 2Gi. The pod restarts, connections drop, and in-flight requests fail.

Danger

Limits do not affect scheduling or billing. Only requests determine how much capacity is reserved and paid for on the node. Setting aggressive limits does not reduce cost. It only causes throttling and OOM kills.

Kubeadapt's approach:

  1. Right-size requests (this is where cost savings come from)
  2. Set limits as anomaly protection only (1.2x CPU above request, 1.3x memory above max observed usage)
  3. For bursty workloads, remove CPU limits entirely to prevent unnecessary throttling

Special Cases

Multi-Container Pods

Pods with sidecars (logging, metrics, service mesh proxies) are analyzed per container. Each container gets its own recommendation based on its usage profile.

plaintext
Example pod:
  app:             1000m -> 600m CPU (usage-based reduction)
  logging-sidecar: 100m  -> 100m CPU (minimal, keep as-is)
  metrics-sidecar: 50m   -> 50m  CPU (minimal, keep as-is)

  Total pod:       1150m -> 750m CPU (35% reduction)

StatefulSets

All pods in a StatefulSet share the same resource spec. You cannot set different requests per pod index. Kubeadapt sizes all pods based on the highest-usage pod (typically the primary in a database).

This means read replicas may be slightly over-provisioned. Splitting into separate StatefulSets per role is possible but adds operational complexity for marginal savings.


Recommendation Lifecycle

Recommendations follow a state machine from creation to resolution.

States

State | Description | Trigger
Pending | Active recommendation, continuously updated as usage shifts | Generated automatically
Acknowledged | User confirmed review | "Mark as Acknowledged" in UI
Dismissed | User opted out of recommendations for this workload | "Dismiss" in UI
Archived | Historical record of a previously applied recommendation | Automatic after cooldown

State Transitions

[Diagram: recommendation-lifecycle state transitions. An interactive version is available on the dashboard.]

Cooldown Period

After a recommendation is applied, Kubeadapt waits 7 days before generating a new one for the same workload. This gives the workload time to stabilize under the new configuration.

Re-generation Rules

Previous State | Behavior | Reason
Pending | Always update | Keep recommendation current with latest data
Acknowledged | Wait for cooldown, then regenerate if savings threshold met | Respect stabilization period
Archived | Never modify | Immutable history record
Dismissed | Block all | User explicitly opted out
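
The re-generation rules can be read as a small decision function. The sketch below is one possible encoding: the enum, the function name, and the parameters are hypothetical, the 7-day value is the cooldown period described above, and the point from which the cooldown is measured is an assumption.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    ACKNOWLEDGED = "acknowledged"
    ARCHIVED = "archived"
    DISMISSED = "dismissed"


COOLDOWN_DAYS = 7


def should_regenerate(state: State, days_since_last_change: float,
                      est_monthly_savings_usd: float,
                      min_monthly_savings_usd: float) -> bool:
    """Decide whether a new recommendation may replace the previous one."""
    if state is State.PENDING:
        return True  # always kept current with the latest data
    if state is State.ACKNOWLEDGED:
        cooled_down = days_since_last_change >= COOLDOWN_DAYS
        worthwhile = est_monthly_savings_usd >= min_monthly_savings_usd
        return cooled_down and worthwhile
    # ARCHIVED records are immutable; DISMISSED workloads are opted out.
    return False


print(should_regenerate(State.ACKNOWLEDGED, 3, 12.0, 4.0))
print(should_regenerate(State.ACKNOWLEDGED, 8, 12.0, 4.0))
```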

Savings Threshold

Recommendations are only generated when the estimated monthly savings meet the policy minimum:

Profile | Minimum Monthly Savings
Production | $4.0
Non-Production | $5.0

This prevents recommendation noise for workloads with trivial savings potential.


Vertical Right-sizing and In-Place Updates

Kubeadapt currently provides vertical right-sizing recommendations for manual application via kubectl patch or kubectl set resources.

Kubernetes 1.35 (December 2025) promoted in-place pod resource adjustment to GA, enabled by default with no feature gate required. (Earlier versions had it behind feature gates: alpha in 1.27, beta in 1.33.)

In-place resize allows changing CPU and memory requests and limits on running pods without a restart in most cases. CPU changes typically apply immediately; whether a memory change restarts the container depends on the container's resizePolicy (NotRequired or RestartContainer).

Until automated in-place right-sizing is available in Kubeadapt, applying recommendations requires a rolling update or manual kubectl patch. See the Right-sizing Guide for the step-by-step process.
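
As an illustration of the manual path, the sketch below builds a strategic merge patch for a Deployment and prints the matching kubectl patch command. The Deployment name, container name, and resource values are hypothetical placeholders; substitute the values from your own recommendation. Because the patch changes the pod template, applying it triggers a rolling update.

```python
import json
import shlex


def build_resources_patch(container_name: str, requests: dict, limits: dict) -> dict:
    """Build a strategic merge patch that sets resources on one container.

    Containers in the pod template are matched by name, so other containers
    in the same pod are left unchanged.
    """
    return {
        "spec": {"template": {"spec": {"containers": [
            {"name": container_name,
             "resources": {"requests": requests, "limits": limits}},
        ]}}}
    }


# Hypothetical workload and values.
patch = build_resources_patch("api-gateway", {"cpu": "1000m"}, {"cpu": "1200m"})
command = "kubectl patch deployment api-gateway -p " + shlex.quote(json.dumps(patch))
print(command)
```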

Tracking: KEP-1287: In-place Update of Pod Resources


QoS and Eviction Behavior

Kubernetes assigns a Quality of Service class to each pod based on how requests and limits are configured. This affects eviction priority when a node is under resource pressure.

QoS Class | Condition | Eviction Priority
Guaranteed | requests = limits for all containers | Lowest (most protected)
Burstable | requests < limits | Medium
BestEffort | No requests or limits set | Highest (evicted first)

Kubelet eviction order during node pressure:

  1. BestEffort pods (evicted first)
  2. Burstable pods where usage exceeds requests
  3. Burstable pods where usage is below requests
  4. Guaranteed pods (evicted only if system services need resources)

Kubeadapt's recommendations produce Burstable QoS (requests < limits by 1.2x-1.3x). Under normal operation, these pods sit in group 3 and are only evicted after BestEffort and over-request Burstable pods are drained first.

Setting requests equal to limits (Guaranteed QoS) gives slightly better eviction priority, but the pod has zero burst capacity. If it briefly needs 10% more memory, it gets OOM killed.
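
The QoS classification rules can be sketched as a small function. This is a simplified model: the function name and the dictionary layout are hypothetical, and quantities are compared as plain strings, so "1" and "1000m" would be treated as different even though Kubernetes considers them equal.

```python
def qos_class(containers: list) -> str:
    """Classify a pod from its containers' 'requests' and 'limits' dicts."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults a missing request to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


burstable_pod = [{"requests": {"cpu": "1000m", "memory": "1Gi"},
                  "limits": {"cpu": "1200m", "memory": "1300Mi"}}]
guaranteed_pod = [{"requests": {"cpu": "1000m", "memory": "1Gi"},
                   "limits": {"cpu": "1000m", "memory": "1Gi"}}]

print(qos_class(burstable_pod), qos_class(guaranteed_pod), qos_class([{}]))
```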


Summary

The recommendation formula:

plaintext
Request = Base Percentile x Growth Buffer x Spike Buffer
CPU Limit = Request x 1.2
Memory Limit = P100 (max observed) x Growth Buffer x Spike Buffer x 1.3

Cost optimization happens at the request level, not limits. Production uses P95 CPU / P99 memory with tight limit buffers for reliability. Non-production uses P50 with the same buffers for aggressive savings. Growth buffer (0-20%) and spike buffer (0-35%) add headroom based on actual workload behavior. Memory limits are based on P100 (max observed) because OOM is fatal. CPU limits go unbounded when the max/limit ratio exceeds 5x.


Learn More

Related Documentation:

  • Cost Attribution - How costs are calculated
  • Resource Efficiency - Efficiency metrics explained

Workflows:

  • Right-sizing Guide - Step-by-step optimization process
© 2026 Kubeadapt. All rights reserved.
