CORE CONCEPTS

Right-sizing

How Kubeadapt analyzes CPU and memory usage patterns to generate container rightsizing recommendations.


Overview

In Kubernetes, you pay for what you request, not what you use. Pods that request 2 CPU cores but only use 200m still reserve the full 2 cores on the node. That unused capacity cannot be scheduled to other workloads. Most teams set high requests during initial deployment and never revisit them, so a large portion of requested resources goes unused.

Right-sizing is the process of adjusting those requests and limits to match actual usage, reducing cost while keeping workloads stable.


Core Principle: Request-Centric Optimization

Kubeadapt optimizes requests as the primary cost lever. Limits play a supporting role: they protect against anomalies; they do not control cost.

Why Requests Matter More Than Limits

Requests determine what you pay for. Cloud providers charge for the capacity reserved by requests, whether the pod actually uses it or not. Over-provisioned requests waste money. Under-provisioned requests cause scheduling failures and evictions.

Limits exist to cap runaway processes, not to enforce cost savings. Setting limits too tight causes CPU throttling and OOM kills even when the node has idle capacity.

When requests accurately reflect usage, the gap between request and limit shrinks naturally, and aggressive limits become unnecessary.

CPU vs Memory

Property                            | CPU                                         | Memory
------------------------------------|---------------------------------------------|---------------------------------
Type                                | Compressible                                | Non-compressible
Exceeding limit                     | Throttled (slower, not fatal)               | OOM killed (pod restart)
Optimization approach               | More aggressive (throttling is recoverable) | Conservative (OOM is disruptive)
Default percentile (production)     | P95                                         | P99
Default percentile (non-production) | P50                                         | P50

How Right-sizing Works

Kubeadapt's right-sizing pipeline runs in three stages:

[Diagram: three-stage right-sizing pipeline]

Stage 1: Data Collection

The in-cluster agent samples every pod every 60 seconds and sends the following to the ingestion API:

Resource metrics:

  • CPU usage (millicores) and throttling events
  • Memory usage (bytes) and OOM kill events
  • Current requests and limits

Scheduling context:

  • Pod phase (Running, Pending, Failed)
  • Restart count
  • HPA status (if applicable)

Metrics are stored in ClickHouse for time-series analysis. The analyzer reads from ClickHouse using a configurable lookback window (default: 7 days).


Stage 2: Pattern Analysis

The analyzer examines two independent dimensions of workload behavior:

Dimension        | What it measures                               | Output
-----------------|------------------------------------------------|------------------------------
Growth trend     | Is usage increasing day over day?              | Growth buffer (1.0x - 1.20x)
Usage volatility | How spiky is usage across the lookback period? | Spike buffer (1.0x - 1.35x)

Both dimensions are computed independently. A workload can have high growth with low volatility, or stable averages with frequent spikes.

Growth Trend

The analyzer fits a linear regression on daily average usage over the lookback period. The slope is normalized to a percentage of the mean to keep it scale-agnostic across CPU (millicores) and memory (bytes).

    Daily % Change = (slope / mean_value) x 100

The percentage maps to a growth buffer via linear interpolation:

Daily % Change | Classification      | Growth Buffer
---------------|---------------------|-----------------
<= 0%          | Stable or declining | 1.0x (no buffer)
0% - 0.5%      | Light growth        | 1.0x - 1.10x
0.5% - 1.0%    | Steady growth       | 1.10x - 1.20x
>= 1%          | Rapid growth        | 1.20x (capped)

Formula: growth_buffer = 1.0 + clamp(daily_pct_change, 0, 1.0) x 0.20

Growing workloads get headroom so recommendations stay valid as usage increases. Stable workloads can be sized more tightly.
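As a sketch of the trend computation (illustrative Python; `growth_buffer` and its argument are hypothetical names, not Kubeadapt's API):

```python
import statistics

def growth_buffer(daily_averages):
    """Illustrative: fit a least-squares slope over daily average usage,
    normalize it to a % of the mean, and map to a 1.0x-1.20x buffer."""
    xs = range(len(daily_averages))
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(daily_averages)
    # Least-squares slope: cov(x, y) / var(x)
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, daily_averages))
             / sum((x - mean_x) ** 2 for x in xs))
    daily_pct_change = slope / mean_y * 100
    # growth_buffer = 1.0 + clamp(daily_pct_change, 0, 1.0) x 0.20
    return 1.0 + min(max(daily_pct_change, 0.0), 1.0) * 0.20
```

A flat series yields 1.0x; a series growing 0.75% per day yields 1.15x, matching the example later in this section.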

Usage Volatility

The analyzer computes the Coefficient of Variation (CV) across all data points in the lookback period:

    CV = Standard Deviation / Mean

This is not a per-hour calculation. It uses the full set of raw data points (e.g., 7 days x 24 hours x 60 one-minute samples = 10,080 samples) to measure overall usage predictability.

The CV maps to a spike buffer via linear interpolation:

CV Value | Spike Buffer   | Behavior
---------|----------------|----------------------
0.0      | 1.0x           | Perfectly stable
0.2      | 1.07x          | Minor variation
0.4      | 1.14x          | Moderate spikes
0.6      | 1.21x          | Frequent spikes
0.8      | 1.28x          | Highly unpredictable
1.0+     | 1.35x (capped) | Extremely variable

Formula: spike_buffer = 1.0 + min(cv, 1.0) x 0.35

Low-CV workloads can be sized close to the percentile. High-CV workloads need a larger buffer above the percentile to absorb unpredictable spikes.
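The volatility side can be sketched the same way (illustrative; `spike_buffer` is a hypothetical name, not Kubeadapt's API):

```python
import statistics

def spike_buffer(samples):
    """Illustrative: coefficient of variation over all raw samples,
    mapped to a 1.0x-1.35x buffer via spike_buffer = 1.0 + min(cv, 1.0) x 0.35."""
    cv = statistics.pstdev(samples) / statistics.mean(samples)
    return 1.0 + min(cv, 1.0) * 0.35
```

A perfectly flat series maps to 1.0x; a series with CV 0.2 maps to 1.07x, matching the table above.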

How the Two Dimensions Differ

[Diagram: side-by-side comparison of the two buffer inputs]
  Spike buffer (CV):     ~10,080 raw data points (stddev / avg -> CV)
  Growth buffer (trend): 7 daily aggregated points (linear regression -> slope)
  • Spike buffer uses all individual metrics to measure within-period predictability
  • Growth buffer aggregates to daily averages, then fits a trend line

Stage 3: Recommendation Generation

Request Formula

The core formula applies to both CPU and memory:

    Request = Base Percentile x Growth Buffer x Spike Buffer

Where:

  • Base Percentile comes from the cluster's analysis policy (P95 CPU / P99 memory for production, P50 for non-production)
  • Growth Buffer is 1.0x - 1.20x based on daily usage trend
  • Spike Buffer is 1.0x - 1.35x based on CV

Example:

    Workload: api-gateway (production cluster)

    Base values from lookback period:
      CPU P95:    120m
      Memory P99: 850Mi

    Growth trend:
      Daily % change: 0.75%
      Growth buffer:  1.0 + (0.75 x 0.20) = 1.15x

    Usage volatility:
      CV:           0.28
      Spike buffer: 1.0 + (0.28 x 0.35) = 1.10x

    Final requests:
      CPU:    120m x 1.15 x 1.10 = 152m (rounded to 155m)
      Memory: 850Mi x 1.15 x 1.10 = 1076Mi (~1.1Gi)
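Under the stated buffer formulas, the full request calculation reduces to a few lines (illustrative Python, not Kubeadapt's code):

```python
def recommend_request(base_percentile, daily_pct_change, cv):
    """Illustrative end-to-end calculation:
    Request = Base Percentile x Growth Buffer x Spike Buffer."""
    growth = 1.0 + min(max(daily_pct_change, 0.0), 1.0) * 0.20  # 1.0x-1.20x
    spike = 1.0 + min(cv, 1.0) * 0.35                           # 1.0x-1.35x
    return base_percentile * growth * spike

# api-gateway example (the text shows the spike buffer rounded to 1.10x):
cpu_request = recommend_request(120, 0.75, 0.28)  # ~152m before rounding
```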

CPU Limit Formula

    CPU Limit = CPU Request x CPULimitBuffer

The limit buffer comes from the cluster's analysis policy (1.2x for both production and non-production profiles).

Note

If the max observed CPU exceeds the calculated limit by more than 5x, Kubeadapt sets the CPU limit to 0 (unbounded). This applies to bursty workloads like log forwarders, batch processors, and CI runners. The request is still right-sized normally, so scheduling and cost allocation stay accurate.
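A sketch of this limit logic, including the unbounded-CPU escape hatch (hypothetical function, illustrative only):

```python
def cpu_limit(cpu_request_m, max_observed_m, limit_buffer=1.2):
    """Illustrative CPU limit rule: request x buffer, or unbounded (0)
    when the observed max exceeds the calculated limit by more than 5x."""
    limit = cpu_request_m * limit_buffer
    if max_observed_m > limit * 5:
        return 0  # remove the limit: bursty workload, avoid throttling
    return limit
```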

Memory Limit Formula

Memory limits use a different base than memory requests. OOM kills are fatal (pod restart, potential data loss), so the limit is calculated from the maximum observed memory (P100), not the percentile:

    Memory Limit = P100 (max observed) x Growth Buffer x Spike Buffer x MemoryLimitBuffer

Warning

Memory limits are always based on P100 (max observed), not the policy percentile. The limit must cover the highest memory usage ever recorded. If P100 data is not available, the system falls back to the percentile value.

The memory limit buffer is 1.3x for both production and non-production profiles.
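A minimal sketch of the memory limit rule, including the percentile fallback (names are illustrative):

```python
def memory_limit(p100_max, percentile_value, growth_buffer, spike_buffer,
                 limit_buffer=1.3):
    """Illustrative memory limit rule: base on P100 (max observed),
    falling back to the policy percentile when P100 is unavailable."""
    base = p100_max if p100_max is not None else percentile_value
    return base * growth_buffer * spike_buffer * limit_buffer
```

With 320Mi max observed and buffers of 1.10x and 1.07x, this gives roughly 490Mi.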


Environment-Based Recommendations

Kubeadapt generates different recommendations depending on the cluster's profile. The same deployment may receive different resource values in production vs. non-production.

Cluster Profile Selection

When registering a cluster in Kubeadapt, you select a profile that sets the analysis policy:

Profile        | Use Case                   | Approach
---------------|----------------------------|--------------------------------------------------------
Production     | Mission-critical workloads | Conservative: higher percentiles, tighter limits
Non-Production | Dev, test, staging         | Aggressive: lower percentiles, maximize savings
Custom         | User-defined               | Override thresholds per recommendation type via the UI

Note

Each recommendation type (workload right-sizing, idle workload detection, etc.) is powered by a separate analyzer. Custom profiles let you override percentiles, limit buffers, and savings thresholds independently for each analyzer.

Policy Values

Field                   | Production | Non-Production
------------------------|------------|---------------
cpu_percentile          | P95        | P50
memory_percentile       | P99        | P50
cpu_limit_buffer        | 1.2x       | 1.2x
memory_limit_buffer     | 1.3x       | 1.3x
min_monthly_savings_usd | $4.00      | $5.00

Both profiles use the same limit buffers. The cost difference comes from the base percentile: P95/P99 captures nearly all usage peaks, while P50 sizes to median usage for maximum savings.
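For illustration, the two built-in policies could be modeled as a plain mapping (hypothetical structure mirroring the table above, not Kubeadapt's actual schema):

```python
# Hypothetical in-code view of the two built-in analysis policies;
# field names follow the policy table, not Kubeadapt's real config format.
POLICIES = {
    "production": {
        "cpu_percentile": 95,
        "memory_percentile": 99,
        "cpu_limit_buffer": 1.2,
        "memory_limit_buffer": 1.3,
        "min_monthly_savings_usd": 4.0,
    },
    "non-production": {
        "cpu_percentile": 50,
        "memory_percentile": 50,
        "cpu_limit_buffer": 1.2,
        "memory_limit_buffer": 1.3,
        "min_monthly_savings_usd": 5.0,
    },
}
```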

Production Recommendations

Production clusters optimize for reliable savings. Requests are based on P95 CPU / P99 memory usage plus growth and spike buffers, covering nearly all observed usage patterns. The 1.2x/1.3x limit buffers add a controlled burst margin for anomalies.

Example:

    Workload: api-gateway (production)
    CPU P95: 750m, daily growth 0.8%, CV 0.3

    Growth buffer: 1.16x
    Spike buffer:  1.105x

    CPU Request: 750m x 1.16 x 1.105 = 962m (rounded to 1000m)
    CPU Limit:   1000m x 1.2 = 1200m

The resulting QoS class is Burstable (requests < limits). See QoS and Eviction Behavior for how this affects pod priority during node pressure.

Non-Production Recommendations

Non-production clusters target aggressive cost reduction. Occasional throttling, eviction, or brief downtime is acceptable in dev/test environments. Requests are based on P50 (median) usage, which cuts the request footprint significantly since these workloads are idle most of the time.

Example:

    Workload: api-gateway (development)
    CPU P50: 20m, daily growth 0.5%, CV 0.2

    Growth buffer: 1.10x
    Spike buffer:  1.07x

    CPU Request: 20m x 1.10 x 1.07 = 24m
    CPU Limit:   24m x 1.2 = 29m

    Memory P50: 180Mi, P100 (max): 320Mi
    Memory Request: 180Mi x 1.10 x 1.07 = 212Mi
    Memory Limit:   320Mi x 1.10 x 1.07 x 1.3 = 490Mi

P50-based requests allow much denser node packing. Since dev workloads run intermittently and rarely all spike at once, higher overcommit ratios are safe.

Environment Comparison

Aspect                  | Production              | Non-Production
------------------------|-------------------------|---------------------
Base percentile         | P95 (CPU), P99 (Memory) | P50 (CPU and Memory)
CPU limit buffer        | 1.2x                    | 1.2x
Memory limit buffer     | 1.3x                    | 1.3x
Typical node overcommit | 110-130%                | 300%+
QoS target              | Burstable (controlled)  | Burstable
Risk tolerance          | Low                     | High

Why Aggressive Limits Cause Problems

Setting limits too tight causes issues even when the node has idle capacity.

CPU throttling example:

A container with a 300m CPU limit on a 4-core node (4000m capacity) can use at most 30ms of CPU per 100ms CFS period. If it needs 50ms of compute, it waits. The node might have 3700m of idle CPU, but the container cannot access it. The result is increased request latency even though the node is underutilized.
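The throttling arithmetic follows from how millicores translate into a CFS quota. A minimal sketch, assuming the kernel's default 100 ms scheduling period:

```python
def cfs_quota_us(limit_millicores, period_us=100_000):
    """Illustrative: convert a CPU limit in millicores to the CFS quota
    (microseconds of CPU time allowed per scheduling period)."""
    # 1000m = 1 core = the full period; integer math avoids float error
    return limit_millicores * period_us // 1000

# A 300m limit on the default 100 ms period allows 30 ms of CPU time:
quota = cfs_quota_us(300)  # 30000 us
```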

Memory OOM kill example:

A container with a 2Gi memory limit that normally uses 1.8Gi will be killed immediately if a traffic spike pushes usage to 2.1Gi. The pod restarts, connections drop, and in-flight requests fail.

Danger

Limits do not affect scheduling or billing. Only requests determine how much capacity is reserved and paid for on the node. Setting aggressive limits does not reduce cost. It only causes throttling and OOM kills.

Kubeadapt's approach:

  1. Right-size requests (this is where cost savings come from)
  2. Set limits as anomaly protection only (1.2x CPU above request, 1.3x memory above max observed usage)
  3. For bursty workloads, remove CPU limits entirely to prevent unnecessary throttling

Special Cases

Multi-Container Pods

Pods with sidecars (logging, metrics, service mesh proxies) are analyzed per container. Each container gets its own recommendation based on its usage profile.

    Example pod:
      app:              1000m -> 600m CPU (usage-based reduction)
      logging-sidecar:   100m -> 100m CPU (minimal, keep as-is)
      metrics-sidecar:    50m ->  50m CPU (minimal, keep as-is)
      Total pod:        1150m -> 750m CPU (35% reduction)

StatefulSets

All pods in a StatefulSet share the same resource spec. You cannot set different requests per pod index. Kubeadapt sizes all pods based on the highest-usage pod (typically the primary in a database).

This means read replicas may be slightly over-provisioned. Splitting into separate StatefulSets per role is possible but adds operational complexity for marginal savings.


Recommendation Lifecycle

Recommendations follow a state machine from creation to resolution.

States

State        | Description                                                  | Trigger
-------------|--------------------------------------------------------------|------------------------------
Pending      | Active recommendation, continuously updated by the analyzer  | Generated automatically
Acknowledged | User confirmed review                                        | "Mark as Acknowledged" in UI
Dismissed    | User opted out of recommendations for this workload          | "Dismiss" in UI
Archived     | Historical record of a previously applied recommendation     | Automatic after cooldown

State Transitions

[Diagram: recommendation state transitions]

Cooldown Period

After a recommendation is applied, the analyzer waits 7 days before generating a new one for the same workload. This gives the workload time to stabilize under the new configuration.

Re-generation Rules

Previous State | Behavior                                                    | Reason
---------------|-------------------------------------------------------------|---------------------------------------------
Pending        | Always update                                               | Keep recommendation current with latest data
Acknowledged   | Wait for cooldown, then regenerate if savings threshold met | Respect stabilization period
Archived       | Never modify                                                | Immutable history record
Dismissed      | Block all                                                   | User explicitly opted out
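These rules amount to a small gating function. A sketch (state names from the table; the function itself is illustrative, not Kubeadapt's code):

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(days=7)  # stabilization period described above

def may_regenerate(state, applied_at, now, est_savings_usd, min_savings_usd):
    """Illustrative gate implementing the re-generation rules."""
    if state == "Pending":
        return True                       # always keep current
    if state in ("Archived", "Dismissed"):
        return False                      # immutable / opted out
    if state == "Acknowledged":
        cooled_down = applied_at is None or now - applied_at >= COOLDOWN
        return cooled_down and est_savings_usd >= min_savings_usd
    return False
```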

Savings Threshold

Recommendations are only generated when the estimated monthly savings meet the policy minimum:

Profile        | Minimum Monthly Savings
---------------|------------------------
Production     | $4.00
Non-Production | $5.00

This prevents recommendation noise for workloads with trivial savings potential.


Vertical Right-sizing and In-Place Updates

Kubeadapt currently provides vertical right-sizing recommendations for manual application via kubectl patch or kubectl set resources.

Kubernetes 1.27 introduced in-place pod resource adjustment as alpha. Kubernetes 1.33 promoted it to beta, requiring the InPlacePodVerticalScaling feature gate. Kubernetes 1.35 (December 2025) promoted it to GA (stable), enabled by default with no feature gate required.

In-place resize allows changing CPU and memory requests and limits on running pods without a restart in most cases. CPU changes typically apply immediately; memory limit increases may require a container restart depending on the runtime.

Kubeadapt plans to support automated in-place right-sizing. Until then, applying recommendations requires a rolling update or manual kubectl patch. See the Right-sizing Guide for the step-by-step process.

Tracking: KEP-1287: In-place Update of Pod Resources


QoS and Eviction Behavior

Kubernetes assigns a Quality of Service class to each pod based on how requests and limits are configured. This affects eviction priority when a node is under resource pressure.

QoS Class  | Condition                            | Eviction Priority
-----------|--------------------------------------|--------------------------
Guaranteed | requests = limits for all containers | Lowest (most protected)
Burstable  | requests < limits                    | Medium
BestEffort | No requests or limits set            | Highest (evicted first)

Kubelet eviction order during node pressure:

  1. BestEffort pods (evicted first)
  2. Burstable pods where usage exceeds requests
  3. Burstable pods where usage is below requests
  4. Guaranteed pods (evicted only if system services need resources)

Kubeadapt's recommendations produce Burstable QoS (requests < limits by 1.2x-1.3x). Under normal operation, these pods sit in group 3 and are evicted only after BestEffort pods and Burstable pods exceeding their requests have been drained.

Setting requests equal to limits (Guaranteed QoS) gives slightly better eviction priority, but the pod has zero burst capacity. If it briefly needs 10% more memory, it gets OOM killed.
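A simplified classifier for the table above (illustrative; the real kubelet rules additionally require both CPU and memory requests to equal limits in every container for Guaranteed):

```python
def qos_class(containers):
    """Simplified QoS classifier; each container is a dict with
    'requests' and 'limits' maps of resource name -> quantity string."""
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"   # nothing set anywhere
    if all(c.get("requests") and c.get("requests") == c.get("limits")
           for c in containers):
        return "Guaranteed"   # requests match limits everywhere
    return "Burstable"        # anything in between
```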


Summary

The recommendation formula:

    Request      = Base Percentile x Growth Buffer x Spike Buffer
    CPU Limit    = Request x 1.2
    Memory Limit = P100 (max observed) x Growth Buffer x Spike Buffer x 1.3

Cost optimization happens at the request level, not limits. Production uses P95 CPU / P99 memory with tight limit buffers for reliability. Non-production uses P50 with the same buffers for aggressive savings. Growth buffer (0-20%) and spike buffer (0-35%) add headroom based on actual workload behavior. Memory limits are based on P100 (max observed) because OOM is fatal. CPU limits go unbounded when the max/limit ratio exceeds 5x.

