Kubernetes HPA Tolerance Levels

K8s Course + CKA Practice Questions & Answers

Preparing for the Certified Kubernetes Administrator (CKA) exam?

We have got something that will seriously boost your confidence.

Our CKA Practice Questions & Explanations course is now live, updated for the new CKA syllabus.

Here is what you get.

  • 80+ real-world practice scenarios

  • Step-by-step explanations for every question

  • Covers all key domains: Pods, Services, Networking, Ingress, Gateway API, Operators, Storage, Security, Upgrades, and more

  • Updated to match the latest CNCF CKA 2025 syllabus

  • Created by CKA-certified engineers with production experience

  • Access to our exclusive support community

Whether you are just starting or reviewing before exam day, this course is built to sharpen your skills and improve your speed under pressure.

🎁 Use code DCUBE30 to get 30% off the bundle (limited-time launch offer).

Default HPA Behaviour

In Kubernetes, the HPA applies a default 10% tolerance: CPU/memory usage must deviate from the target by more than 10% before it adds or removes pods.

You can't control how sensitive it is to increases or decreases in usage.

For example,

If you set 70% as the usage target, Kubernetes would only scale up once usage went above 77%, and scale down once it dropped below 63%.

Here is the problem with a fixed 10% tolerance:

Sometimes you want the system to react quickly when traffic spikes. Other times, you want it to be more cautious to avoid constant scaling up and down.

Now in Kubernetes v1.33, you can set separate tolerance levels for scaling up and scaling down.

HPA Tolerance Levels (alpha feature)

Important Note: Alpha APIs can change dramatically or be removed entirely between versions. They have minimal real-world testing and may have bugs that could impact your workloads.

To test this feature, you need to enable the HPAConfigurableTolerance feature gate in the cluster. Refer to this guide to learn more.

HPA Tolerance Levels is a new alpha feature in Kubernetes v1.33 that lets you customize how sensitive your Horizontal Pod Autoscaler (HPA) is to metric changes.

This means you can make HPA respond faster when traffic increases (scale up quickly) and wait longer when traffic slows down (avoid scaling down too fast).

Here is an example.
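The original manifest isn't reproduced here, so below is a minimal sketch of what such an HPA might look like. The name my-app and the 70% CPU target are illustrative, chosen to match the numbers discussed next; the new tolerance field sits under spec.behavior.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      # React aggressively: scale up once usage is more than 1% above target
      tolerance: 0.01
    scaleDown:
      # Be conservative: scale down only once usage is more than 5% below target
      tolerance: 0.05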

Let's understand the tolerance settings.

scaleUp: tolerance: 0.01 (1% tolerance) - This means your app will scale UP (add more pods) very aggressively.

For example, if your pods are running at 71% CPU, the HPA will immediately add more pods because usage is above the 70.7% threshold (70% × 1.01).

scaleDown: tolerance: 0.05 (5% tolerance) - This means your app will scale DOWN (remove pods) more conservatively.

For example, if your pods are running at 67% CPU, the HPA will NOT remove pods because usage is still above the 66.5% threshold (70% × 0.95).

Hands-on Example

You first need to enable the HPAConfigurableTolerance feature gate in the API server manifest.

From the control plane, open the API server manifest.

sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml

Add the following flag under the kube-apiserver command arguments.

- --feature-gates=HPAConfigurableTolerance=true

Once you save the file, the kubelet will restart the API server automatically (static Pod manifests are re-read on change).
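Since the HPA controller itself runs inside the kube-controller-manager, you will likely need the same feature gate there as well. Assuming a kubeadm-style cluster where the controller manager is also a static Pod, the step mirrors the one above:

sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml

- --feature-gates=HPAConfigurableTolerance=true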

Now let's deploy a simple Deployment that runs a CPU stress container.
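The original manifest isn't shown here, so the following is a minimal sketch. The name cpu-stress, the busybox image, and the busy-loop command are illustrative stand-ins for a container that generates roughly 10m of CPU load against a 100m request.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-stress
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      containers:
      - name: stress
        image: busybox:1.36
        # Busy-loop briefly, then sleep: a rough, node-dependent way to
        # burn about 10m CPU (10% of the 100m request below)
        command: ["/bin/sh", "-c", "while true; do i=0; while [ $i -lt 10000 ]; do i=$((i+1)); done; sleep 1; done"]
        resources:
          requests:
            cpu: 100m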

Now let's create the HPA with tolerance values.
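Again as a sketch matching the description below; the object names follow the Deployment above.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-stress-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stress
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
  behavior:
    scaleUp:
      # Alpha field (v1.33, HPAConfigurableTolerance): scale up as soon as
      # usage exceeds the target by more than 2%
      tolerance: 0.02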

In this HPA, CPU utilization is the key metric. It expects each Pod to use around 10% of its requested CPU.

In our Deployment, the CPU request is 100m (0.1 core). Ten percent of that is 10m CPU, which matches the load generated by the stress command in the container.

The HPA also defines a scaleUp tolerance of 0.02 (2%). This means the HPA will trigger scaling as soon as usage goes just above the target. In this case, above 10.2% of the requested CPU (≈10.2m on a 100m request). That is the 10% target × (1 + 0.02 tolerance), consistent with how the default tolerance was applied earlier.
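To see the tolerance in action, apply both manifests and watch the HPA (this assumes you saved the sketches above as deployment.yaml and hpa.yaml, and that metrics-server is running in the cluster):

kubectl apply -f deployment.yaml -f hpa.yaml
kubectl get hpa cpu-stress-hpa -w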

Wrapping Up

This change means your apps can now scale more precisely, getting resources exactly when needed based on the configuration you set.

A small change that makes a huge difference in performance and efficiency.
