Default HPA Behaviour

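By default, the Kubernetes HPA only acts when CPU or memory usage is more than 10% away from the target before adding or removing pods.

Until now, you couldn't tune this sensitivity per HPA. The 10% tolerance was a cluster-wide setting that applied equally to scaling up and scaling down.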

For example, if you set 70% as the usage target, Kubernetes only scales up once usage goes above 77% (70% × 1.10) and only scales down once it drops below 63% (70% × 0.90).

So what is the problem with a fixed 10%?

Sometimes you want the system to react quickly when traffic spikes. Other times, you want it to be more cautious to avoid constant scaling up and down.

But now, you can set separate tolerance levels for scaling up and scaling down.

HPA Tolerance Levels (beta feature)

HPA Tolerance Levels is a beta feature in Kubernetes v1.35 and is enabled by default.

It lets you customize how sensitive your Horizontal Pod Autoscaler (HPA) is to metric changes.

This means you can make the HPA react to small increases in load (scale up sooner) while ignoring small drops (avoid scaling down too eagerly).

Here is an example.
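A minimal sketch of such an HPA, assuming a 70% CPU target and the tolerance values discussed below. The names `my-app` and `my-app-hpa` and the replica bounds are placeholders, not values from a real cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa              # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # placeholder Deployment to scale
  minReplicas: 1                # assumed bounds
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # usage target, as a % of requested CPU
  behavior:
    scaleUp:
      tolerance: 0.01           # scale up once usage exceeds the target by more than 1%
    scaleDown:
      tolerance: 0.05           # scale down only once usage is more than 5% below the target
```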

Let's understand the tolerance settings.

scaleUp: tolerance: 0.01 (1% tolerance) - This means your app will scale UP (add more pods) very aggressively.

For example, with a 70% target, if your pods are running at 71% CPU, the HPA will add more pods because 71% is above the 70.7% threshold (70% × 1.01).

scaleDown: tolerance: 0.05 (5% tolerance) - This means your app will scale DOWN (remove pods) more conservatively.

For example, with a 70% target, if your pods are running at 67% CPU, the HPA will NOT remove pods because 67% is still above the 66.5% threshold (70% × 0.95).

Hands-on Example

Now let's create a simple Deployment that runs a CPU stress container.
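A sketch of what such a Deployment could look like. Only the 100m CPU request comes from this walkthrough. The name `cpu-stress`, the `busybox` image, and the busy-loop command are assumptions standing in for the stress command, so the CPU load it actually generates will vary and may need tuning.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-stress
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      containers:
      - name: stress
        image: busybox:1.36     # assumed image
        # Stand-in load generator: a short busy loop followed by a sleep.
        # Adjust the loop count or sleep to change the CPU load.
        command: ["sh", "-c", "while true; do i=0; while [ $i -lt 1000 ]; do i=$((i+1)); done; sleep 1; done"]
        resources:
          requests:
            cpu: 100m           # HPA utilization is measured against this request
```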

Now let's create the HPA with tolerance values.
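A minimal sketch of that HPA, assuming the Deployment is named `cpu-stress`. The names and the replica bounds are placeholders. The 10% CPU target and the 0.02 scale-up tolerance are the values explained next.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-stress-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stress            # assumed Deployment name
  minReplicas: 1                # assumed bounds
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10  # 10% of the 100m request = 10m CPU
  behavior:
    scaleUp:
      tolerance: 0.02           # 2%, applied relative to the target
```

No `scaleDown` tolerance is set in this sketch, so scale-down keeps using the cluster-wide default.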

In this HPA, CPU utilization is the key metric. It expects each Pod to use around 10% of its requested CPU.

In our Deployment, the CPU request is 100m (0.1 core). Ten percent of that is 10m CPU, which matches the load generated by the stress command in the container.

The HPA also defines a scale-up tolerance of 0.02 (2%). The tolerance is relative to the target, so the HPA will trigger scaling as soon as usage goes just above the target. In this case, above 10.2% of 100m (≈10.2m CPU). That is the 10% target × 1.02, not 10% + 2%.

Wrapping Up

This change means your apps can now scale more precisely, getting resources exactly when needed based on the configuration you set.

It is a small change, but it can make a real difference in performance and efficiency.

