Janam Writes

How Does HPA Work in Kubernetes?

Horizontal Pod Autoscaler (HPA) automatically scales pods in Kubernetes based on observed CPU/memory utilization. This post demystifies how HPA calculates resource usage, determines scaling actions, and implements stabilization windows.

Understanding Horizontal Pod Autoscaler Configuration in Kubernetes

For those using Kubernetes, setting up the Horizontal Pod Autoscaler (HPA) is a common way to handle increased resource demand during periods of high load.

HPA Configuration Basics

While most Kubernetes users know how to configure HPA for an application, some confusion still exists around how HPA calculates resource usage and determines scaling actions. The Kubernetes docs provide detailed information, but it's spread across multiple documents. To simplify things, I'll summarize my key learnings on HPA configuration in one place.

Scaling Based on Resource Requests

The HPA targets targetCPUUtilization and targetMemoryUtilization are calculated against requests.cpu and requests.memory, not limits. You can find an example configuration here.
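As a sketch of what such a configuration looks like (names like web-app are placeholders, not from the original post), here is an HPA in the autoscaling/v2 API that targets 70% of the pods' CPU requests:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # placeholder Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of requests.cpu, not limits.cpu
```

Note that averageUtilization is expressed as a percentage of the container's requests.cpu, which is exactly why the request values are the baseline for all of HPA's math.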

There have been discussions about basing the HPA algorithm on limits instead of requests, but the best approach is still debated due to tradeoffs.

Is the Target Percentage Capped at 100?

In standard terminology, a percentage target is usually less than or equal to 100%. For HPA, however, this is not required. I was initially confused about this, but after digging through the relevant discussions and design decisions, it makes sense.

HPA currently uses resources.requests as the baseline for calculating and comparing utilization. So a target over 100% is valid as long as it's below resources.limits.

For example, say a pod has resources.requests.cpu=200m and resources.limits.cpu=4. If HPA is configured with targetCPUUtilization=300%, then once average consumption across pods reaches 300% of 200m (i.e., 600m), HPA will start adding pods.
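A quick sanity check of that arithmetic, using the values from the example above:

```python
# Scale-up threshold when the target is a percentage of requests.cpu.
requests_cpu_millicores = 200    # requests.cpu = 200m
target_utilization_pct = 300     # targetCPUUtilization = 300%

# HPA compares average usage against this threshold.
threshold_millicores = requests_cpu_millicores * target_utilization_pct / 100
print(threshold_millicores)  # 600.0 -> well under the 4000m (4 CPU) limit
```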

How Many Pods Will Scale?

Kubernetes uses a simple formula to calculate desired pod count for scaling up or down:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
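The formula can be sketched in Python (the function and variable names are mine, not taken from the Kubernetes source):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float) -> int:
    """HPA core scaling formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric)."""
    return math.ceil(current_replicas * (current_metric / desired_metric))

# 4 pods averaging 90% CPU against a 60% target -> scale up to 6.
print(desired_replicas(4, 90, 60))   # 6
# 6 pods averaging 30% against a 60% target -> scale down to 3.
print(desired_replicas(6, 30, 60))   # 3
```

The ceil means HPA rounds up: even a slight overshoot of the target adds a whole pod, which biases the autoscaler toward having spare capacity rather than too little.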

Stabilization Window

A key HPA goal is avoiding constant fluctuation in pod count due to traffic changes (often called flapping). To achieve this, HPA applies a stabilization window, 5 minutes by default, to scale-down decisions.

HPA monitors load continuously. When utilization drops below the target, it does not remove pods immediately; it uses the highest replica recommendation computed over the past 5 minutes, so pods are only removed once the lower load has been sustained for the whole window. If load climbs back up within the window, the scale-down is deferred and the wait effectively starts over.
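The default 300-second window can be tuned per HPA via the behavior field (available since autoscaling/v2beta2). A partial spec as a sketch, with illustrative values:

```yaml
# Partial HPA spec: override the default scale-down stabilization window.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 minutes instead of the default 300
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of pods per period
          periodSeconds: 60
```

Lengthening the window trades slower cost savings for fewer disruptive scale-down/scale-up cycles under bursty traffic.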

All rights reserved. Janam Khatiwada