Kubernetes Autoscaling: HPA vs VPA, A Complete Guide
What is Kubernetes?
Kubernetes is an open-source container orchestration engine for automating deployment, scaling, and management of containerized applications. It's supported by all hyperscaler cloud providers and widely used across the industry. Amazon, Google, IBM, Microsoft, Oracle, Red Hat, SUSE, Platform9, IONOS and VMware offer Kubernetes-based platforms or infrastructure as a service (IaaS) that deploys Kubernetes.
Vertical Scaling
Vertical scaling means increasing the amount of CPU and RAM used by a single instance of your application.
For example, if we deployed our application to a virtual machine (or an EC2 instance) with 8 GiB of RAM and 1 CPU, and our application is getting more traffic, we can vertically scale the app by increasing the RAM to 16 GiB and adding one more CPU.
A drawback to this approach is that it has limits: at some point you won't be able to scale any further. That's why we need horizontal scaling as well.
Horizontal Scaling
Horizontal scaling means increasing the number of instances that run your application.
For example, if we deployed our application to a virtual machine (or an EC2 instance), and our application is getting more traffic, we can horizontally scale the app by adding one more instance and using a load balancer to split the traffic between them.
If you are using a cloud provider (like AWS), you can theoretically add an unlimited number of instances (at a cost, of course).
Why do we need Autoscaling?
Autoscaling means automatically scaling your application, horizontally or vertically, based on one or more metrics (like CPU or memory utilization) without human intervention.
We need autoscaling because we want to respond to increasing traffic as quickly as possible. We also want to save money by running as few instances, with as few resources, as possible.
In Kubernetes, we use the Vertical Pod Autoscaler (VPA) and the Horizontal Pod Autoscaler (HPA) to achieve autoscaling.
Install the metrics server
For the Horizontal Pod Autoscaler and Vertical Pod Autoscaler to work, we need to install the metrics server in our cluster. It collects resource metrics from kubelets and exposes them in the Kubernetes API server through the Metrics API. The Metrics API can also be accessed via kubectl top, making it easier to debug autoscaling pipelines.
To install it, run:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
To verify the installation, run:
$ kubectl top pods -n kube-system
This should return resource metrics for all pods in the kube-system namespace.
Vertical Pod Autoscaler
Vertical Pod Autoscaler (VPA) allows you to increase or decrease your pods' resources (RAM and CPU) based on a selected metric. The VPA can suggest memory/CPU requests and limits, and it can also update them automatically if the user enables that. This reduces the time engineers spend running performance/benchmark tests to determine the correct values for CPU and memory requests/limits.
VPA doesn't come with Kubernetes by default, so we need to install it first.
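The VPA components live in the kubernetes/autoscaler repository, and the documented way to install them is the vpa-up.sh script it ships with:

```shell
# Clone the autoscaler repository and run the official VPA install script
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```

This deploys the VPA recommender, updater, and admission controller into the cluster your current kubectl context points at.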
Example
Create a redis deployment and request far more memory and CPU than it needs. Then apply the manifest with kubectl to create the deployment.
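A minimal manifest along these lines would do (the file name and the deliberately oversized requests are illustrative, not from the original):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7
        resources:
          requests:
            cpu: "1"      # deliberately oversized, so VPA has something to correct
            memory: 1Gi   # deliberately oversized
```

Assuming the manifest is saved as redis-deployment.yaml:
$ kubectl apply -f redis-deployment.yaml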
The next step is to create the VPA.
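A VPA manifest targeting the redis deployment above could look like this (the name and the min/max bounds are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: redis-vpa
spec:
  targetRef:                 # which resource this VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: redis
  updatePolicy:              # how the VPA applies its recommendations
    updateMode: "Auto"
  resourcePolicy:            # optional per-container bounds
    containerPolicies:
    - containerName: redis
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "1"
        memory: 1Gi
```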
As you can see, the VPA definition file consists of three parts:
- targetRef – defines the target of this VPA. It should match the deployment we created earlier.
- updatePolicy – tells the VPA how to update the target resource.
- resourcePolicy – optional. It allows us to be more flexible by defining minimum and maximum resources for a container, or to turn off autoscaling for a specific container, using containerPolicies.
VPA update Policy
Here are all valid options for updateMode in VPA:
- Off – VPA will only provide recommendations; we then need to apply them manually if we want to. This is best if we want to use VPA just to get an idea of how many resources our application needs.
- Initial – VPA only assigns resource requests on pod creation and never changes them later. It will still provide us with recommendations.
- Recreate – VPA assigns resource requests at pod creation time and updates them on existing pods by evicting and recreating them.
- Auto – It automatically recreates the pod based on the recommendation. It's best to use a PodDisruptionBudget to ensure that at least one replica of our deployment is up at all times. This keeps our application available while pods are being replaced. For more information, please check this.
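For reference, a PodDisruptionBudget guarding the redis pods above might look like this (the name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
spec:
  minAvailable: 1          # never evict below one available replica
  selector:
    matchLabels:
      app: redis
```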
Horizontal Pod Autoscaler
Horizontal Pod Autoscaler (HPA) allows you to increase or decrease the number of pods in a deployment automatically based on a selected metric. HPA comes with Kubernetes by default.
Example
Create an nginx deployment and make sure you define resource requests and limits. Then apply the manifest with kubectl to create the deployment.
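A minimal manifest along these lines would do (the file name and the resource values are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:
            cpu: 100m      # HPA needs requests set to compute utilization
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
```

Assuming the manifest is saved as nginx-deployment.yaml:
$ kubectl apply -f nginx-deployment.yaml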
The next step is to create the HPA.
This HPA will use CPU utilization to scale the deployment: it targets 55% average utilization, scaling up when usage is above that and scaling down when it's below. To create the HPA, we apply its manifest with kubectl.
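An HPA manifest matching that description could look like this (the name and the replica bounds are illustrative; only the 55% target comes from the text):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 55   # target average CPU utilization
```

Assuming it is saved as nginx-hpa.yaml:
$ kubectl apply -f nginx-hpa.yaml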
There are many more configuration options we can use to make our HPA more stable and useful. Here's an example with a common configuration:
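A sketch of such a configuration, combining CPU and memory metrics with scale-up/scale-down behaviors (all specific numbers here are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 55
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # wait before acting on spikes
      policies:
      - type: Percent
        value: 100                      # at most double the replicas per minute
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # scale down conservatively
      policies:
      - type: Pods
        value: 1                        # remove at most one pod per minute
        periodSeconds: 60
```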
In this example, we use CPU and memory utilization, and we can add more metrics if we want. We also defined scaleUp and scaleDown behaviors to tell Kubernetes how we want scaling to happen. For more info, please check this.
Custom metrics
In some applications, scaling based on memory or CPU utilization is not that useful, perhaps because the application mostly performs blocking tasks (like calling external APIs) that don't consume many resources. In this case, scaling based on the number of requests makes more sense.
Since we are using the autoscaling/v2 API version, we can configure an HPA to scale based on a custom metric (one that is not built in to Kubernetes or any Kubernetes component). The HPA controller then queries for these custom metrics from the Kubernetes API.
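As a sketch, assuming a metrics adapter (such as the Prometheus Adapter) exposes a hypothetical per-pod http_requests_per_second metric through the custom metrics API, a request-based HPA could look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa                          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                            # illustrative target deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical metric from an adapter
      target:
        type: AverageValue
        averageValue: "100"              # target average requests/sec per pod
```

Without a metrics adapter installed, this HPA would have nothing to query; the adapter is what bridges your monitoring system to the custom metrics API.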
Conclusion
Autoscaling is a powerful feature. It allows our application to adapt to load changes automatically, without any human intervention. We can use the Vertical Pod Autoscaler to help us determine the resources needed for our application, and we can use the HPA to add or remove replicas dynamically based on CPU and/or memory utilization. It's also possible to scale based on a custom metric like RPS (requests per second) or the number of messages in a queue if we use an event-driven architecture.