Autoscaling Kubernetes Workloads with KEDA and Prometheus

Introduction

Modern cloud-native applications often experience highly variable traffic patterns that make capacity planning challenging. Over-provisioning resources increases infrastructure costs, while under-provisioning can lead to performance degradation and service outages.

This article demonstrates how Kubernetes, KEDA (Kubernetes Event-Driven Autoscaling), and Prometheus can work together to implement metric-driven autoscaling. By leveraging real-time application metrics, KEDA can automatically adjust the number of running pods based on actual demand, helping organizations optimize both performance and resource utilization.

KEDA extends Kubernetes' native autoscaling capabilities by allowing workloads to scale based on external events and custom metrics rather than relying solely on CPU or memory consumption. Prometheus complements this by collecting and exposing application metrics that can be used as scaling triggers.

For this demonstration, we will use Minikube to create a local Kubernetes cluster and Helm to install the required components. Ensure that minikube, kubectl and helm are installed before proceeding; the load test in Step 6 also requires the hey load-testing tool.

Architecture Overview

The following diagram illustrates the flow of metrics and scaling decisions throughout the system:

Client Traffic
      │
      ▼
  Podinfo Application
      │
      ▼
     /metrics
      │
      ▼
   Prometheus
      │
      ▼
 KEDA ScaledObject
      │
      ▼
 Kubernetes HPA
      │
      ▼
 Scale Pods Up/Down

When application traffic increases, the request counters that Prometheus scrapes from the /metrics endpoint grow faster. KEDA periodically queries Prometheus, evaluates the configured scaling rules, and exposes the result to a Kubernetes Horizontal Pod Autoscaler (HPA) that it creates and manages; the HPA then adjusts the replica count of the target deployment.

Step 1: Create a Kubernetes Cluster

Let's start by creating a local Kubernetes cluster using Minikube:

minikube start

After the command completes successfully, a local Kubernetes cluster will be available and kubectl will be configured automatically to communicate with it.

Step 2: Install KEDA

See the official KEDA installation guide for other deployment options:

https://keda.sh/docs/2.20/deploy/

For this tutorial, we will install KEDA using Helm. Note that we will not use the sample http-app referenced in the KEDA documentation.

helm repo add kedacore https://kedacore.github.io/charts
helm repo update

helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

After installation, verify that the KEDA components are running:

kubectl get pods -n keda

Expected output (pod name suffixes will differ; the RESTARTS and AGE columns are omitted here):

NAME                                              READY   STATUS
keda-admission-webhooks-58fd99db89-kq449          1/1     Running
keda-operator-894fbfd87-kqqv8                     1/1     Running
keda-operator-metrics-apiserver-558599df4-nkb66   1/1     Running
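If you prefer a scripted check over reading the table, the following is a minimal POSIX shell sketch that reads the output of kubectl get pods on standard input and flags any pod that is not fully ready or not Running. The function name check_pods_ready is hypothetical, and the sketch assumes kubectl's default column order (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and reports
# every pod whose READY count is incomplete (e.g. 0/1) or whose STATUS is
# not Running. Assumes the default column order: NAME READY STATUS ...
# Exit status: 0 if all pods are ready, 1 otherwise.
check_pods_ready() {
  awk '
    NR == 1 { next }                     # skip the header row
    {
      split($2, r, "/")                  # READY column, e.g. "1/1"
      if (r[1] != r[2] || $3 != "Running") {
        print "not ready: " $1 " (" $2 ", " $3 ")"
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example: kubectl get pods -n keda | check_pods_ready. The built-in kubectl wait --for=condition=Ready pod --all -n keda --timeout=120s achieves a similar result without parsing text.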

KEDA installs several Kubernetes Custom Resource Definitions (CRDs), including:

ScaledObject
ScaledJob
TriggerAuthentication
ClusterTriggerAuthentication

Verify the installed CRDs:

kubectl get crds | grep keda
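The grep above prints whatever KEDA CRDs exist. To check specifically for the four listed earlier, a small sketch like the one below compares the first column of the listing against their full names (the plural resource name plus the keda.sh API group, for example scaledobjects.keda.sh). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get crds` output (with or without the
# header line) on stdin and prints each expected KEDA CRD that is missing.
# Exit status: 0 if all four CRDs are present, 1 otherwise.
check_keda_crds() {
  awk '
    { present[$1] = 1 }                  # first column is the CRD name
    END {
      want[1] = "scaledobjects.keda.sh"
      want[2] = "scaledjobs.keda.sh"
      want[3] = "triggerauthentications.keda.sh"
      want[4] = "clustertriggerauthentications.keda.sh"
      missing = 0
      for (i = 1; i <= 4; i++)
        if (!(want[i] in present)) { print "missing: " want[i]; missing = 1 }
      exit missing
    }
  '
}
```

For example: kubectl get crds | grep keda | check_keda_crds. Newer KEDA releases install additional CRDs; the sketch only checks the four named above.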

Step 3: Deploy a Sample Application

For this tutorial, we will use the Podinfo application as our sample workload:

https://github.com/stefanprodan/podinfo

Podinfo exposes a /metrics endpoint that Prometheus can scrape, making it an ideal candidate for demonstrating metric-driven autoscaling.

Install the frontend application:

helm repo add podinfo https://stefanprodan.github.io/podinfo

kubectl create namespace test

helm upgrade --install --wait frontend \
  --namespace test \
  --set replicaCount=2 \
  --set backend=http://backend-podinfo:9898/echo \
  podinfo/podinfo

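Create a port-forward so the application can be accessed from the host machine. The command runs in the foreground, so start it in a separate terminal and leave it running; the load test in Step 6 sends traffic through it: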

kubectl -n test port-forward deploy/frontend-podinfo 8080:9898

Open the application in a browser:

http://localhost:8080

[Screenshot: Podinfo app]

Install the backend service:

helm upgrade --install --wait backend \
  --namespace test \
  --set redis.enabled=true \
  podinfo/podinfo

Step 4: Install Prometheus

Prometheus will collect metrics from the application and serve as the data source for KEDA scaling decisions.

Create a monitoring namespace and install Prometheus:

kubectl create namespace monitoring

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus prometheus-community/prometheus \
  --namespace monitoring

Once installed, the Prometheus server is reachable inside the cluster at:

http://prometheus-server.monitoring.svc.cluster.local:80

This address will later be referenced by the KEDA Prometheus trigger.
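The address follows the standard Kubernetes Service DNS pattern <service>.<namespace>.svc.<cluster-domain>. The sketch below assembles it from its parts. It assumes the default cluster domain cluster.local, which Minikube uses unless configured otherwise, and a Service named prometheus-server listening on port 80, as the address above implies; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: builds the in-cluster URL of a Service following the
# <service>.<namespace>.svc.<cluster-domain> DNS convention.
# The cluster domain defaults to cluster.local (an assumption; pass a fourth
# argument if your cluster uses a different domain).
# Usage: service_url <service> <namespace> <port> [cluster-domain]
service_url() {
  printf 'http://%s.%s.svc.%s:%s\n' "$1" "$2" "${4:-cluster.local}" "$3"
}
```

If you need to confirm the actual Service name and port, kubectl get svc -n monitoring lists them.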

To access the Prometheus UI locally, run the following in another terminal and leave the port-forward running:

export POD_NAME=$(kubectl get pods \
  --namespace monitoring \
  -l "app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus" \
  -o jsonpath="{.items[0].metadata.name}")

kubectl --namespace monitoring port-forward "$POD_NAME" 9090

Open:

http://localhost:9090

Useful pages:

Query UI: http://localhost:9090/query
Targets UI: http://localhost:9090/targets

The Targets page lists the metrics endpoints that Prometheus has discovered. To find the frontend Podinfo endpoints, search for pod="frontend-podinfo-".

Step 5: Create a KEDA ScaledObject

A ScaledObject is KEDA's primary custom resource for defining autoscaling behavior. It links a Kubernetes workload to one or more event sources or metrics and specifies the conditions under which scaling should occur.

In this example, we will create a ScaledObject that monitors the request rate collected by Prometheus. KEDA feeds this metric to an HPA, which adds application replicas as the total request rate rises above the per-replica target set by threshold and removes them again as the rate falls.

Create a file named scaledobject.yaml with the following content:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: podinfo-scaledobject
  namespace: test
spec:
  scaleTargetRef:
    kind: Deployment
    name: frontend-podinfo
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring.svc.cluster.local:80
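        # Target value per replica (the default metricType is AverageValue)
        threshold: "5"
        # Total per-second request rate, averaged over a 5-minute window
        query: sum(rate(http_requests_total[5m]))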

The key configuration values are:

Property	Description
`serverAddress`	The Prometheus endpoint that KEDA queries for metrics.
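`query`	A PromQL expression that returns the per-second request rate, averaged over the last 5 minutes and summed across every http_requests_total series that Prometheus scrapes. Because it has no label filter, it also counts the backend Podinfo pods, which expose the same metric.
`threshold`	The target metric value per replica (the default metricType is AverageValue). In this example the HPA aims for roughly one replica per 5 requests per second of total traffic, so scaling out begins at roughly 5 requests per second.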
`minReplicaCount`	The minimum number of replicas to maintain.
`maxReplicaCount`	The maximum number of replicas KEDA can scale to.
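To see how threshold, minReplicaCount and maxReplicaCount interact, the sketch below approximates the replica count the HPA aims for. It assumes the trigger's default AverageValue metric type, under which the HPA divides the query result by threshold, rounds up, and clamps the result to the configured bounds. It deliberately ignores the HPA's tolerance band (10% by default), stabilization windows and scaling-rate policies, so treat the result as an estimate rather than a reproduction of the controller; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper approximating the HPA target for a KEDA trigger with
# the default AverageValue metric type:
#   desired = ceil(metric / threshold), clamped to [min, max]
# Ignores the HPA tolerance band, stabilization windows and scaling policies.
# Usage: desired_replicas <metric> <threshold> <min> <max>
desired_replicas() {
  awk -v m="$1" -v t="$2" -v lo="$3" -v hi="$4" 'BEGIN {
    d = m / t
    n = int(d)
    if (n < d) n++        # round up to a whole replica
    if (n < lo) n = lo    # respect minReplicaCount
    if (n > hi) n = hi    # respect maxReplicaCount
    print n
  }'
}
```

With threshold "5", a total of 12 requests per second gives desired_replicas 12 5 1 10 = 3 replicas, and any total above 45 requests per second reaches the cap of 10.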

Apply the ScaledObject:

kubectl apply -f scaledobject.yaml

Once the ScaledObject is applied, KEDA creates and manages a corresponding Kubernetes Horizontal Pod Autoscaler (HPA) for the frontend-podinfo deployment.
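By default, KEDA names the HPA it manages keda-hpa-<ScaledObject name>; spec.advanced.horizontalPodAutoscalerConfig.name can override this. The tiny helper below, a hypothetical convenience rather than part of KEDA, derives that default name so you can inspect the HPA directly.

```shell
#!/bin/sh
# Hypothetical helper: returns the default name of the HPA that KEDA creates
# for a ScaledObject ("keda-hpa-" + ScaledObject name). Not valid if the
# name was overridden in spec.advanced.horizontalPodAutoscalerConfig.name.
keda_hpa_name() {
  printf 'keda-hpa-%s\n' "$1"
}
```

For example, kubectl get hpa -n test "$(keda_hpa_name podinfo-scaledobject)" shows the HPA's current and target metric values along with its replica counts.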

Step 6: Generate Load and Verify Autoscaling

In this step, we will generate traffic against the application and observe how KEDA reacts to the increased load.

Generate Load

With the frontend port-forward from Step 3 still running, use the hey load-testing tool to send requests to the application continuously:

hey -z 5m -c 5 http://localhost:8080

Parameters:

-z 5m — Run the test for 5 minutes.
-c 5 — Use 5 concurrent workers.

Allow the load test to run for a few minutes while monitoring metrics and scaling activity.

Verify Metrics in Prometheus

Open the Prometheus query page:

http://localhost:9090/query

Copy the PromQL query from the ScaledObject configuration:

sum(rate(http_requests_total[5m]))

Paste it into the Expression field and click Execute.

You should see the request rate increase as traffic is generated.
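rate() turns the ever-increasing http_requests_total counter into a per-second value, and sum() then adds the per-series results together. The sketch below is a deliberately simplified model that uses only two samples and treats a drop in the counter as a restart; Prometheus's actual rate() uses every sample in the window and extrapolates to the window edges. The function name and sample numbers are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical, simplified model of PromQL rate(): per-second increase of a
# counter between two samples (t1, v1) and (t2, v2), times in seconds.
# A decrease is treated as a counter reset, so the increase is v2 itself.
# Unlike Prometheus, it ignores intermediate samples and does not
# extrapolate to the edges of the range window.
# Usage: simple_rate <t1> <v1> <t2> <v2>
simple_rate() {
  awk -v t1="$1" -v v1="$2" -v t2="$3" -v v2="$4" 'BEGIN {
    inc = (v2 >= v1) ? v2 - v1 : v2   # counter reset: count from zero
    printf "%.2f\n", inc / (t2 - t1)
  }'
}
```

For example, a counter going from 100 to 400 over 60 seconds gives simple_rate 0 100 60 400 = 5.00 requests per second. One consequence of the [5m] window: right after hey starts, most of the window still covers idle time, so the graphed rate climbs over several minutes rather than jumping immediately.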

[Screenshot: Prometheus stats chart]

Verify ScaledObject Status

Open a new terminal and watch the ScaledObject status:

kubectl get scaledobject -n test podinfo-scaledobject -w

Example output (trailing columns such as FALLBACK and AGE are omitted here):

NAME                   SCALETARGETKIND      SCALETARGETNAME    MIN   MAX   READY   ACTIVE
podinfo-scaledobject   apps/v1.deployment   frontend-podinfo   1     10    True    True

Important status fields:

READY=True indicates that the ScaledObject configuration is valid.
ACTIVE=True indicates that at least one scaling trigger is currently active.
FALLBACK=False indicates that no fallback scaling behavior is being used.
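To pull just the READY and ACTIVE values out of the watch output, the sketch below locates each column by the position of its header, so it keeps working when extra columns such as FALLBACK or PAUSED are present. It assumes kubectl's left-aligned table layout and non-empty cells; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get scaledobject` output on stdin and
# prints "<name> READY=<value> ACTIVE=<value>" for each row. Columns are
# located by where their header starts, relying on kubectl's left-aligned
# table layout, so additional columns do not shift the result.
scaledobject_status() {
  awk '
    NR == 1 {
      r = index($0, "READY")          # start column of the READY header
      a = index($0, "ACTIVE")         # start column of the ACTIVE header
      next
    }
    {
      split(substr($0, r), rv, " ")   # first word at the READY position
      split(substr($0, a), av, " ")   # first word at the ACTIVE position
      print $1 " READY=" rv[1] " ACTIVE=" av[1]
    }
  '
}
```

For example: kubectl get scaledobject -n test podinfo-scaledobject -w | scaledobject_status.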

Inspect the ScaledObject

For additional troubleshooting and status information, run:

kubectl describe scaledobject -n test podinfo-scaledobject

Review the following conditions:

Type: Ready
Status: True

Type: Active
Status: True

A Ready=True status indicates that KEDA successfully validated the configuration and can communicate with Prometheus.

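An Active=True status indicates that the Prometheus trigger is active: the query currently returns a value above the trigger's activation threshold (0 unless activationThreshold is set). Scaling between minReplicaCount and maxReplicaCount is then carried out by the HPA, which compares the metric against threshold.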
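In a script, you can wait for a condition instead of rereading the describe output. The sketch below retries a command until it prints True or a timeout in seconds expires; the helper name and timeout values are assumptions. The kubectl JSONPath filter in the usage example selects the status of the condition whose type is Active.

```shell
#!/bin/sh
# Hypothetical helper: runs a command once per second until its output is
# exactly "True" or the timeout (in seconds) expires.
# Exit status: 0 when "True" was seen, 1 on timeout.
# Usage: wait_until_true <timeout-seconds> <command> [args...]
wait_until_true() {
  wut_timeout=$1
  shift
  wut_end=$(( $(date +%s) + wut_timeout ))
  while [ "$(date +%s)" -lt "$wut_end" ]; do
    # Compare the command's stdout (stderr discarded) against "True".
    if [ "$("$@" 2>/dev/null)" = "True" ]; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example: wait_until_true 120 kubectl get scaledobject podinfo-scaledobject -n test -o jsonpath='{.status.conditions[?(@.type=="Active")].status}'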

Verify Pod Scaling

As the request rate keeps rising, the HPA that KEDA manages should add replicas to the frontend deployment.

Monitor pod creation using:

kubectl get pods -n test -w

You should observe new frontend-podinfo pods being created as KEDA scales the deployment toward the configured maximum replica count.

Example:

frontend-podinfo-694b577cb7-96kqj
frontend-podinfo-694b577cb7-b8rb2
frontend-podinfo-694b577cb7-d967d
frontend-podinfo-694b577cb7-gtdcz
frontend-podinfo-694b577cb7-ng42w
...

This confirms that KEDA is successfully querying Prometheus metrics and dynamically adjusting application capacity based on real-time demand.
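To follow the replica count as a number rather than by scanning the pod list, a sketch like the following counts Running pods whose names start with a given prefix. The function name is hypothetical, and it assumes the default kubectl get pods columns (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# how many pods have a name starting with the given prefix and a STATUS
# of Running. The header row never matches because its name is "NAME".
# Usage: count_running <name-prefix>
count_running() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 && $3 == "Running" { n++ }
    END { print n + 0 }
  '
}
```

For example: kubectl get pods -n test | count_running frontend-podinfo-. The READY column of kubectl get deployment frontend-podinfo -n test reports a similar figure directly.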
