Introduction
Modern cloud-native applications often experience highly variable traffic patterns that make capacity planning challenging. Over-provisioning resources increases infrastructure costs, while under-provisioning can lead to performance degradation and service outages.
This article demonstrates how Kubernetes, KEDA (Kubernetes Event-Driven Autoscaling), and Prometheus can work together to implement metric-driven autoscaling. By leveraging real-time application metrics, KEDA can automatically adjust the number of running pods based on actual demand, helping organizations optimize both performance and resource utilization.
KEDA extends Kubernetes' native autoscaling capabilities by allowing workloads to scale based on external events and custom metrics rather than relying solely on CPU or memory consumption. Prometheus complements this by collecting and exposing application metrics that can be used as scaling triggers.
For this demonstration, we will use Minikube to create a local Kubernetes cluster and Helm to install the required components. Ensure that minikube, kubectl, and helm are installed before proceeding. The load test in Step 6 also requires the hey load generator.
Architecture Overview
The following diagram illustrates the flow of metrics and scaling decisions throughout the system:
Client Traffic
│
▼
Podinfo Application
│
▼
/metrics
│
▼
Prometheus
│
▼
KEDA ScaledObject
│
▼
Kubernetes HPA
│
▼
Scale Pods Up/Down
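When application traffic increases, Prometheus scrapes higher request counts from the application's /metrics endpoint. KEDA periodically runs the configured PromQL query against Prometheus and exposes the result as an external metric to a Kubernetes Horizontal Pod Autoscaler (HPA) that KEDA creates and manages. The HPA compares that metric with the configured target and adjusts the replica count of the target deployment.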
Step 1: Create a Kubernetes Cluster
Let's start by creating a local Kubernetes cluster using Minikube:
minikube start
After the command completes successfully, a local Kubernetes cluster will be available and kubectl will be configured automatically to communicate with it.
Step 2: Install KEDA
Follow the official KEDA installation guide for additional deployment options:
https://keda.sh/docs/2.20/deploy/
For this tutorial, we will install KEDA using Helm. Note that we will not use the sample http-app referenced in the KEDA documentation.
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace
After installation, verify that the KEDA components are running:
kubectl get pods -n keda
Example output (pod name suffixes will differ):
NAME READY STATUS
keda-admission-webhooks-58fd99db89-kq449 1/1 Running
keda-operator-894fbfd87-kqqv8 1/1 Running
keda-operator-metrics-apiserver-558599df4-nkb66 1/1 Running
KEDA installs several Kubernetes Custom Resource Definitions (CRDs), including:
- ScaledObject
- ScaledJob
- TriggerAuthentication
- ClusterTriggerAuthentication
Verify the installed CRDs:
kubectl get crds | grep keda
Step 3: Deploy a Sample Application
For this tutorial, we will use the Podinfo application as our sample workload:
https://github.com/stefanprodan/podinfo
Podinfo exposes a /metrics endpoint that Prometheus can scrape, making it an ideal candidate for demonstrating metric-driven autoscaling.
Install the frontend application:
helm repo add podinfo https://stefanprodan.github.io/podinfo
kubectl create namespace test
helm upgrade --install --wait frontend \
  --namespace test \
  --set replicaCount=2 \
  --set backend=http://backend-podinfo:9898/echo \
  podinfo/podinfo
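Create a port-forward so that the application can be accessed from the host machine. The command runs in the foreground, so leave it open (for example, in a separate terminal); the load test in Step 6 sends traffic through it: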
kubectl -n test port-forward deploy/frontend-podinfo 8080:9898
Open the application in a browser:
http://localhost:8080
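With the port-forward running, the /metrics endpoint that Prometheus will scrape is also reachable at http://localhost:8080/metrics. As a quick sanity check, the Python sketch below fetches that page and sums every sample of one counter, such as http_requests_total. The helpers sum_counter and fetch_metrics are hypothetical names used only for illustration, and the parser is deliberately simplified: it skips comment lines, ignores labels and timestamps, and does not cover every corner of the Prometheus text format.

```python
import re
import urllib.request

# Matches "<name>{<labels>} <value> [<timestamp>]" and "<name> <value> [<timestamp>]".
# Simplified: assumes at most one {...} label block per line.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{.*\})?\s+(?P<value>\S+)(?:\s+-?\d+)?$"
)


def sum_counter(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of `metric_name`, across all label sets."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            total += float(match.group("value"))
    return total


def fetch_metrics(url: str = "http://localhost:8080/metrics", timeout: float = 5.0) -> str:
    """Download the raw exposition text (assumes the port-forward above is running)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For instance, reading sum_counter(fetch_metrics(), "http_requests_total") twice a few seconds apart and dividing the difference by the elapsed seconds gives a rough request rate. That is approximately what rate() computes in PromQL, although rate() also handles counter resets and extrapolation.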
Install the backend service:
helm upgrade --install --wait backend \
  --namespace test \
  --set redis.enabled=true \
  podinfo/podinfo
Step 4: Install Prometheus
Prometheus will collect metrics from the application and serve as the data source for KEDA scaling decisions.
Create a monitoring namespace and install Prometheus:
kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring
Once installed, Prometheus is reachable from inside the cluster at:
http://prometheus-server.monitoring.svc.cluster.local:80
This address will later be referenced by the KEDA Prometheus trigger.
To access the Prometheus UI locally:
export POD_NAME=$(kubectl get pods \
  --namespace monitoring \
  -l "app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus" \
  -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9090
Open:
http://localhost:9090
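When scripting these steps, it helps to wait until both the port-forward and Prometheus are ready before sending queries. Prometheus serves a readiness endpoint at /-/ready, and the sketch below polls it with the Python standard library. The function name wait_until_ready and its default timeout and polling interval are assumptions chosen for this example.

```python
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:9090", timeout_s: float = 60.0,
                     interval_s: float = 2.0) -> bool:
    """Poll Prometheus's /-/ready endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/-/ready", timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Port-forward not up yet, or Prometheus still starting (non-2xx raises HTTPError).
            pass
        time.sleep(interval_s)
    return False
```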
Useful pages:
- Query UI: http://localhost:9090/query
- Targets UI: http://localhost:9090/targets
The Targets page lists the metrics endpoints that Prometheus has discovered. To find the Podinfo frontend endpoints, search for pod="frontend-podinfo-".
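The Targets page is backed by the Prometheus HTTP API endpoint /api/v1/targets, so the same check can be scripted. The sketch below assumes the Prometheus port-forward above and, as the pod= search on the Targets page implies, that discovered targets carry a pod label. The function names frontend_targets and fetch_targets are hypothetical.

```python
import json
import urllib.request


def frontend_targets(targets_response: dict, pod_prefix: str = "frontend-podinfo-") -> list:
    """Return (pod, scrapeUrl, health) for active targets whose pod label has the prefix."""
    matches = []
    for target in targets_response.get("data", {}).get("activeTargets", []):
        pod = target.get("labels", {}).get("pod", "")
        if pod.startswith(pod_prefix):
            matches.append((pod, target.get("scrapeUrl"), target.get("health")))
    return matches


def fetch_targets(base_url: str = "http://localhost:9090", timeout: float = 5.0) -> dict:
    """Call GET /api/v1/targets for active targets (assumes the port-forward is running)."""
    url = f"{base_url}/api/v1/targets?state=active"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Each frontend pod should appear once with a health value of "up"; an empty list usually means Prometheus has not discovered the pods yet.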
Step 5: Create a KEDA ScaledObject
A ScaledObject is KEDA's primary custom resource for defining autoscaling behavior. It links a Kubernetes workload to one or more event sources or metrics and specifies the conditions under which scaling should occur.
In this example, we will create a ScaledObject that monitors request-rate metrics collected by Prometheus. When the metric exceeds a configured threshold, KEDA will automatically increase the number of application replicas.
Create a file named scaledobject.yaml with the following content:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: podinfo-scaledobject
  namespace: test
spec:
  scaleTargetRef:
    kind: Deployment
    name: frontend-podinfo
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring.svc.cluster.local:80
        threshold: "5"
        query: sum(rate(http_requests_total[5m]))
The key configuration values are:
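| Property | Description |
|---|---|
| serverAddress | The Prometheus endpoint that KEDA queries for metrics. |
| query | A PromQL expression that returns the per-second request rate, averaged over a 5-minute window and summed across all scraped series named http_requests_total. Because it has no label filter, it also includes requests served by the backend pods; a matcher such as {pod=~"frontend-podinfo-.*"} would restrict it to the frontend. |
| threshold | The target metric value per replica. The Prometheus trigger uses the AverageValue metric type by default, so the HPA aims for about 5 requests per second per replica, that is, roughly ceil(total request rate / 5) replicas. |
| minReplicaCount | The minimum number of replicas to maintain. |
| maxReplicaCount | The maximum number of replicas KEDA can scale to. |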
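To make the threshold concrete: the Prometheus trigger's default metric type is AverageValue, so the threshold acts as a per-replica target. The Python sketch below approximates how the Kubernetes HPA turns the query result into a replica count for such a target. It is a simplified model, not KEDA or Kubernetes code: it includes the HPA's default 10% tolerance and the min/max bounds, but ignores stabilization windows, scaling policies, and pod readiness. The function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(total_rate: float, threshold: float, current: int,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for an AverageValue external metric.

    total_rate: value returned by the ScaledObject query (requests per second here).
    threshold:  the trigger's threshold, treated as the target value per replica.
    current:    current replica count (assumed to be at least 1).
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Observed per-replica value divided by the per-replica target.
    usage_ratio = total_rate / (threshold * current)
    if abs(usage_ratio - 1.0) <= tolerance:
        desired = current  # within tolerance: keep the current replica count
    else:
        desired = math.ceil(total_rate / threshold)
    # Clamp to the ScaledObject's minReplicaCount / maxReplicaCount.
    return max(min_replicas, min(max_replicas, desired))
```

Under this model, with a threshold of 5 and at most 10 replicas, a sustained total rate of about 50 requests per second or more is enough to reach the maximum of 10 replicas from any starting point, which is why a modest load test can drive the deployment to its upper bound.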
Apply the ScaledObject:
kubectl apply -f scaledobject.yaml
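Once the ScaledObject is created, KEDA automatically creates and manages a corresponding Kubernetes Horizontal Pod Autoscaler (HPA) named keda-hpa-podinfo-scaledobject, which you can inspect with kubectl get hpa -n test.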
Step 6: Generate Load and Verify Autoscaling
In this step, we will generate traffic against the application and observe how KEDA reacts to the increased load.
Generate Load
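Because kubectl port-forward connects to a single pod, all of this traffic is served by one frontend pod. The summed request rate still drives scaling, but the new replicas do not receive traffic from this test. Use the hey load-testing tool to send requests continuously through the port-forward created in Step 3: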
hey -z 5m -c 5 http://localhost:8080
Parameters:
- -z 5m: Run the test for 5 minutes.
- -c 5: Use 5 concurrent workers.
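If hey is not installed, a rough substitute can be built with the Python standard library. In the sketch below, run_load is a hypothetical helper, not part of hey or Podinfo: it starts concurrency worker threads that request the URL repeatedly until duration_s elapses and returns the number of 2xx responses. It is slower and less precise than hey, but it exercises the same path through the port-forward. The fetch parameter lets the request function be swapped out, for example in tests.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def _http_get(url: str) -> int:
    """Default request function: GET `url`, drain the body, return the status code."""
    with urllib.request.urlopen(url, timeout=5) as response:
        response.read()
        return response.status


def run_load(url: str = "http://localhost:8080", duration_s: float = 300.0,
             concurrency: int = 5, fetch=_http_get) -> int:
    """Request `url` from `concurrency` threads for `duration_s` seconds.

    Returns the number of responses with a 2xx status code.
    """
    deadline = time.monotonic() + duration_s
    lock = threading.Lock()
    succeeded = 0

    def worker() -> None:
        nonlocal succeeded
        while time.monotonic() < deadline:
            try:
                status = fetch(url)
            except OSError:
                continue  # connection errors; urllib also raises HTTPError for 4xx/5xx
            if 200 <= status < 300:
                with lock:
                    succeeded += 1

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
    for future in futures:
        future.result()  # re-raise any unexpected exception from a worker
    return succeeded
```

For example, run_load(duration_s=300, concurrency=5) roughly mirrors hey -z 5m -c 5 http://localhost:8080, and dividing the returned count by 300 gives the average successful request rate.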
Allow the load test to run for a few minutes while monitoring metrics and scaling activity.
Verify Metrics in Prometheus
Open the Prometheus query page:
http://localhost:9090/query
Copy the PromQL query from the ScaledObject configuration:
sum(rate(http_requests_total[5m]))
Paste it into the Expression field and click Execute.
You should see the request rate increase as traffic is generated.
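The Prometheus trigger evaluates the same expression through Prometheus's HTTP API rather than the web UI. The sketch below runs an instant query against /api/v1/query and extracts the single result value, assuming the Prometheus port-forward from Step 4. The helper names extract_scalar and query_prometheus are hypothetical; the response shape (status, then data.result, where each vector sample carries a [timestamp, "value"] pair) is the standard Prometheus API format.

```python
import json
import urllib.parse
import urllib.request

QUERY = "sum(rate(http_requests_total[5m]))"


def extract_scalar(api_response: dict) -> float:
    """Return the single value from an instant-vector query result (0.0 if empty)."""
    if api_response.get("status") != "success":
        raise RuntimeError(f"query failed: {api_response.get('error')}")
    result = api_response["data"]["result"]
    if not result:
        return 0.0  # no matching series yet, e.g. before any traffic has been scraped
    if len(result) > 1:
        raise ValueError("expected one series; aggregate the query, e.g. with sum()")
    # A vector sample looks like {"metric": {...}, "value": [<unix time>, "<value>"]}.
    return float(result[0]["value"][1])


def query_prometheus(expr: str = QUERY, base_url: str = "http://localhost:9090",
                     timeout: float = 5.0) -> float:
    """Run an instant PromQL query via GET /api/v1/query."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_scalar(json.load(response))
```

Calling query_prometheus() repeatedly during the load test should show a value that climbs over a few minutes, because rate() averages over the 5-minute window.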
Verify ScaledObject Status
Open a new terminal and watch the ScaledObject status:
kubectl get scaledobject -n test podinfo-scaledobject -w
Example output (abbreviated; the full output includes additional columns, such as FALLBACK):
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX READY ACTIVE
podinfo-scaledobject apps/v1.deployment frontend-podinfo 1 10 True True
Important status fields:
- READY=True indicates that the ScaledObject configuration is valid.
- ACTIVE=True indicates that at least one scaling trigger is currently active.
- FALLBACK=False indicates that no fallback scaling behavior is being used.
Inspect the ScaledObject
For additional troubleshooting and status information, run:
kubectl describe scaledobject -n test podinfo-scaledobject
Review the following conditions:
Type: Ready
Status: True
Type: Active
Status: True
A Ready=True status indicates that KEDA successfully validated the configuration and can communicate with Prometheus.
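An Active=True status indicates that the Prometheus query currently returns a value above the trigger's activation threshold (activationThreshold, 0 by default). It does not by itself mean that the threshold of 5 has been exceeded; the replica count is decided by the HPA, which compares the metric with threshold.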
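For scripted checks, the same conditions are available in JSON from kubectl get scaledobject -n test podinfo-scaledobject -o json, under status.conditions. The helpers below are hypothetical and operate on the parsed JSON. Treating "Ready is True and Fallback is not True" as healthy is a policy chosen for this example, not a KEDA definition.

```python
def condition_status(scaled_object: dict) -> dict:
    """Map each condition type (Ready, Active, Fallback, ...) to its status string."""
    conditions = scaled_object.get("status", {}).get("conditions", [])
    return {c.get("type"): c.get("status") for c in conditions}


def is_scaling_healthy(scaled_object: dict) -> bool:
    """Example policy: Ready is "True" and the Fallback condition is not "True"."""
    status = condition_status(scaled_object)
    return status.get("Ready") == "True" and status.get("Fallback") != "True"
```

For example, passing json.loads(output), where output is the kubectl JSON above, to condition_status returns a mapping with entries for conditions such as Ready and Active.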
Verify Pod Scaling
While the load test keeps the request rate above the target, the HPA that KEDA manages should add replicas of the frontend deployment.
Monitor pod creation using:
kubectl get pods -n test -w
You should observe new frontend-podinfo pods being created as KEDA scales the deployment toward the configured maximum replica count.
Example:
frontend-podinfo-694b577cb7-96kqj
frontend-podinfo-694b577cb7-b8rb2
frontend-podinfo-694b577cb7-d967d
frontend-podinfo-694b577cb7-gtdcz
frontend-podinfo-694b577cb7-ng42w
...
This confirms that KEDA is successfully querying Prometheus metrics and dynamically adjusting application capacity based on real-time demand.
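To check pod scaling from a script instead of watching kubectl output, the sketch below counts frontend pods by phase. It assumes the JSON format produced by kubectl get pods -n test -o json and the frontend-podinfo- name prefix used in this tutorial; kubectl_json and frontend_pod_phases are hypothetical helper names.

```python
import json
import subprocess
from collections import Counter


def kubectl_json(*args: str) -> dict:
    """Run a kubectl command with `-o json` appended and return the parsed output."""
    completed = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def frontend_pod_phases(pod_list: dict, prefix: str = "frontend-podinfo-") -> Counter:
    """Count pods whose name starts with `prefix`, grouped by status.phase."""
    phases = Counter()
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "")
        if name.startswith(prefix):
            phases[pod.get("status", {}).get("phase", "Unknown")] += 1
    return phases
```

Calling frontend_pod_phases(kubectl_json("get", "pods", "-n", "test")) periodically during the load test should show the Running count growing as the HPA adds replicas. Note that a Running phase does not guarantee that the pod has passed its readiness probe.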





