Modern cloud-native applications demand intelligent scaling that goes beyond simple CPU and memory metrics. KEDA (Kubernetes Event-Driven Autoscaling) revolutionizes how we scale workloads by enabling event-driven autoscaling based on external metrics like message queue depth, API response times, and custom application metrics. This comprehensive guide explores production-ready KEDA implementations for two critical use cases: Kafka consumer lag scaling and REST API workload scaling.
Prerequisites
Before implementing KEDA autoscaling, ensure you have:
- Kubernetes cluster (1.24+) with KEDA installed (2.8+)
- Understanding of Kubernetes HPA (Horizontal Pod Autoscaler) concepts
- Basic knowledge of Kafka architecture and consumer groups
- Familiarity with Prometheus metrics collection
- Access to monitoring infrastructure for validation
Estimated implementation time: 3-6 hours including testing and validation.
Understanding KEDA Architecture
KEDA extends Kubernetes autoscaling capabilities by introducing event-driven scaling triggers. Unlike traditional HPA that relies on resource metrics, KEDA can scale based on external events and metrics from various sources.
Core Components
KEDA Operator: Manages ScaledObjects and ScaledJobs, translating external metrics into HPA-compatible metrics.
Metrics Adapter: Exposes external metrics to the Kubernetes metrics server, enabling HPA to consume them.
Admission Webhooks: Validate and mutate KEDA resources during creation and updates.
Scaling Architecture Flow
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Event Source   │───▶│   KEDA Scaler   │───▶│ Metrics Adapter │
│  (Kafka, API)   │    │    (Polling)    │    │   (HPA Bridge)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │  ScaledObject   │───▶│       HPA       │
                       │ (Configuration) │    │    (Scaling)    │
                       └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
                                              ┌─────────────────┐
                                              │   Deployment    │
                                              │  (Pod Scaling)  │
                                              └─────────────────┘
```
Kafka Consumer Lag Scaling
Kafka consumer lag represents the difference between the latest message offset and the consumer’s current position. High lag indicates consumers cannot keep up with message production, requiring additional consumer instances.
Understanding Kafka Lag Metrics
- Consumer Lag: Number of messages behind the latest offset, per partition.
- Total Lag: Sum of lag across all partitions for a consumer group.
- Lag Threshold: Target lag per consumer instance to maintain optimal processing.
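The aggregation KEDA's Kafka scaler performs is easy to reproduce by hand, which helps when validating scaling decisions against raw offsets. A minimal Go sketch (the `PartitionOffsets` type and `totalLag` function are illustrative, not part of any KEDA or Kafka client API):

```go
package main

import "fmt"

// PartitionOffsets holds the latest produced offset and the consumer
// group's committed offset for a single partition.
type PartitionOffsets struct {
	Latest    int64
	Committed int64
}

// totalLag sums per-partition lag (latest minus committed) across a
// topic, mirroring how lag is aggregated per consumer group.
func totalLag(partitions []PartitionOffsets) int64 {
	var lag int64
	for _, p := range partitions {
		if d := p.Latest - p.Committed; d > 0 {
			lag += d
		}
	}
	return lag
}

func main() {
	parts := []PartitionOffsets{
		{Latest: 5000, Committed: 4200}, // 800 behind
		{Latest: 3000, Committed: 2500}, // 500 behind
		{Latest: 1000, Committed: 1000}, // caught up
	}
	fmt.Println(totalLag(parts)) // 1300
}
```

Comparing this number against `lagThreshold` per running replica is, in essence, what drives the scaling decision.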
Basic Kafka Scaler Configuration
```yaml
# keda-kafka-scaler.yml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-consumer-deployment
  pollingInterval: 30
  cooldownPeriod: 300
  minReplicaCount: 2
  maxReplicaCount: 50
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 50
              periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
            - type: Pods
              value: 10
              periodSeconds: 60
          selectPolicy: Max
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-cluster-kafka-bootstrap:9092
        consumerGroup: payment-processor-group
        topic: payment-events
        lagThreshold: '1000'
        offsetResetPolicy: latest
      authenticationRef:
        name: keda-kafka-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: keda-kafka-auth
stringData:        # stringData accepts plain values; Kubernetes encodes them
  sasl: plain
  username: <username>
  password: <password>
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-kafka-auth
spec:
  secretTargetRef:
    - parameter: sasl
      name: keda-kafka-auth
      key: sasl
    - parameter: username
      name: keda-kafka-auth
      key: username
    - parameter: password
      name: keda-kafka-auth
      key: password
```
Advanced Kafka Scaling Configuration
For production environments, implement more sophisticated scaling logic:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: advanced-kafka-scaler
  namespace: production
  labels:
    app: kafka-consumer
    environment: production
spec:
  scaleTargetRef:
    name: kafka-consumer-deployment
  pollingInterval: 15
  cooldownPeriod: 600
  idleReplicaCount: 0
  minReplicaCount: 3
  maxReplicaCount: 100
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      name: kafka-consumer-hpa
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 600
          policies:
            - type: Percent
              value: 25
              periodSeconds: 120
            - type: Pods
              value: 5
              periodSeconds: 180
          selectPolicy: Min
        scaleUp:
          stabilizationWindowSeconds: 30
          policies:
            - type: Percent
              value: 200
              periodSeconds: 30
            - type: Pods
              value: 20
              periodSeconds: 60
          selectPolicy: Max
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-cluster-kafka-bootstrap:9092
        consumerGroup: high-throughput-processor
        topic: events,notifications,analytics
        lagThreshold: '500'
        offsetResetPolicy: latest
        allowIdleConsumers: 'false'
        scaleToZeroOnInvalidOffset: 'true'
        excludePersistentLag: 'true'
        version: '2.8.0'
      authenticationRef:
        name: kafka-sasl-auth
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server:80
        metricName: kafka_consumer_processing_rate
        threshold: '50'
        query: |
          sum(rate(kafka_messages_consumed_total{consumer_group="high-throughput-processor"}[5m]))
          /
          sum(kafka_consumergroup_members{group="high-throughput-processor"})
      authenticationRef:
        name: prometheus-auth
```
Kafka Scaling Best Practices
1. Lag Threshold Calculation
Calculate optimal lag threshold based on your processing capacity:
Optimal Lag Threshold = (Messages per Second per Consumer) × (Acceptable Processing Delay in Seconds)
Example: If each consumer processes 100 messages/second and you accept 10-second delays:
Lag Threshold = 100 × 10 = 1000 messages
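The calculation is simple enough to encode directly, which keeps threshold tuning reproducible across environments. A small Go helper (function and parameter names are illustrative):

```go
package main

import "fmt"

// lagThreshold applies the formula above: messages processed per second
// by a single consumer, multiplied by the acceptable processing delay.
func lagThreshold(msgsPerSecPerConsumer, acceptableDelaySec int) int {
	return msgsPerSecPerConsumer * acceptableDelaySec
}

func main() {
	// 100 msg/s per consumer, 10-second acceptable delay.
	fmt.Println(lagThreshold(100, 10)) // 1000, matching the worked example
}
```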
2. Partition-Aware Scaling
Ensure your scaling strategy accounts for Kafka partition limits:
```yaml
# Never scale beyond the partition count of the topic
maxReplicaCount: 32  # match your topic partition count
minReplicaCount: 2   # ensure fault tolerance
```
3. Consumer Group Management
```yaml
spec:
  triggers:
    - type: kafka
      metadata:
        # Use a dedicated consumer group for scaling
        consumerGroup: scaling-consumer-group-v2
        # Avoid scaling on multiple topics with different characteristics
        topic: payment-events  # single topic for predictable scaling
        # Do not count consumers beyond the partition count toward scaling
        allowIdleConsumers: 'false'
```
4. Handling Consumer Rebalancing
Implement graceful shutdown handling in your consumer application:
```go
// Example Go consumer with graceful shutdown
func (c *Consumer) Start(ctx context.Context) error {
	consumer := c.createConsumer()
	go func() {
		<-ctx.Done()
		log.Println("initiating graceful shutdown")
		// Allow in-flight message processing to complete before leaving
		// the group; keep this below terminationGracePeriodSeconds.
		time.Sleep(30 * time.Second)
		consumer.Close()
	}()
	return c.consumeLoop(consumer)
}
```
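The 30-second drain above only works if Kubernetes gives the pod at least that long to terminate; the default grace period is also 30 seconds, which leaves no headroom. A deployment fragment that pairs with the shutdown logic (the container name and sleep duration are illustrative):

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # must exceed the in-app drain window
      containers:
        - name: kafka-consumer
          lifecycle:
            preStop:
              exec:
                # Brief pause before SIGTERM is delivered, letting the
                # consumer stop fetching new messages first.
                command: ["sh", "-c", "sleep 5"]
```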
REST API Workload Scaling
REST API scaling based on external metrics enables more precise scaling decisions than CPU/memory alone. KEDA supports various HTTP-based scalers including custom metrics from monitoring systems.
Prometheus-Based API Scaling
Scale based on request rate, latency, or custom business metrics:
```yaml
# keda-rest-api-scaler.yml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-request-rate-scaler
  namespace: api-production
spec:
  scaleTargetRef:
    name: rest-api-deployment
  pollingInterval: 30
  cooldownPeriod: 300
  minReplicaCount: 5
  maxReplicaCount: 200
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 30
              periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
            - type: Percent
              value: 50
              periodSeconds: 30
            - type: Pods
              value: 25
              periodSeconds: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server:80
        metricName: http_requests_per_second_per_pod
        threshold: '100'
        query: |
          sum(rate(http_requests_total{service="api-service",status_code!~"5.."}[2m]))
          /
          count(up{job="api-service"} == 1)
      authenticationRef:
        name: prometheus-auth
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server:80
        metricName: http_request_queue_depth
        threshold: '50'
        query: |
          sum(http_requests_in_flight{service="api-service"})
      authenticationRef:
        name: prometheus-auth
```
Multi-Metric Scaling Configuration
Combine multiple metrics for sophisticated scaling decisions:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: advanced-api-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-service-deployment
  pollingInterval: 15
  cooldownPeriod: 180
  minReplicaCount: 10
  maxReplicaCount: 500
  triggers:
    # Scale based on request rate
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server:80
        metricName: api_request_rate_per_pod
        threshold: '200'
        query: |
          sum(rate(http_requests_total{job="api-service"}[1m]))
          /
          sum(kube_deployment_status_replicas{deployment="api-service-deployment"})
      authenticationRef:
        name: prometheus-auth
    # Scale based on response latency
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server:80
        metricName: api_latency_p95
        threshold: '0.5'
        query: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket{job="api-service"}[2m])) by (le)
          )
      authenticationRef:
        name: prometheus-auth
    # Scale based on error rate
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server:80
        metricName: api_error_rate
        threshold: '0.02'
        query: |
          sum(rate(http_requests_total{job="api-service",status_code=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{job="api-service"}[5m]))
      authenticationRef:
        name: prometheus-auth
    # Scale based on custom business metrics
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server:80
        metricName: pending_orders_per_pod
        threshold: '25'
        query: |
          sum(pending_orders_total{service="api-service"})
          /
          sum(kube_deployment_status_replicas{deployment="api-service-deployment"})
      authenticationRef:
        name: prometheus-auth
```

Note the per-pod queries divide by `sum(kube_deployment_status_replicas{...})`: the replica count is the gauge's value, so summing it yields the divisor, whereas `count()` would only return the number of series.
HTTP External Scaler
For APIs that expose their own metrics endpoints:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-external-scaler
spec:
  scaleTargetRef:
    name: payment-api-deployment
  triggers:
    # External scalers speak KEDA's gRPC protocol; metadata is passed
    # through to your scaler implementation as opaque key/value pairs.
    - type: external
      metadata:
        scalerAddress: http-scaler.keda:8080
        metric: pending_payments
        threshold: '100'
        service: payment-api
    # The built-in metrics-api scaler polls an HTTP endpoint directly.
    - type: metrics-api
      metadata:
        targetValue: '30'
        activationTargetValue: '10'
        url: https://api-service.company.com/metrics/queue-depth
        valueLocation: 'queueDepth'  # JSON path to the metric value
      authenticationRef:
        name: http-auth
```
REST API Scaling Best Practices
1. Choose Appropriate Metrics
Request Rate Scaling:
- Best for: Stateless APIs with predictable processing time
- Threshold calculation:
Target RPS per Pod = (CPU Cores × Efficiency Factor × Requests per Core per Second)
Latency-Based Scaling:
- Best for: APIs with variable processing complexity
- Use P95 or P99 latencies, not averages
- Set thresholds based on SLA requirements
Queue Depth Scaling:
- Best for: Asynchronous processing APIs
- Prevents request queuing and timeout issues
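The request-rate formula above can be computed directly when picking a `threshold`. A Go sketch with illustrative numbers (the efficiency factor discounts overhead such as GC and networking; all inputs here are assumptions):

```go
package main

import "fmt"

// targetRPSPerPod applies the formula: cores × efficiency factor ×
// requests per core per second.
func targetRPSPerPod(cpuCores, efficiencyFactor, requestsPerCorePerSec float64) float64 {
	return cpuCores * efficiencyFactor * requestsPerCorePerSec
}

func main() {
	// 2 cores, 70% efficiency, 100 req/s per core.
	fmt.Println(targetRPSPerPod(2, 0.7, 100)) // 140
}
```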
2. Scaling Velocity Configuration
advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleUp:
        # Aggressive scale-up for traffic spikes
        stabilizationWindowSeconds: 30
        policies:
          - type: Percent
            value: 100  # double pods quickly
            periodSeconds: 30
      scaleDown:
        # Conservative scale-down to avoid thrashing
        stabilizationWindowSeconds: 600
        policies:
          - type: Percent