Kubernetes Monitoring & Scaling — HPA, Metrics, Resource Management
Why Monitoring and Scaling Matter
Without monitoring, you are flying blind in production. Without autoscaling, you either over-provision (wasting money) or under-provision (causing outages). Kubernetes provides the building blocks for both: resource metrics via Metrics Server (a cluster add-on), the Horizontal Pod Autoscaler (HPA) to adjust replica counts, and resource quotas and limit ranges to ensure fair usage.
Why this matters for your career:
- Monitoring is the foundation of production readiness
- Autoscaling saves significant cloud costs (30-60%)
- HPA is a standard interview topic for DevOps roles
- Understanding resource management prevents noisy-neighbor problems
Metrics Server
Metrics Server collects resource metrics (CPU, memory) from the kubelet on each node and exposes them through the Kubernetes Metrics API, which `kubectl top` and the HPA consume:
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# View node metrics (metrics can take a minute to appear after installation)
kubectl top nodes
# Output:
# NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
# node-1   450m         22%    2048Mi          65%
# node-2   320m         16%    1536Mi          49%
# View pod metrics
kubectl top pods -n my-app
# Output:
# NAME                     CPU(cores)   MEMORY(bytes)
# my-app-6d4b8f7c9-abc12   120m         256Mi
# my-app-6d4b8f7c9-def34   98m          212Mi
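The outputs above use Kubernetes quantity notation: the `m` suffix means millicores (450m is 0.45 of a CPU core), and `Mi`/`Gi` are binary units (1Mi = 1024 * 1024 bytes). The same notation appears in the HPA, quota, and limit range manifests in this chapter. As a minimal Python sketch, the hypothetical helpers below convert the most common suffixes into plain numbers; they cover only a subset of the formats Kubernetes accepts (no exponent notation, no `Pi`/`Ei` suffixes).

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) memory suffixes.
# Binary suffixes are checked first so "Mi" is never mistaken for "M".
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "450m", "2", or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # A bare number means whole cores (fractions allowed).
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "2048Mi", "2Gi", or "500M" to bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: plain bytes


if __name__ == "__main__":
    print(cpu_to_millicores("450m"))  # 450
    print(cpu_to_millicores("2"))  # 2000
    print(memory_to_bytes("2048Mi") == memory_to_bytes("2Gi"))  # True
```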
Horizontal Pod Autoscaler (HPA)
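HPA automatically scales the number of pods in a Deployment (or other scalable workload) based on CPU, memory, or custom metrics. CPU and memory targets of type Utilization are measured as a percentage of each container's resource request, so the target pods must declare requests for those resources.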
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
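Both targets are percentages of the pods' resource requests, not of node capacity. The following sketch is a simplified model, roughly mirroring how the HPA controller derives current utilization: total usage divided by total requests across the targeted pods, truncated to a whole percent. The usage figures come from the `kubectl top pods` output above; the 200m CPU request per pod is a hypothetical value, since the Deployment's requests aren't shown here, and `cpu_utilization_percent` is an illustrative helper, not a Kubernetes API.

```python
from typing import List


def cpu_utilization_percent(usage_m: List[int], request_m: List[int]) -> int:
    """CPU utilization of a group of pods as a percentage of their requests.

    Simplified model (hypothetical helper): total usage / total requests,
    truncated to an integer. When all pods request the same amount, this
    equals the mean of the per-pod percentages.
    """
    if len(usage_m) != len(request_m):
        raise ValueError("need exactly one usage and one request value per pod")
    total_request = sum(request_m)
    if total_request == 0:
        # Utilization targets are undefined for pods without CPU requests.
        raise ValueError("pods must declare CPU requests")
    return sum(usage_m) * 100 // total_request


if __name__ == "__main__":
    # 120m and 98m usage (from `kubectl top pods`), hypothetical 200m request each
    print(cpu_utilization_percent([120, 98], [200, 200]))  # 54, below the 70% target
    # Usage above the request yields more than 100%
    print(cpu_utilization_percent([140, 140], [100, 100]))  # 140
```

The second call shows why the utilization figures later in this chapter can exceed 100%: a container may use more CPU than it requests.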
How HPA Works
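- Metrics Server scrapes CPU/memory usage from each node's kubelet (every 15 seconds with the standard install manifest)
- Every 15 seconds by default, the HPA controller reads these metrics and calculates the desired replica count:
  desired = ceil(current * (current_utilization / target_utilization))
  where current is the current number of replicas
- For example, with 3 replicas, CPU at 90%, and a 70% target:
  desired = ceil(3 * (90/70)) = ceil(3.86) = 4 replicas
- If the ratio is within 10% of 1.0 (the default tolerance), HPA leaves the replica count unchanged; scale-downs are also held back by a 5-minute stabilization window by default
- With several metrics (CPU and memory in the manifest above), HPA computes a replica count for each, uses the largest, and clamps it to minReplicas and maxReplicas
- HPA updates the Deployment's replica count
- The Deployment creates or terminates pods accordingly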
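The calculation steps above can be sketched in Python. This is a simplified, single-metric model: `desired_replicas` is a hypothetical helper, and the real controller also accounts for missing metrics, pods that are not yet ready, and scaling behavior policies. The 0.1 tolerance matches the HPA's documented default.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     tolerance: float = 0.1) -> int:
    """Desired replica count for a single metric (simplified HPA model).

    desired = ceil(current * current_util / target_util), but the replica
    count is left unchanged when the ratio is within `tolerance` of 1.0.
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no scaling
    return math.ceil(current * ratio)


if __name__ == "__main__":
    print(desired_replicas(3, 90, 70))  # 4: ceil(3 * 1.286) = ceil(3.86)
    print(desired_replicas(3, 75, 70))  # 3: ratio 1.07 is within the tolerance
```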
Autoscaling Behavior
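All rows assume 3 current replicas, a 70% CPU target, minReplicas: 2, and maxReplicas: 10. Utilization can exceed 100% because it is measured against the pod's CPU request, and a container may use more than it requests (up to its limit, if one is set).

| Metric | Target | Current Utilization | Calculation | Desired Replicas |
|--------|--------|---------------------|-------------|------------------|
| CPU | 70% | 140% | 3 * (140/70) = 6 | 6 |
| CPU | 70% | 35% | 3 * (35/70) = 1.5 | 2 (1.5 rounded up; also the minReplicas floor) |
| CPU | 70% | 210% | 3 * (210/70) = 9 | 9 |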
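Extending the same formula to the table rows and to the manifest's two metrics gives the sketch below. It is again a simplified model with a hypothetical helper name: each metric proposes a replica count, the largest proposal wins, and the result is clamped to minReplicas and maxReplicas. The tolerance check and scale-down stabilization are left out for brevity, and the memory figures in the second example are hypothetical.

```python
import math
from typing import Dict, Tuple


def hpa_desired(current: int, metrics: Dict[str, Tuple[float, float]],
                min_replicas: int, max_replicas: int) -> int:
    """Combine per-metric proposals the way HPA does (simplified).

    `metrics` maps a metric name to (current_utilization, target_utilization).
    """
    # Each metric proposes ceil(current * current_util / target_util).
    proposals = [math.ceil(current * cur / tgt) for cur, tgt in metrics.values()]
    # The largest proposal wins, clamped to [min_replicas, max_replicas].
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # The three table rows: 3 replicas, 70% CPU target, min 2, max 10
    for cpu in (140, 35, 210):
        print(cpu, hpa_desired(3, {"cpu": (cpu, 70)}, 2, 10))  # 6, 2, 9
    # CPU alone would allow 2 replicas, but memory at 120% against an 80% target asks for 5
    print(hpa_desired(3, {"cpu": (35, 70), "memory": (120, 80)}, 2, 10))  # 5
    # 280% CPU would propose 12 replicas; maxReplicas caps it at 10
    print(hpa_desired(3, {"cpu": (280, 70)}, 2, 10))  # 10
```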
Create HPA from Command Line
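# Simple CPU-based HPA (use this or the manifest above, not both:
# two HPAs targeting the same Deployment conflict)
kubectl autoscale deployment my-app -n my-app \
  --name=my-app-hpa --min=2 --max=10 --cpu-percent=70
# Watch HPA status (-w streams updates; press Ctrl+C to stop)
kubectl get hpa -n my-app -w
# Output:
# NAME         REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# my-app-hpa   Deployment/my-app   45%/70%   2         10        3          5m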
Resource Quotas
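Resource quotas cap the total resources that all pods in a namespace can request or consume. When a quota covers compute resources such as requests.cpu or limits.memory, every new pod must specify those values (or receive defaults from a LimitRange), otherwise it is rejected: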
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
    services: "10"
    persistentvolumeclaims: "5"
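To make the admission rule concrete, here is a simplified Python model of how a quota like `dev-team-quota` decides whether a new pod fits. The `admit_pod` helper and its messages are hypothetical, not the API server's actual code or error text; only three of the quota's entries are modeled, and the current usage figures are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Three entries from dev-team-quota, as plain integers:
# CPU in millicores ("10" cores = 10000m), memory in bytes ("20Gi").
HARD = {"requests.cpu": 10_000, "requests.memory": 20 * 1024**3, "pods": 50}


def admit_pod(used: Dict[str, int], cpu_request_m: Optional[int],
              memory_request: Optional[int]) -> Tuple[bool, str]:
    """Return (admitted, reason) for a new pod (simplified quota model)."""
    # A quota on requests.cpu/requests.memory requires pods to declare them.
    if cpu_request_m is None or memory_request is None:
        return False, "pod must specify cpu and memory requests"
    # Admit only if every tracked total stays within its hard limit.
    after = {
        "requests.cpu": used["requests.cpu"] + cpu_request_m,
        "requests.memory": used["requests.memory"] + memory_request,
        "pods": used["pods"] + 1,
    }
    for resource, total in after.items():
        if total > HARD[resource]:
            return False, f"exceeded quota: {resource}"
    return True, "ok"


if __name__ == "__main__":
    mi = 1024**2
    # Hypothetical current usage in the dev namespace
    used = {"requests.cpu": 9_800, "requests.memory": 8 * 1024**3, "pods": 12}
    print(admit_pod(used, 100, 256 * mi))  # (True, 'ok')
    print(admit_pod(used, 300, 256 * mi))  # (False, 'exceeded quota: requests.cpu')
    print(admit_pod(used, None, None))     # (False, 'pod must specify cpu and memory requests')
```

A LimitRange with `defaultRequest` values is the usual way to keep pods that omit requests from being rejected by such a quota.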
Limit Ranges
Limit ranges set per-container (or per-pod) minimum and maximum resource values, and supply default requests and limits for containers that don't declare their own:
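apiVersion: v1
kind: LimitRange
metadata:
  name: dev-limits
  namespace: dev
spec:
  limits:
  - max:              # upper bound for a container's requests and limits
      cpu: "2"
      memory: "2Gi"
    min:              # lower bound for a container's requests and limits
      cpu: "50m"
      memory: "64Mi"
    default:          # limit applied when a container sets no limit
      cpu: "200m"
      memory: "256Mi"
    defaultRequest:   # request applied when a container sets neither request nor limit
      cpu: "100m"
      memory: "128Mi"
    type: Container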
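The following sketch models how the CPU settings of `dev-limits` are applied to a single container: missing values are filled in first, then the bounds are checked. It is a simplified, CPU-only model (values in millicores) with a hypothetical helper name. It encodes one detail worth knowing: a container that sets only a limit gets a request equal to that limit, while one that sets only a request receives the default limit, which can push the request above the limit and cause the pod to be rejected.

```python
from typing import Optional, Tuple

# CPU settings from the dev-limits LimitRange, in millicores
MIN_M, MAX_M = 50, 2_000
DEFAULT_LIMIT_M, DEFAULT_REQUEST_M = 200, 100


def apply_cpu_limit_range(request_m: Optional[int],
                          limit_m: Optional[int]) -> Tuple[int, int]:
    """Return the container's effective (request, limit) or raise ValueError."""
    if request_m is None:
        # Only a limit set: the request defaults to the limit.
        # Neither set: the request comes from defaultRequest.
        request_m = limit_m if limit_m is not None else DEFAULT_REQUEST_M
    if limit_m is None:
        limit_m = DEFAULT_LIMIT_M  # the limit comes from `default`
    if request_m > limit_m:
        raise ValueError(f"request {request_m}m exceeds limit {limit_m}m")
    if request_m < MIN_M:
        raise ValueError(f"request {request_m}m is below min {MIN_M}m")
    if limit_m > MAX_M:
        raise ValueError(f"limit {limit_m}m exceeds max {MAX_M}m")
    return request_m, limit_m


if __name__ == "__main__":
    print(apply_cpu_limit_range(None, None))  # (100, 200)
    print(apply_cpu_limit_range(None, 500))   # (500, 500)
    for bad in [(500, None), (20, 100), (100, 4_000)]:
        try:
            apply_cpu_limit_range(*bad)
        except ValueError as err:
            print("rejected:", err)
```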
Prometheus and Grafana
For production monitoring, deploy Prometheus and Grafana:
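# Using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack (includes Prometheus, Grafana, Alertmanager)
helm install monitoring prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace
# Access Grafana (port-forward runs in the foreground; use a second terminal for other commands)
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
# Open http://localhost:3000 and log in as admin. The chart's default password is
# prom-operator unless overridden; the actual value is stored in the Grafana secret:
kubectl get secret monitoring-grafana -n monitoring \
  -o jsonpath="{.data.admin-password}" | base64 -d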
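Grafana's panels are driven by PromQL queries against Prometheus, which can also be queried directly over its HTTP API (`/api/v1/query`). The sketch below assumes you have port-forwarded the Prometheus service to localhost:9090; the service name in the comment (`monitoring-kube-prometheus-prometheus`) is an assumption based on the chart's naming for a release called `monitoring`, so check `kubectl get svc -n monitoring`. The query sums per-pod CPU usage from the cAdvisor metric `container_cpu_usage_seconds_total`, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# Assumption: Prometheus has been port-forwarded to this address, e.g. with
#   kubectl port-forward svc/monitoring-kube-prometheus-prometheus 9090 -n monitoring
PROMETHEUS_URL = "http://localhost:9090"

# Per-pod CPU usage in cores over the last 5 minutes; container!="" skips the
# pod-level aggregate series so usage isn't double-counted.
QUERY = ('sum by (pod) (rate(container_cpu_usage_seconds_total'
         '{namespace="my-app", container!=""}[5m]))')


def query_prometheus(query: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query via the Prometheus HTTP API and return the JSON body."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def vector_to_dict(response: dict, label: str = "pod") -> Dict[str, float]:
    """Flatten an instant-vector response into {label value: sample value}."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in response["data"]["result"]
    }


if __name__ == "__main__":
    # Offline example using the response shape Prometheus returns; the values
    # mirror the kubectl top output (120m = 0.12 cores). With a live
    # port-forward, call vector_to_dict(query_prometheus(QUERY)) instead.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"pod": "my-app-6d4b8f7c9-abc12"}, "value": [1700000000, "0.12"]},
                {"metric": {"pod": "my-app-6d4b8f7c9-def34"}, "value": [1700000000, "0.098"]},
            ],
        },
    }
    print(vector_to_dict(sample))
```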
Grafana dashboards provide:
- Cluster resource utilization (CPU, memory, disk, network)
- Pod resource usage per namespace
- API server latency and error rates
- Node health and capacity
- Custom application metrics
Cluster Autoscaler
While HPA scales pods, Cluster Autoscaler scales nodes:
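# On AWS EKS: download the auto-discovery example manifest
curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
# Replace <YOUR CLUSTER NAME> in the manifest with your cluster's name, and make sure
# the autoscaler has IAM permissions to manage the node groups' Auto Scaling groups
kubectl apply -f cluster-autoscaler-autodiscover.yaml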
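Cluster Autoscaler adds nodes when pods stay Pending because no existing node has enough unrequested capacity, and removes nodes whose pods' total requests fall below a utilization threshold (50% of allocatable by default) when those pods can be rescheduled elsewhere. Because both decisions are based on resource requests rather than actual usage, accurate requests matter here too.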
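The request-based scale-down rule can be sketched as follows. This is a deliberately simplified model with hypothetical names: it computes each node's utilization as the higher of its CPU and memory request ratios and flags nodes under the default 0.5 threshold. The real Cluster Autoscaler additionally checks that every pod can be rescheduled elsewhere, respects PodDisruptionBudgets, and waits until a node has been unneeded for a while (10 minutes by default) before removing it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Default value of --scale-down-utilization-threshold
THRESHOLD = 0.5


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int  # millicores
    allocatable_mem: int    # bytes
    # (cpu_request_m, memory_request_bytes) for each pod scheduled on the node
    pod_requests: List[Tuple[int, int]] = field(default_factory=list)


def utilization(node: Node) -> float:
    """Request-based utilization: the higher of the CPU and memory ratios."""
    cpu = sum(c for c, _ in node.pod_requests) / node.allocatable_cpu_m
    mem = sum(m for _, m in node.pod_requests) / node.allocatable_mem
    return max(cpu, mem)


def scale_down_candidates(nodes: List[Node], threshold: float = THRESHOLD) -> List[str]:
    """Names of nodes whose request-based utilization is below the threshold."""
    return [node.name for node in nodes if utilization(node) < threshold]


if __name__ == "__main__":
    gi, mi = 1024**3, 1024**2
    nodes = [
        Node("node-a", 2_000, 4 * gi, [(500, 1 * gi), (300, 512 * mi)]),  # 40% CPU, 37.5% memory
        Node("node-b", 2_000, 4 * gi, [(1_200, 1 * gi)]),                 # 60% CPU
        Node("node-c", 2_000, 4 * gi, [(100, 3 * gi)]),                   # 75% memory
    ]
    print(scale_down_candidates(nodes))  # ['node-a']
```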
Monitoring Best Practices
| Practice | Description |
|----------|-------------|
| Set resource requests and limits | Every container should specify both |
| Configure liveness probes | Restarts unhealthy containers automatically |
| Configure readiness probes | Prevents traffic to unready pods |
| Set up pod disruption budgets | Keeps a minimum number of pods available during voluntary disruptions such as node drains |
| Monitor the 4 golden signals | Latency, traffic, errors, saturation |
| Set up alerts for critical metrics | Route alerts to PagerDuty, Slack, or email |
| Use namespace quotas | Prevents one team from starving others |
| Centralize logs | Send logs to Elasticsearch, Loki, or CloudWatch |
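For the pod disruption budget row, the arithmetic is small enough to sketch. A PodDisruptionBudget sets either minAvailable or maxUnavailable, and voluntary evictions (for example during `kubectl drain`) are allowed only while enough healthy pods remain. This simplified model uses integer values only (percentages are omitted), and `disruptions_allowed` is a hypothetical helper.

```python
from typing import Optional


def disruptions_allowed(expected_pods: int, healthy_pods: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Pods that may be voluntarily evicted right now (simplified PDB model)."""
    # A PodDisruptionBudget sets exactly one of the two fields.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    # Never negative: an already-degraded workload allows no further evictions.
    return max(0, healthy_pods - desired_healthy)


if __name__ == "__main__":
    print(disruptions_allowed(5, 5, min_available=3))    # 2
    print(disruptions_allowed(5, 4, min_available=3))    # 1
    print(disruptions_allowed(5, 4, max_unavailable=1))  # 0
```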
Summary
Monitoring and autoscaling are essential for running Kubernetes in production. Metrics Server provides basic resource metrics, HPA automatically scales pods based on load, and resource quotas and limit ranges prevent resource abuse. Prometheus and Grafana provide production-grade monitoring and dashboards.
Key takeaways:
- Metrics Server collects CPU/memory usage from nodes and pods
- HPA scales pods automatically based on target utilization
- HPA formula: desired = ceil(current * (current_utilization / target_utilization)), clamped to minReplicas/maxReplicas
- Resource quotas limit total consumption per namespace
- Limit ranges set per-container min/max bounds plus default requests and limits
- Prometheus + Grafana provide production monitoring dashboards
- Cluster Autoscaler adds/removes nodes based on pod scheduling needs
- Always set resource requests and limits on every container
What's Next: CI/CD Pipeline
The next chapter covers CI/CD pipelines for containerized applications — automating builds, tests, and deployments with GitHub Actions and ArgoCD.