Kubernetes Monitoring & Scaling — HPA, Metrics, Resource Management

Why Monitoring and Scaling Matter

Without monitoring, you are flying blind in production. Without autoscaling, you either over-provision (wasting money) or under-provision (causing outages). Kubernetes addresses both: Metrics Server collects resource metrics, the Horizontal Pod Autoscaler (HPA) adjusts replica counts to match load, and resource quotas keep usage fair across namespaces.

Why this matters for your career:

Monitoring is the foundation of production readiness
Autoscaling can cut cloud costs substantially (figures of 30-60% are commonly quoted, but actual savings depend on the workload)
HPA is a standard interview topic for DevOps roles
Understanding resource management prevents noisy-neighbor problems

Metrics Server

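Metrics Server collects resource metrics (CPU, memory) from the kubelet on each node and exposes them for nodes and pods through the Metrics API, which both kubectl top and the HPA read: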

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# View node metrics
kubectl top nodes
# Output:
# NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
# node-1     450m         22%    2048Mi          65%
# node-2     320m         16%    1536Mi          49%

# View pod metrics
kubectl top pods -n my-app
# Output:
# NAME                     CPU(cores)   MEMORY(bytes)
# my-app-6d4b8f7c9-abc12   120m         256Mi
# my-app-6d4b8f7c9-def34   98m          212Mi
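
The quantities printed by kubectl top use Kubernetes units (m for millicores, Mi for mebibytes), so they need converting before you can add them up. The following is a minimal Python sketch, not part of any Kubernetes tooling, that parses the sample pod output shown above; the function names are hypothetical and only the suffixes that appear in this chapter are handled.

```python
def parse_cpu_millicores(value: str) -> int:
    """Convert a CPU quantity such as '120m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)


def parse_memory_mib(value: str) -> float:
    """Convert a memory quantity such as '256Mi' or '2Gi' to MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory unit: {value}")


def summarize_top_pods(output: str) -> dict:
    """Sum CPU and memory over the data rows of `kubectl top pods` text."""
    rows = [line.split() for line in output.strip().splitlines()[1:]]
    total_cpu = sum(parse_cpu_millicores(cpu) for _, cpu, _ in rows)
    total_mem = sum(parse_memory_mib(mem) for _, _, mem in rows)
    return {"pods": len(rows), "cpu_millicores": total_cpu, "memory_mib": total_mem}


# Sample text copied from the output above (assumed format: three columns).
SAMPLE = """\
NAME                     CPU(cores)   MEMORY(bytes)
my-app-6d4b8f7c9-abc12   120m         256Mi
my-app-6d4b8f7c9-def34   98m          212Mi
"""

print(summarize_top_pods(SAMPLE))
# {'pods': 2, 'cpu_millicores': 218, 'memory_mib': 468.0}
```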

Horizontal Pod Autoscaler (HPA)

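HPA automatically scales the number of pods based on CPU, memory, or custom metrics. Utilization targets are percentages of each container's resource requests, so the target Deployment must set requests for every resource the HPA watches.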

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

How HPA Works

Metrics Server collects CPU/memory usage every 15 seconds
HPA controller calculates desired replicas, rounding up: desired = ceil(current_replicas * (current_utilization / target_utilization))
If 3 replicas average 90% CPU and the target is 70%: desired = ceil(3 * (90/70)) = ceil(3.86) = 4 replicas
HPA updates the Deployment's replica count, staying within minReplicas and maxReplicas
Deployment creates or terminates pods accordingly
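
The calculation in the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the controller's actual code: the function names are hypothetical, the 10% tolerance reflects the controller's documented default, and when several metrics are configured (as in the CPU and memory example earlier) the largest per-metric result is used.

```python
import math


def clamp(value, low, high):
    """Keep value inside the [low, high] range."""
    return max(low, min(high, value))


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count for one metric, following the ceil(current * ratio) rule."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling, only enforce the bounds.
        return clamp(current_replicas, min_replicas, max_replicas)
    # Multiply before dividing so whole-number results are not nudged
    # upward by floating-point error before ceil() is applied.
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return clamp(desired, min_replicas, max_replicas)


def desired_for_metrics(current_replicas, metrics, min_replicas, max_replicas):
    """With several metrics, take the largest per-metric result.

    metrics is a list of (current_utilization, target_utilization) pairs.
    """
    return max(
        desired_replicas(current_replicas, current, target, min_replicas, max_replicas)
        for current, target in metrics
    )


print(desired_replicas(3, 90, 70, min_replicas=2, max_replicas=10))   # 4
print(desired_replicas(3, 35, 70, min_replicas=2, max_replicas=10))   # 2
print(desired_replicas(3, 400, 70, min_replicas=2, max_replicas=10))  # 10 (capped)
# CPU at 90% of a 70% target, memory at 60% of an 80% target:
print(desired_for_metrics(3, [(90, 70), (60, 80)], 2, 10))            # 4
```

The tolerance check explains why an HPA sitting at 72%/70% does not add a pod: small deviations from the target are ignored to avoid constant resizing.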

Autoscaling Behavior

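Each row assumes 3 current replicas, minReplicas 2, and maxReplicas 10. Utilization is measured against requests, so values above 100% are possible.

| Metric | Target | Current utilization | Calculation | Desired replicas |
|--------|--------|---------------------|-------------|------------------|
| CPU | 70% | 140% | ceil(3 * (140/70)) | 6 |
| CPU | 70% | 35% | ceil(3 * (35/70)) = ceil(1.5) | 2 (also the minReplicas floor) |
| CPU | 70% | 210% | ceil(3 * (210/70)) | 9 |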

Create HPA from Command Line

# Simple CPU-based HPA
kubectl autoscale deployment my-app -n my-app \
  --min=2 --max=10 --cpu-percent=70

# View HPA status
kubectl get hpa -n my-app -w
# Output:
# NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# my-app   Deployment/my-app   45%/70%   2         10        3          5m

Resource Quotas

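Resource quotas limit total resource consumption per namespace. Once a quota covers CPU or memory requests or limits, every new pod in that namespace must declare those values (or inherit them from a LimitRange); otherwise it is rejected: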

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
    services: "10"
    persistentvolumeclaims: "5"
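
To make the admission arithmetic concrete, here is a rough Python sketch of the check a quota implies: current usage plus the new pod must stay at or below each hard limit. It is a simplification for illustration only. The helper names and the usage figures are hypothetical, and the parser handles only the quantity suffixes used in this chapter.

```python
from decimal import Decimal

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(value: str) -> Decimal:
    """Parse a quantity such as '500m', '10' or '20Gi' into base units."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    if value.endswith("m"):
        return Decimal(value[:-1]) / 1000
    return Decimal(value)


def exceeded_quotas(hard, used, new_pod):
    """Return the quota keys that admitting new_pod would push over the limit."""
    exceeded = []
    for key, limit in hard.items():
        total = parse_quantity(used.get(key, "0")) + parse_quantity(new_pod.get(key, "0"))
        if total > parse_quantity(limit):
            exceeded.append(key)
    return exceeded


# Hard limits copied from the dev-team-quota manifest above.
HARD = {"requests.cpu": "10", "requests.memory": "20Gi",
        "limits.cpu": "20", "limits.memory": "40Gi", "pods": "50"}
# Hypothetical current usage in the namespace.
USED = {"requests.cpu": "9500m", "requests.memory": "12Gi",
        "limits.cpu": "15", "limits.memory": "24Gi", "pods": "30"}

small_pod = {"requests.cpu": "500m", "requests.memory": "1Gi",
             "limits.cpu": "1", "limits.memory": "2Gi", "pods": "1"}
big_pod = {"requests.cpu": "1", "requests.memory": "1Gi",
           "limits.cpu": "2", "limits.memory": "2Gi", "pods": "1"}

print(exceeded_quotas(HARD, USED, small_pod))  # []
print(exceeded_quotas(HARD, USED, big_pod))    # ['requests.cpu']
```

Decimal is used instead of float so that millicore values such as 100m add up exactly.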

Limit Ranges

Limit ranges set default values and min/max bounds for the resource requests and limits of each container (or pod) in a namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: dev-limits
  namespace: dev
spec:
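  limits:
  - type: Container
    max:                 # upper bound for a container's limits
      cpu: "2"
      memory: "2Gi"
    min:                 # lower bound for a container's requests
      cpu: "50m"
      memory: "64Mi"
    default:             # limits applied when a container sets none
      cpu: "200m"
      memory: "256Mi"
    defaultRequest:      # requests applied when a container sets none
      cpu: "100m"
      memory: "128Mi"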
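
The effect of this LimitRange on an incoming container can be sketched in Python as follows. This is an approximation for illustration, not the admission controller itself; the function names are hypothetical. It assumes the documented defaulting behavior: a container that sets a limit but no request gets a request equal to its limit, and anything still missing is filled from default and defaultRequest.

```python
from decimal import Decimal

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_quantity(value: str) -> Decimal:
    """Parse a quantity such as '50m', '2' or '256Mi' into base units."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    if value.endswith("m"):
        return Decimal(value[:-1]) / 1000
    return Decimal(value)


# Values copied from the dev-limits manifest above.
LIMIT_RANGE = {
    "max": {"cpu": "2", "memory": "2Gi"},
    "min": {"cpu": "50m", "memory": "64Mi"},
    "default": {"cpu": "200m", "memory": "256Mi"},
    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
}


def apply_limit_range(container, limit_range):
    """Fill in defaults, then return (resources, list of violations)."""
    requests = dict(container.get("requests", {}))
    limits = dict(container.get("limits", {}))
    errors = []
    for resource in ("cpu", "memory"):
        # A limit without a request: the request takes the limit's value.
        if resource in limits and resource not in requests:
            requests[resource] = limits[resource]
        limits.setdefault(resource, limit_range["default"][resource])
        requests.setdefault(resource, limit_range["defaultRequest"][resource])

        request = parse_quantity(requests[resource])
        limit = parse_quantity(limits[resource])
        if request < parse_quantity(limit_range["min"][resource]):
            errors.append(f"{resource} request {requests[resource]} is below min")
        if limit > parse_quantity(limit_range["max"][resource]):
            errors.append(f"{resource} limit {limits[resource]} is above max")
        if request > limit:
            errors.append(f"{resource} request {requests[resource]} exceeds its limit")
    return {"requests": requests, "limits": limits}, errors


print(apply_limit_range({}, LIMIT_RANGE))
print(apply_limit_range({"limits": {"cpu": "4"}}, LIMIT_RANGE)[1])
```

The first call shows a container with no resources block receiving the defaults; the second reports that a 4-core limit is above the 2-core maximum.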

Prometheus and Grafana

For production monitoring, deploy Prometheus and Grafana:

# Using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack (includes Prometheus, Grafana, AlertManager)
helm install monitoring prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

# Access Grafana
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
# Open http://localhost:3000 (chart default login: admin / prom-operator; change it before exposing Grafana)

Grafana dashboards provide:

Cluster resource utilization (CPU, memory, disk, network)
Pod resource usage per namespace
API server latency and error rates
Node health and capacity
Custom application metrics

Cluster Autoscaler

While HPA scales pods, Cluster Autoscaler scales nodes:

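# On AWS EKS: download the example manifest, replace the cluster-name
# placeholder in the --node-group-auto-discovery flag, then apply it
curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
kubectl apply -f cluster-autoscaler-autodiscover.yaml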

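Cluster Autoscaler adds nodes when pending pods cannot be scheduled due to insufficient resources, and removes nodes that stay underutilized and whose pods can be rescheduled elsewhere. Both decisions are based on pod resource requests, not on live usage.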
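
The two triggers can be illustrated with a deliberately simplified Python sketch. It is not the Cluster Autoscaler's real algorithm, which also simulates scheduling constraints, node groups, and a waiting period before removal. The data layout and function names are hypothetical, and the 0.5 threshold is assumed from the autoscaler's default scale-down utilization setting.

```python
def pod_fits(pod, node):
    """True if the pod's requests fit into the node's unrequested capacity."""
    free_cpu = node["alloc_cpu_m"] - node["req_cpu_m"]
    free_mem = node["alloc_mem_mi"] - node["req_mem_mi"]
    return pod["cpu_m"] <= free_cpu and pod["mem_mi"] <= free_mem


def unschedulable(pending_pods, nodes):
    """Pending pods that fit on no existing node (a scale-up trigger)."""
    return [pod["name"] for pod in pending_pods
            if not any(pod_fits(pod, node) for node in nodes)]


def underutilized(nodes, threshold=0.5):
    """Nodes whose requested share of CPU and memory is below the threshold."""
    result = []
    for node in nodes:
        utilization = max(node["req_cpu_m"] / node["alloc_cpu_m"],
                          node["req_mem_mi"] / node["alloc_mem_mi"])
        if utilization < threshold:
            result.append(node["name"])
    return result


# Hypothetical cluster state: allocatable capacity and summed pod requests.
NODES = [
    {"name": "node-1", "alloc_cpu_m": 2000, "req_cpu_m": 1800,
     "alloc_mem_mi": 4096, "req_mem_mi": 3000},
    {"name": "node-2", "alloc_cpu_m": 2000, "req_cpu_m": 400,
     "alloc_mem_mi": 4096, "req_mem_mi": 1024},
]
PENDING = [
    {"name": "small", "cpu_m": 500, "mem_mi": 512},
    {"name": "large", "cpu_m": 1900, "mem_mi": 1024},
]

print(unschedulable(PENDING, NODES))  # ['large']
print(underutilized(NODES))           # ['node-2']
```

Because everything here works from requests, a container without requests is invisible to these calculations, which is one more reason to set them on every container.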

Monitoring Best Practices

| Practice | Description |
|----------|-------------|
| Set resource requests and limits | Every container must specify both |
| Configure liveness probes | Restarts unhealthy containers automatically |
| Configure readiness probes | Prevents traffic to unready pods |
| Set up pod disruption budgets | Ensures minimum available pods during maintenance |
| Monitor the 4 golden signals | Latency, traffic, errors, saturation |
| Set up alerts for critical metrics | PagerDuty, Slack, or email integration |
| Use namespace quotas | Prevents one team from starving others |
| Log centralization | Send logs to Elasticsearch, Loki, or CloudWatch |

Summary

Monitoring and autoscaling are essential for running Kubernetes in production. Metrics Server provides basic resource metrics, HPA automatically scales pods based on load, and resource quotas cap what each namespace can consume. Prometheus and Grafana add metric history, dashboards, and alerting.

Key takeaways:

Metrics Server collects CPU/memory usage from nodes and pods
HPA scales pods automatically based on target utilization
HPA formula: desired = ceil(current_replicas * (current_utilization / target_utilization))
Resource quotas limit total consumption per namespace
Limit ranges set default and min/max resource requests and limits per container
Prometheus + Grafana provide production monitoring dashboards
Cluster Autoscaler adds/removes nodes based on pod scheduling needs
Always set resource requests and limits on every container

What's Next: CI/CD Pipeline

The next chapter covers CI/CD pipelines for containerized applications — automating builds, tests, and deployments with GitHub Actions and ArgoCD.