Prometheus 指標收集
Vibe Prompt
「幫我寫 Prometheus 設定檔:監控 K8s 叢集的節點、Pod、Deployment,每 15 秒抓一次。」
prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- job_name: 'node-exporter'
static_configs:
- targets: ['node-exporter:9100']
關鍵指標
# CPU 使用率
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# 記憶體使用率
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Pod 重啟次數
rate(kube_pod_container_status_restarts_total[5m])
# 磁碟空間
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100