# Prometheus Complete Guide / 15 - Containerized Deployment

## 15.1 Overview

This chapter covers deploying the Prometheus monitoring stack on Docker, with Docker Compose, and on Kubernetes.
## 15.2 Docker Deployment

### Single-container deployment

```bash
# Create the config, data, and rules directories
mkdir -p /opt/prometheus/{config,data,rules}

# Write a minimal configuration
cat > /opt/prometheus/config/prometheus.yml <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF

# Run the container
docker run -d \
  --name=prometheus \
  --restart=unless-stopped \
  -p 9090:9090 \
  -v /opt/prometheus/config:/etc/prometheus:ro \
  -v /opt/prometheus/data:/prometheus \
  -v /opt/prometheus/rules:/etc/prometheus/rules:ro \
  prom/prometheus:v2.52.0 \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.retention.time=30d \
  --web.enable-lifecycle
```

If the container exits with a permission error on `/prometheus`, note that the official image runs as user `nobody` (uid 65534), so the bind-mounted data directory must be writable by that uid (for example, `chown -R 65534:65534 /opt/prometheus/data`).
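`--storage.tsdb.retention.time=30d` has a disk-space cost. Prometheus's operational guidance sizes local storage as roughly `retention_seconds × ingested_samples_per_second × bytes_per_sample`, with compressed samples typically landing around 1-2 bytes each. A quick sizing sketch (the 10,000 samples/s ingest rate and 2 bytes/sample are illustrative assumptions, not measurements):

```python
def estimate_tsdb_disk_bytes(retention_days: float,
                             samples_per_second: float,
                             bytes_per_sample: float = 2.0) -> float:
    """Rough local-storage sizing: retention_seconds * ingest rate * bytes/sample."""
    return retention_days * 86400 * samples_per_second * bytes_per_sample

# Illustrative: 10,000 samples/s retained for 30 days at ~2 bytes/sample
gb = estimate_tsdb_disk_bytes(30, 10_000) / 1e9
print(f"needs roughly {gb:.1f} GB")  # roughly 51.8 GB
```

Running the estimate before choosing a volume size avoids discovering mid-month that the data directory is full.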
## 15.3 Full-Stack Deployment with Docker Compose

### Complete monitoring stack

```yaml
# docker-compose.yml
version: '3.8'

services:
  # Prometheus
  prometheus:
    image: prom/prometheus:v2.52.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.retention.size=10GB'
      - '--web.enable-lifecycle'
      - '--web.enable-admin-api'
    networks:
      - monitoring

  # Alertmanager
  alertmanager:
    image: prom/alertmanager:v0.27.0
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager_data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - monitoring

  # Grafana
  grafana:
    image: grafana/grafana:10.4.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    networks:
      - monitoring

  # Node Exporter (host metrics)
  node-exporter:
    image: prom/node-exporter:v1.7.0
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

  # cAdvisor (container metrics)
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    devices:
      - /dev/kmsg
    networks:
      - monitoring

  # Blackbox Exporter
  blackbox-exporter:
    image: prom/blackbox-exporter:v0.24.0
    container_name: blackbox-exporter
    restart: unless-stopped
    ports:
      - "9115:9115"
    volumes:
      - ./blackbox/config.yml:/etc/blackbox_exporter/config.yml:ro
    networks:
      - monitoring

  # Pushgateway
  pushgateway:
    image: prom/pushgateway:v1.7.0
    container_name: pushgateway
    restart: unless-stopped
    ports:
      - "9091:9091"
    networks:
      - monitoring

volumes:
  prometheus_data:
  alertmanager_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
```
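Hard-coding the Grafana admin password (and enabling `--web.enable-admin-api`) is acceptable for a lab but not for a shared host. Compose substitutes variables from a `.env` file placed next to `docker-compose.yml`; the variable name below is a choice for this sketch, not a Grafana convention:

```yaml
# .env (keep this file out of version control):
#   GRAFANA_ADMIN_PASSWORD=change-me

# docker-compose.yml excerpt
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
```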
### Prometheus configuration

```yaml
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']

  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
```
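The three relabel rules in the `blackbox-http` job can be traced by hand: the scrape target's address becomes the probe's `target` URL parameter, that parameter is copied into `instance`, and `__address__` is rewritten to point at the exporter itself. A small sketch that mimics this chain (plain dict operations, not Prometheus's actual relabeling engine):

```python
def apply_blackbox_relabels(labels: dict) -> dict:
    """Mimic the three relabel_configs rules from the blackbox-http job."""
    out = dict(labels)
    # 1. source_labels: [__address__] -> target_label: __param_target
    out["__param_target"] = out["__address__"]
    # 2. source_labels: [__param_target] -> target_label: instance
    out["instance"] = out["__param_target"]
    # 3. target_label: __address__, replacement: blackbox-exporter:9115
    out["__address__"] = "blackbox-exporter:9115"
    return out

result = apply_blackbox_relabels({"__address__": "https://example.com"})
print(result)
# Prometheus ends up scraping
# http://blackbox-exporter:9115/probe?target=https://example.com,
# while the resulting series keep instance="https://example.com".
```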
### Bring the stack up

```bash
docker compose up -d
docker compose ps
docker compose logs -f prometheus
```
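Once the stack is up, confirm that every job is actually being scraped. Prometheus's `/api/v1/targets` endpoint returns JSON whose `data.activeTargets` entries carry a `health` field and the job name under `labels`. An offline sketch of summarizing such a response (the sample payload below is abridged and invented, not a real capture):

```python
def target_health(api_response: dict) -> dict:
    """Map job name -> list of health states from a /api/v1/targets response."""
    summary: dict[str, list[str]] = {}
    for t in api_response["data"]["activeTargets"]:
        summary.setdefault(t["labels"]["job"], []).append(t["health"])
    return summary

# Abridged, hand-written sample of the response shape
sample = {"data": {"activeTargets": [
    {"labels": {"job": "prometheus"}, "health": "up"},
    {"labels": {"job": "node"}, "health": "up"},
    {"labels": {"job": "cadvisor"}, "health": "down"},
]}}
print(target_health(sample))
# {'prometheus': ['up'], 'node': ['up'], 'cadvisor': ['down']}
```

In practice the response would come from `curl http://localhost:9090/api/v1/targets`.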
## 15.4 Kubernetes Deployment

### Using Helm

```bash
# Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack (bundles Prometheus, Grafana, Alertmanager, and exporters)
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=gp3 \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.adminPassword=admin123
```
### Custom Helm values

```yaml
# values.yaml
prometheus:
  prometheusSpec:
    retention: 30d
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 4Gi
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          resources:
            requests:
              storage: 100Gi
    additionalScrapeConfigs:
      - job_name: 'custom-app'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true

alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          resources:
            requests:
              storage: 10Gi
  config:
    global:
      resolve_timeout: 5m
    route:
      receiver: 'null'
      routes:
        - match:
            severity: critical
          receiver: 'critical'
    receivers:
      - name: 'null'
      - name: 'critical'
        webhook_configs:
          - url: 'http://webhook:8080/alert'

grafana:
  adminPassword: admin123
  persistence:
    enabled: true
    size: 10Gi
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'custom'
          folder: 'Custom'
          type: file
          options:
            path: /var/lib/grafana/dashboards/custom
  dashboards:
    custom:
      node-exporter:
        gnetId: 1860
        revision: 33
        datasource: Prometheus
```

```bash
# Install
helm install monitoring prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f values.yaml

# Upgrade after changing values.yaml
helm upgrade monitoring prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f values.yaml
```
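The `critical` receiver above posts to `http://webhook:8080/alert`. Alertmanager sends a documented JSON payload (top-level `status`, `version`, and an `alerts` array whose entries carry `status`, `labels`, and `annotations`). A minimal stdlib receiver sketch that would sit behind that URL (the handler logic is illustrative, not a production service):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(payload: dict) -> list:
    """One line per alert: '<status> <alertname> severity=<severity>'."""
    return [
        f"{a['status']} {a['labels'].get('alertname', '?')} "
        f"severity={a['labels'].get('severity', '?')}"
        for a in payload.get("alerts", [])
    ]

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Alertmanager POSTs the whole notification as one JSON document
        body = self.rfile.read(int(self.headers["Content-Length"]))
        for line in summarize(json.loads(body)):
            print(line)
        self.send_response(200)
        self.end_headers()

# To actually serve (blocks forever):
#   HTTPServer(("", 8080), AlertHandler).serve_forever()
```

Real deployments usually forward to a chat or paging system instead of printing, but the payload handling is the same.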
### Manual deployment (raw YAML)

```yaml
# namespace.yml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```

```yaml
# prometheus-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    rule_files:
      - /etc/prometheus/rules/*.yml
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
```
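`action: keep` with `regex: true` drops every discovered pod whose `prometheus.io/scrape` annotation is not exactly `true` (Prometheus anchors the regex to the full label value). The filter can be mimicked like this (plain Python, not Prometheus's relabeling engine; the pod names are made up):

```python
import re

def keep(labels: dict, source_label: str, regex: str) -> bool:
    """Mimic relabel action=keep: keep the target iff the source label
    value fully matches the regex, as Prometheus anchors it."""
    value = labels.get(source_label, "")
    return re.fullmatch(regex, value) is not None

pods = {
    "api":  {"__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true"},
    "db":   {"__meta_kubernetes_pod_annotation_prometheus_io_scrape": "false"},
    "cron": {},  # annotation absent: label missing, so the pod is dropped
}
kept = [name for name, labels in pods.items()
        if keep(labels, "__meta_kubernetes_pod_annotation_prometheus_io_scrape", "true")]
print(kept)  # ['api']
```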
```yaml
# prometheus-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.52.0
          ports:
            - containerPort: 9090
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.retention.time=30d'
            - '--web.enable-lifecycle'
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /prometheus
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
      volumes:
        - name: config
          configMap:
            name: prometheus-config
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data
```
```yaml
# prometheus-rbac.yml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```
```yaml
# prometheus-service.yml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
```
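With the manifests above saved to files named as in their comments, they can be applied in dependency order (RBAC before the Deployment, so the ServiceAccount exists when the pod is scheduled). The verification commands assume working `kubectl` access to the cluster:

```bash
kubectl apply -f namespace.yml
kubectl apply -f prometheus-rbac.yml
kubectl apply -f prometheus-configmap.yml
kubectl apply -f prometheus-deployment.yml
kubectl apply -f prometheus-service.yml

# Verify, then reach the UI locally
kubectl -n monitoring get pods
kubectl -n monitoring port-forward svc/prometheus 9090:9090
```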
## 15.5 Health Checks and Readiness Probes

### Kubernetes probes

Prometheus exposes `/-/healthy` (the process is up) and `/-/ready` (it is ready to serve queries) for exactly this purpose.

```yaml
containers:
  - name: prometheus
    livenessProbe:
      httpGet:
        path: /-/healthy
        port: 9090
      initialDelaySeconds: 30
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /-/ready
        port: 9090
      initialDelaySeconds: 10
      periodSeconds: 10
```
### Docker health check

```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v2.52.0
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
```
## 15.6 Persistent Storage

### Docker volume

A named volume can be bound to a host directory. As noted earlier, the official image runs as user `nobody` (uid 65534), so the host directory must be writable by that uid.

```yaml
volumes:
  prometheus_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/data/prometheus
```
### Kubernetes PVC

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
spec:
  storageClassName: gp3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```
## 15.7 Summary

| Deployment method | Best for | Complexity |
|---|---|---|
| Docker | Single-host testing | Low |
| Docker Compose | Multi-component orchestration | Medium |
| Helm (K8s) | Production | Medium |
| Raw YAML (K8s) | Custom requirements | High |
Previous chapter: 14 - Thanos · Next chapter: 16 - Grafana Integration