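13 · Containerized Deployment
Chapter goals
- Deploy VictoriaMetrics as a single Docker container and with Docker Compose
- Understand the deployment options available on Kubernetes
- Stand up a cluster quickly with the Helm charts
- Apply production-grade container configuration
13.1 Docker basics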
13.1.1 Images
| Image | Description |
|---|---|
| victoriametrics/victoria-metrics | Single-node version |
| victoriametrics/vminsert | Cluster ingestion layer |
| victoriametrics/vmselect | Cluster query layer |
| victoriametrics/vmstorage | Cluster storage layer |
| victoriametrics/vmagent | Scraping agent |
| victoriametrics/vmalert | Alerting engine |
| victoriametrics/vmauth | Authentication proxy |
13.1.2 Basic usage
# Single node
docker run -d \
  --name victoria-metrics \
  -p 8428:8428 \
  -v vm-data:/victoria-metrics-data \
  victoriametrics/victoria-metrics:v1.106.0 \
  -storageDataPath=/victoria-metrics-data \
  -retentionPeriod=90d
# Verify
curl http://localhost:8428/health
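A freshly started container may need a few seconds before /health answers, so a single curl can fail even when nothing is wrong. The sketch below retries the probe; the function name `wait_for_health` and its default retry count and delay are illustrative choices, not part of VictoriaMetrics.

```shell
#!/bin/sh
# Poll a health endpoint until it answers with an HTTP success status
# or the retry budget runs out.
# Usage: wait_for_health URL [RETRIES] [DELAY_SECONDS]
wait_for_health() {
    url=$1
    retries=${2:-30}
    delay=${3:-2}
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # -f: fail on HTTP errors; -s: silent; --max-time: bound each probe
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            echo "healthy after $attempt attempt(s): $url"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "gave up after $retries attempt(s): $url" >&2
    return 1
}
```

For the container above, `wait_for_health http://localhost:8428/health` would be a typical call.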
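13.1.3 Configuring through environment variables
# Pass settings as environment variables.
# -envflag.enable must be given on the command line. Each variable is named after
# its flag, with "." replaced by "_" and the -envflag.prefix value prepended.
docker run -d \
  --name victoria-metrics \
  -p 8428:8428 \
  -v vm-data:/victoria-metrics-data \
  -e VM_storageDataPath=/victoria-metrics-data \
  -e VM_retentionPeriod=90d \
  -e VM_httpListenAddr=:8428 \
  -e VM_memory_allowedPercent=60 \
  victoriametrics/victoria-metrics:v1.106.0 \
  -envflag.enable=true \
  -envflag.prefix=VM_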
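With -envflag.enable set, VictoriaMetrics reads each flag from an environment variable named after the flag itself: dots become underscores and the value of -envflag.prefix, if any, is prepended. The helper below sketches that mapping; `flag_to_env` is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Derive the environment variable name that corresponds to a command-line flag.
# Usage: flag_to_env FLAG [PREFIX]
flag_to_env() {
    flag=${1#-}      # drop the leading dash
    prefix=${2:-}    # optional -envflag.prefix value
    # "." is not valid in most shells' variable names, so it maps to "_"
    printf '%s%s\n' "$prefix" "$(printf '%s' "$flag" | tr '.' '_')"
}
```

For example, `flag_to_env -memory.allowedPercent VM_` gives the name to use with `docker run -e` when the prefix is `VM_`.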
13.2 A complete monitoring stack with Docker Compose
13.2.1 Project layout
vm-monitoring/
├── docker-compose.yml
├── .env
├── config/
│   ├── prometheus.yml
│   ├── alertmanager.yml
│   ├── vmalert-rules.yml
│   └── vmauth.yml
├── grafana/
│   └── provisioning/
│       ├── datasources/
│       │   └── victoriametrics.yml
│       └── dashboards/
│           └── dashboard.yml
└── data/                          # persistent data
    ├── vm-data/
    ├── vmagent-data/
    ├── grafana-data/
    └── alertmanager-data/
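The layout can be created in one step before any configuration is written. This is a minimal sketch: it creates the directories and empty placeholder files only, and it also creates `data/vmagent-data`, which the compose file mounts for vmagent. The default root name `vm-monitoring` comes from the tree above; everything else is plain `mkdir` and redirection.

```shell
#!/bin/sh
# Scaffold the project directory; existing files are left untouched.
set -e
root=${1:-vm-monitoring}

mkdir -p \
    "$root/config" \
    "$root/grafana/provisioning/datasources" \
    "$root/grafana/provisioning/dashboards" \
    "$root/data/vm-data" \
    "$root/data/vmagent-data" \
    "$root/data/grafana-data" \
    "$root/data/alertmanager-data"

# Create empty placeholders for every file the stack expects.
for f in \
    docker-compose.yml \
    .env \
    config/prometheus.yml \
    config/alertmanager.yml \
    config/vmalert-rules.yml \
    config/vmauth.yml \
    grafana/provisioning/datasources/victoriametrics.yml \
    grafana/provisioning/dashboards/dashboard.yml
do
    [ -e "$root/$f" ] || : > "$root/$f"
done

echo "scaffolded $root"
```

Bind-mounted data directories must be writable by the user each container runs as; the official Grafana image, for instance, runs as UID 472, so `data/grafana-data` may need a matching `chown`.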
13.2.2 Environment variables
# .env
VM_VERSION=v1.106.0
GRAFANA_VERSION=11.0.0
PROMETHEUS_VERSION=v2.50.0
# Grafana admin password (example value; change it before deploying)
GF_SECURITY_ADMIN_PASSWORD=admin123
# Data retention period
VM_RETENTION_PERIOD=90d
# Resource limits
VM_MEMORY_LIMIT=4G
VM_CPU_LIMIT=2
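Docker Compose substitutes an empty string for any variable missing from .env, which tends to surface later as a confusing image tag or flag error. A small pre-flight check can catch that first. The function name `check_env_file` is a hypothetical helper; the variable names are the ones defined above.

```shell
#!/bin/sh
# Verify that every listed key is present in an env file with a non-empty value.
# Usage: check_env_file FILE KEY [KEY...]
check_env_file() {
    env_file=$1
    shift
    missing=0
    for key in "$@"; do
        # A key counts as set when a line reads KEY=<at least one character>.
        if ! grep -Eq "^${key}=.+" "$env_file"; then
            echo "missing or empty: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_env_file .env VM_VERSION GRAFANA_VERSION GF_SECURITY_ADMIN_PASSWORD VM_RETENTION_PERIOD VM_MEMORY_LIMIT VM_CPU_LIMIT && docker compose up -d`.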
13.2.3 docker-compose.yml
version: '3.8'
services:
  # VictoriaMetrics single node
  victoria-metrics:
    image: victoriametrics/victoria-metrics:${VM_VERSION}
    container_name: victoria-metrics
    restart: unless-stopped
    ports:
      - "8428:8428"
    volumes:
      - ./data/vm-data:/victoria-metrics-data
    command:
      - '-storageDataPath=/victoria-metrics-data'
      - '-retentionPeriod=${VM_RETENTION_PERIOD}'
      - '-httpListenAddr=:8428'
      - '-memory.allowedPercent=60'
      - '-dedup.minScrapeInterval=15s'
      - '-envflag.enable=true'
    deploy:
      resources:
        limits:
          memory: ${VM_MEMORY_LIMIT}
          cpus: '${VM_CPU_LIMIT}'
    healthcheck:
      test: ["CMD", "/usr/bin/wget", "--spider", "-q", "http://localhost:8428/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
    networks:
      - monitoring
  # vmagent: lightweight scraper
  vmagent:
    image: victoriametrics/vmagent:${VM_VERSION}
    container_name: vmagent
    restart: unless-stopped
    ports:
      - "8429:8429"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./data/vmagent-data:/vmagent-remotewrite-data
    command:
      - '-promscrape.config=/etc/prometheus/prometheus.yml'
      - '-remoteWrite.url=http://victoria-metrics:8428/api/v1/write'
      - '-remoteWrite.tmpDataPath=/vmagent-remotewrite-data'
      - '-envflag.enable=true'
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring
  # vmalert: alerting engine
  vmalert:
    image: victoriametrics/vmalert:${VM_VERSION}
    container_name: vmalert
    restart: unless-stopped
    ports:
      - "8880:8880"
    volumes:
      - ./config/vmalert-rules.yml:/etc/vmalert/rules.yml:ro
    command:
      - '-rule=/etc/vmalert/rules.yml'
      - '-datasource.url=http://victoria-metrics:8428'
      - '-notifier.url=http://alertmanager:9093'
      - '-external.label=env=prod'
      - '-evaluationInterval=30s'
      - '-httpListenAddr=:8880'
      - '-envflag.enable=true'
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring
  # Alertmanager
  alertmanager:
    image: prom/alertmanager:v0.27.0
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./config/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - ./data/alertmanager-data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - monitoring
  # Grafana
  grafana:
    image: grafana/grafana:${GRAFANA_VERSION}
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - ./data/grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD}
      - GF_INSTALL_PLUGINS=victoriametrics-metrics-datasource
      - GF_USERS_ALLOW_SIGN_UP=false
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring
  # vmauth: authentication gateway (optional)
  vmauth:
    image: victoriametrics/vmauth:${VM_VERSION}
    container_name: vmauth
    restart: unless-stopped
    ports:
      - "8427:8427"
    volumes:
      - ./config/vmauth.yml:/etc/vmauth/auth.yml:ro
    command:
      - '-auth.config=/etc/vmauth/auth.yml'
      - '-httpListenAddr=:8427'
      - '-envflag.enable=true'
    depends_on:
      victoria-metrics:
        condition: service_healthy
    networks:
      - monitoring
networks:
  monitoring:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
13.2.4 Prometheus scrape configuration
# config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'victoria-metrics'
    static_configs:
      - targets: ['victoria-metrics:8428']
  - job_name: 'vmagent'
    static_configs:
      - targets: ['vmagent:8429']
  - job_name: 'vmalert'
    static_configs:
      - targets: ['vmalert:8880']
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']
  # node-exporter is not defined in the docker-compose.yml of section 13.2.3;
  # add that service to the monitoring network or remove this job.
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
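13.2.5 Starting and managing the stack
# Start all services
docker compose up -d
# Show status
docker compose ps
# Follow logs
docker compose logs -f victoria-metrics
docker compose logs -f vmagent
# Stop all services
docker compose down
# Stop and remove named and anonymous volumes.
# The bind-mounted ./data directories are not removed; delete them manually to wipe data.
docker compose down -v
# Recreate a single service
docker compose up -d --force-recreate victoria-metrics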
13.3 Deploying on Kubernetes
13.3.1 Using the Helm charts (recommended)
# Add the Helm repository
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update
# List the available charts
helm search repo vm/
13.3.2 Installing the single-node version with Helm
# Install the single-node version (the chart is named victoria-metrics-single)
helm install victoria-metrics vm/victoria-metrics-single \
  -n monitoring \
  --create-namespace \
  --set server.retentionPeriod=90d \
  --set server.resources.limits.memory=4Gi \
  --set server.resources.limits.cpu=2 \
  --set server.persistentVolume.size=50Gi \
  --set server.scrape.enabled=true
13.3.3 Installing the cluster version with Helm
# Install the cluster version
helm install victoria-metrics-cluster vm/victoria-metrics-cluster \
  -n monitoring \
  --create-namespace \
  --set vminsert.replicaCount=2 \
  --set vmselect.replicaCount=2 \
  --set vmstorage.replicaCount=3 \
  --set vmstorage.retentionPeriod=90d \
  --set vmstorage.persistentVolume.size=100Gi \
  --set vmstorage.resources.limits.memory=16Gi \
  --set vmstorage.resources.limits.cpu=4
13.3.4 Custom values.yaml
# values-cluster.yaml
vminsert:
  replicaCount: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
  ingress:
    enabled: true
    className: nginx
    hosts:
      - vminsert.example.com
  extraArgs:
    maxConcurrentInserts: "64"
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/component
                  operator: In
                  values:
                    - vminsert
            topologyKey: kubernetes.io/hostname
vmselect:
  replicaCount: 2
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: "4"
      memory: 8Gi
  ingress:
    enabled: true
    className: nginx
    hosts:
      - vmselect.example.com
  extraArgs:
    search.maxConcurrentRequests: "16"
    search.maxQueryDuration: "30s"
  cacheMountPath: /cache
  persistentVolume:
    size: 10Gi
vmstorage:
  replicaCount: 3
  retentionPeriod: 90d
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "8"
      memory: 32Gi
  persistentVolume:
    storageClass: gp3
    size: 100Gi
  extraArgs:
    dedup.minScrapeInterval: "15s"
    memory.allowedPercent: "60"
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - vmstorage
          topologyKey: kubernetes.io/hostname
# NOTE: vmagent and vmalert are also published as separate charts
# (vm/victoria-metrics-agent, vm/victoria-metrics-alert). Run
# `helm show values vm/victoria-metrics-cluster` for your chart version;
# keys the chart does not define are silently ignored.
# vmagent
vmagent:
  enabled: true
  replicaCount: 1
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
  extraArgs:
    promscrape.maxScrapeSize: "64MB"
# vmalert
vmalert:
  enabled: true
  replicaCount: 1
  extraArgs:
    evaluationInterval: "30s"
# Install with the custom values file
helm install victoria-metrics-cluster vm/victoria-metrics-cluster \
  -n monitoring \
  -f values-cluster.yaml
13.3.5 Upgrade and rollback
# Upgrade
helm upgrade victoria-metrics-cluster vm/victoria-metrics-cluster \
  -n monitoring \
  -f values-cluster.yaml \
  --set vmstorage.replicaCount=5
# Show release history
helm history victoria-metrics-cluster -n monitoring
# Roll back to revision 1
helm rollback victoria-metrics-cluster 1 -n monitoring
13.4 Deploying with plain Kubernetes YAML
13.4.1 Namespace
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    app.kubernetes.io/part-of: victoria-metrics
13.4.2 vmstorage StatefulSet
# vmstorage.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vmstorage
  namespace: monitoring
spec:
  serviceName: vmstorage
  replicas: 3
  selector:
    matchLabels:
      app: vmstorage
  template:
    metadata:
      labels:
        app: vmstorage
        app.kubernetes.io/component: vmstorage
    spec:
      containers:
        - name: vmstorage
          image: victoriametrics/vmstorage:v1.106.0
          ports:
            - containerPort: 8482
              name: http
            - containerPort: 8400
              name: insert
            - containerPort: 8401
              name: select
          args:
            - '-storageDataPath=/storage'
            - '-retentionPeriod=90d'
            - '-vminsertAddr=:8400'
            - '-vmselectAddr=:8401'
            - '-httpListenAddr=:8482'
            - '-memory.allowedPercent=60'
            - '-dedup.minScrapeInterval=15s'
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
            limits:
              cpu: "8"
              memory: 32Gi
          volumeMounts:
            - name: vmstorage-data
              mountPath: /storage
          livenessProbe:
            httpGet:
              path: /health
              port: 8482
            initialDelaySeconds: 15
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8482
            initialDelaySeconds: 5
            periodSeconds: 10
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - vmstorage
              topologyKey: kubernetes.io/hostname
  volumeClaimTemplates:
    - metadata:
        name: vmstorage-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: vmstorage
  namespace: monitoring
spec:
  clusterIP: None
  ports:
    - port: 8482
      name: http
    - port: 8400
      name: insert
    - port: 8401
      name: select
  selector:
    app: vmstorage
13.4.3 vminsert Deployment
# vminsert.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vminsert
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vminsert
  template:
    metadata:
      labels:
        app: vminsert
        app.kubernetes.io/component: vminsert
    spec:
      containers:
        - name: vminsert
          image: victoriametrics/vminsert:v1.106.0
          ports:
            - containerPort: 8480
              name: http
          args:
            - '-httpListenAddr=:8480'
            - '-storageNode=vmstorage-0.vmstorage:8400,vmstorage-1.vmstorage:8400,vmstorage-2.vmstorage:8400'
            - '-replicationFactor=2'
            # Deduplication (-dedup.minScrapeInterval) is configured on vmstorage and vmselect, not here.
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 8480
            initialDelaySeconds: 15
            periodSeconds: 30
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - vminsert
                topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
  name: vminsert
  namespace: monitoring
spec:
  type: ClusterIP
  ports:
    - port: 8480
      name: http
  selector:
    app: vminsert
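The -storageNode list in the vminsert arguments has to name every vmstorage pod, so it must be regenerated whenever the StatefulSet is scaled. The sketch below builds the list from a replica count. It assumes the StatefulSet and its headless Service share one name (both are `vmstorage` in the manifests above) and that vminsert connects on port 8400; `storage_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a comma-separated -storageNode list for a vmstorage StatefulSet.
# Usage: storage_nodes REPLICAS [SERVICE] [PORT]
storage_nodes() {
    replicas=$1
    service=${2:-vmstorage}   # StatefulSet name and headless Service name
    port=${3:-8400}           # vmstorage port that accepts vminsert connections
    list=""
    i=0
    while [ "$i" -lt "$replicas" ]; do
        # Pod DNS names follow <statefulset>-<ordinal>.<service>
        list="${list:+$list,}${service}-${i}.${service}:${port}"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}
```

Calling `storage_nodes 3` reproduces the value used above; `storage_nodes 5` gives the list needed after scaling vmstorage to five replicas.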
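13.5 Container best practices
13.5.1 Resource limits
# Recommended resource settings
resources:
  requests:
    cpu: "2"       # requested CPU
    memory: 8Gi
  limits:
    cpu: "4"       # CPU limit (about 2x the request is a reasonable default)
    memory: 8Gi    # memory limit (keep equal to the request so the pod never depends on memory the node has not reserved for it)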
13.5.2 Storage
# Prefer an SSD-backed storage class
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: gp3                  # AWS SSD
      # storageClassName: premium-rwo        # GCP SSD
      # storageClassName: managed-premium    # Azure SSD
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
13.5.3 Graceful shutdown
# Graceful shutdown settings
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: victoria-metrics
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
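Chapter summary
| Deployment method | Best suited for | Complexity |
|---|---|---|
| Single Docker container | Development and testing | Low |
| Docker Compose | Small-scale production | Medium |
| Helm chart (single node) | Small to medium scale | Medium |
| Helm chart (cluster) | Large-scale production | Medium |
| Plain YAML | Custom requirements | High |
Further reading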