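Chapter 13: Monitoring and Visualization
13.1 Monitoring Overview
Effective monitoring is key to keeping Squid running reliably. This chapter covers several monitoring approaches, from Squid's built-in tools to an enterprise-grade monitoring stack.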
┌──────────────────────────────────────────────────────┐
│               Monitoring architecture                │
│                                                      │
│ ┌───────────┐   ┌───────────────┐   ┌──────────────┐ │
│ │   Squid   │   │   Cache Mgr   │   │  Prometheus  │ │
│ │  Server   │───│  (built-in)   │───│   Exporter   │ │
│ └───────────┘   └───────────────┘   └──────┬───────┘ │
│       │                                    │         │
│       ▼                                    ▼         │
│ ┌───────────┐   ┌───────────────┐   ┌──────────────┐ │
│ │ Log files │   │     SNMP      │   │  Prometheus  │ │
│ │  access   │   │  (protocol)   │   │    Server    │ │
│ └───────────┘   └───────────────┘   └──────┬───────┘ │
│       │                                    │         │
│       ▼                                    ▼         │
│ ┌──────────────────────────────────────────────────┐ │
│ │         Grafana visualization dashboard          │ │
│ │ Hit rate | Traffic | Latency | Errors | Capacity │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
13.2 Cache Manager (cachemgr)
Cache Manager is Squid's built-in management interface; it exposes detailed runtime information.
13.2.1 Enabling Cache Manager
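# Cache Manager ACLs
# Squid 3.2 and later predefine the "manager" and "localhost" ACLs;
# keep the next two lines only on older releases.
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl management src 192.168.1.0/24
# Allow access from the local host and the management subnet only
http_access allow manager localhost
http_access allow manager management
http_access deny manager
# Optional: protect the manager actions with a password
# cachemgr_passwd secret_password all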
13.2.2 Querying with squidclient
# General information
squidclient -h localhost mgr:info
# 5-minute statistics
squidclient -h localhost mgr:5min
# Memory usage
squidclient -h localhost mgr:mem
# Cache directories
squidclient -h localhost mgr:storedir
# Active requests
squidclient -h localhost mgr:active_requests
# Client list
squidclient -h localhost mgr:client_list
# Cache peer status
squidclient -h localhost mgr:server_list
# IP cache
squidclient -h localhost mgr:ipcache
# FQDN cache
squidclient -h localhost mgr:fqdncache
# Delay pool statistics
squidclient -h localhost mgr:delay
# Password-protected query: pass the cachemgr_passwd value with -w
squidclient -h localhost -w secret_password mgr:info
13.2.3 Access over HTTP
# Query directly with a browser or curl
curl "http://localhost:3128/squid-internal-mgr/info"
curl "http://localhost:3128/squid-internal-mgr/5min"
# With a password
curl "http://cachemgr:password@localhost:3128/squid-internal-mgr/info"
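For scripted checks, the same report can be fetched and parsed in code. The following is a minimal Python sketch, assuming the manager endpoint from the curl example above is reachable and that the `5min` report is made of `name = value` lines. The names `parse_mgr_report` and `fetch_mgr_report` and the sample text are illustrative, not part of Squid.

```python
import urllib.request


def parse_mgr_report(text):
    """Parse 'name = value' lines of a cache manager report into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            stats[name.strip()] = value.strip()
    return stats


def fetch_mgr_report(page, host="localhost", port=3128):
    """Fetch one cache manager page over HTTP (assumes the ACLs allow it)."""
    url = f"http://{host}:{port}/squid-internal-mgr/{page}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Illustrative sample only; a real report contains many more lines.
SAMPLE = """\
client_http.requests = 12.5/sec
client_http.hits = 4.0/sec
"""

if __name__ == "__main__":
    print(parse_mgr_report(SAMPLE))
```

Against a live proxy, `parse_mgr_report(fetch_mgr_report("5min"))` would return the same kind of dictionary; values are kept as strings because the units differ from line to line.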
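13.2.4 Cache Manager Pages
| Page | Description |
|---|---|
| info | General information |
| 5min | 5-minute statistics |
| 60min | 60-minute statistics |
| objects | Cached objects |
| vm_objects | In-memory objects |
| storedir | Cache directories |
| mem | Memory usage |
| cbdata | Callback data |
| events | Event queue |
| server_list | Peer list |
| peer_select | Peer selection |
| client_list | Client list |
| active_requests | Active requests |
| ipcache | IP cache |
| fqdncache | FQDN cache |
| idns | DNS status |
| delay | Delay pools |
| forward | Forwarding statistics |
| redirector | URL rewriter status |
| utilization | Utilization |
| config | Current configuration |
| shutdown | Shut down Squid |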
13.3 SNMP Monitoring
13.3.1 Enabling SNMP
# SNMP configuration (squid.conf)
snmp_port 3401
acl snmppublic snmp_community public
snmp_access allow snmppublic localhost
snmp_access allow snmppublic management
snmp_access deny all
# Test with an SNMP query
sudo apt install -y snmp
# Walk the Squid MIB (enterprise OID 1.3.6.1.4.1.3495)
snmpwalk -v2c -c public localhost:3401 .1.3.6.1.4.1.3495
# Common OID subtrees
# .1.3.6.1.4.1.3495.1.1: cacheSystem
# .1.3.6.1.4.1.3495.1.2: cacheConfig
# .1.3.6.1.4.1.3495.1.3: cachePerf (includes cacheProtoStats)
13.4 Prometheus + Grafana
13.4.1 Installing the Squid Exporter
# Download squid-exporter
wget https://github.com/boynux/squid-exporter/releases/latest/download/squid-exporter-linux-amd64
chmod +x squid-exporter-linux-amd64
sudo mv squid-exporter-linux-amd64 /usr/local/bin/squid-exporter
# Create the systemd service
sudo tee /etc/systemd/system/squid-exporter.service <<'EOF'
[Unit]
Description=Squid Prometheus Exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/squid-exporter \
    -squid-hostname localhost \
    -squid-port 3128 \
    -listen :9301
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl start squid-exporter
sudo systemctl enable squid-exporter
# Verify
curl http://localhost:9301/metrics
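The `/metrics` endpoint returns plain text in the Prometheus exposition format, so a quick sanity check can be scripted without Prometheus itself. The sketch below is a simplified Python parser, assuming samples carry no trailing timestamp; `parse_metrics` is a hypothetical helper and the sample text is made up for illustration.

```python
def parse_metrics(text):
    """Return {series: value} from Prometheus text exposition output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # "<name>{labels} <value>": split on the last space.
        # Assumption: no timestamp follows the value.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines this simplified parser cannot read
    return samples


# Illustrative sample only; real metric names depend on the exporter version.
SAMPLE = """\
# TYPE squid_up gauge
squid_up 1
squid_client_http_requests_total 1500
"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```

Feeding it the body returned by the curl command above gives a dictionary that can be checked for the series the dashboards rely on.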
13.4.2 Prometheus Configuration
# prometheus.yml
scrape_configs:
  - job_name: 'squid'
    static_configs:
      - targets: ['localhost:9301']
    scrape_interval: 15s
13.4.3 Grafana Dashboard
{
  "dashboard": {
    "title": "Squid Proxy Dashboard",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [{
          "expr": "rate(squid_client_http_requests_total[5m])",
          "legendFormat": "Requests/sec"
        }]
      },
      {
        "title": "Cache Hit Rate",
        "type": "gauge",
        "targets": [{
          "expr": "rate(squid_client_http_hits_total[5m]) / rate(squid_client_http_requests_total[5m]) * 100",
          "legendFormat": "Hit Rate %"
        }]
      },
      {
        "title": "Bandwidth",
        "type": "graph",
        "targets": [{
          "expr": "rate(squid_client_http_kbytes_out_total[5m])",
          "legendFormat": "KB/s"
        }]
      }
    ]
  }
}
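13.4.4 Common PromQL Queries
# Metric names can differ between exporter versions; check them against the exporter's /metrics output.
# Request rate (requests per second)
rate(squid_client_http_requests_total[5m])
# Cache hit rate (%)
rate(squid_client_http_hits_total[5m]) / rate(squid_client_http_requests_total[5m]) * 100
# Bandwidth (KB per second)
rate(squid_client_http_kbytes_out_total[5m])
# Active clients
squid_client_http_clients
# Memory usage
squid_mem_alloc_bytes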
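To make the arithmetic behind the rate and hit-rate queries concrete, here is a small Python sketch. It assumes two counter samples taken some seconds apart and deliberately ignores the extrapolation that PromQL's `rate()` applies at window edges; `per_second_rate` and `hit_rate_percent` are illustrative names.

```python
def per_second_rate(first, last):
    """Average per-second increase between two (timestamp, counter) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        # Counter reset (for example after a restart): count from zero.
        return v1 / (t1 - t0)
    return (v1 - v0) / (t1 - t0)


def hit_rate_percent(hits_rate, requests_rate):
    """Hit rate as a percentage; 0.0 when there is no traffic."""
    if requests_rate == 0:
        return 0.0
    return 100.0 * hits_rate / requests_rate


if __name__ == "__main__":
    # Hypothetical counters sampled 300 seconds (5 minutes) apart.
    requests = per_second_rate((0, 1000), (300, 4000))
    hits = per_second_rate((0, 200), (300, 950))
    print(requests, hits, hit_rate_percent(hits, requests))
```

With these made-up samples the proxy served 10 requests per second, 2.5 of them from cache, which is a 25% hit rate and would trip a 30% threshold.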
13.5 ELK Integration
13.5.1 Filebeat Configuration
# /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/squid/access.log
    fields:
      log_type: squid
    multiline.pattern: '^\d{10}\.\d{3}'
    multiline.negate: true
    multiline.match: after
output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  index: "squid-%{+yyyy.MM.dd}"
setup.template:
  name: squid
  pattern: "squid-*"
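13.5.2 Logstash Filter
# /etc/logstash/conf.d/squid.conf
# Fields in Squid's native log format are padded with spaces, hence \s+ between them.
# CONNECT requests log host:port rather than a full URI, so the URL uses NOTSPACE.
filter {
  if [fields][log_type] == "squid" {
    grok {
      match => {
        "message" => "%{NUMBER:timestamp}\s+%{NUMBER:duration}\s+%{IP:client}\s+%{WORD:squid_status}/%{NUMBER:http_status}\s+%{NUMBER:bytes}\s+%{WORD:method}\s+%{NOTSPACE:url}\s+%{GREEDYDATA:extra}"
      }
    }
    date {
      match => ["timestamp", "UNIX"]
    }
    geoip {
      source => "client"
    }
  }
}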
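A grok pattern is easier to debug when the same field layout can be tried on a single line outside Logstash. The Python sketch below mirrors the fields named in the filter above with a regular expression, assuming Squid's native `access.log` layout; the sample line and the helper `parse_access_line` are made up for illustration.

```python
import re

# Same fields as the grok filter: timestamp, duration, client,
# squid_status/http_status, bytes, method, url, then the remainder.
ACCESS_LINE = re.compile(
    r"^(?P<timestamp>\d+\.\d+)\s+"
    r"(?P<duration>\d+)\s+"
    r"(?P<client>\S+)\s+"
    r"(?P<squid_status>\w+)/(?P<http_status>\d+)\s+"
    r"(?P<bytes>\d+)\s+"
    r"(?P<method>\S+)\s+"
    r"(?P<url>\S+)"
    r"(?:\s+(?P<extra>.*))?$"
)


def parse_access_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = ACCESS_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = float(record["timestamp"])
    for key in ("duration", "http_status", "bytes"):
        record[key] = int(record[key])
    return record


# Hypothetical log line in the native format.
SAMPLE = ("1700000000.123    245 192.168.1.10 TCP_MISS/200 5120 "
          "GET http://example.com/ - HIER_DIRECT/93.184.216.34 text/html")

if __name__ == "__main__":
    print(parse_access_line(SAMPLE))
```

Lines that return `None` here are a reasonable first place to look when Logstash tags events with `_grokparsefailure`.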
13.6 Custom Monitoring Scripts
13.6.1 Health Check Script
#!/bin/bash
# squid-health-check.sh
# Exit codes follow the Nagios convention: 0 = OK, 1 = WARNING, 2 = CRITICAL.
LOGFILE="/var/log/squid/access.log"
CACHELOG="/var/log/squid/cache.log"
# Check the process (-x matches the exact name, so this script and
# squid-exporter are not counted as Squid)
if ! pgrep -x squid > /dev/null; then
    echo "CRITICAL: Squid is not running"
    exit 2
fi
# Check the listening port
if ! ss -tln | grep -q ':3128 '; then
    echo "CRITICAL: Squid port 3128 not listening"
    exit 2
fi
# Check that the proxy answers a request
RESPONSE=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" -x http://localhost:3128 http://example.com)
if [ "$RESPONSE" != "200" ]; then
    echo "WARNING: Proxy returned $RESPONSE"
    exit 1
fi
# Check the cache hit ratio over the last 1000 requests
TOTAL=$(tail -1000 "$LOGFILE" | wc -l)
HITS=$(tail -1000 "$LOGFILE" | grep -cE 'TCP_(MEM_)?HIT')
if [ "$TOTAL" -gt 0 ]; then
    HIT_RATIO=$((HITS * 100 / TOTAL))
    if [ "$HIT_RATIO" -lt 30 ]; then
        echo "WARNING: Cache hit ratio is $HIT_RATIO%"
        exit 1
    fi
fi
# Check for recent errors in cache.log
ERRORS=$(tail -100 "$CACHELOG" | grep -c -i -E 'error|critical')
if [ "$ERRORS" -gt 10 ]; then
    echo "WARNING: $ERRORS errors in recent cache.log"
    exit 1
fi
echo "OK: Squid is healthy"
exit 0
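The hit-ratio step of the health check can also be expressed as a small, testable function. The Python sketch below assumes the native `access.log` layout, where the fourth field is `RESULT_CODE/STATUS`. Unlike the shell `grep`, it matches the result code exactly instead of searching the whole line. All names and sample lines are illustrative.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style exit codes

HIT_CODES = ("TCP_HIT", "TCP_MEM_HIT")


def hit_ratio_percent(lines):
    """Integer hit percentage over access.log lines, or None if empty."""
    total = hits = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        total += 1
        if fields[3].split("/")[0] in HIT_CODES:
            hits += 1
    return hits * 100 // total if total else None


def check_hit_ratio(lines, threshold=30):
    """Return (exit_code, message) for the hit-ratio check."""
    ratio = hit_ratio_percent(lines)
    if ratio is None:
        return OK, "OK: no requests to evaluate"
    if ratio < threshold:
        return WARNING, f"WARNING: Cache hit ratio is {ratio}%"
    return OK, f"OK: Cache hit ratio is {ratio}%"


# Hypothetical log lines: one hit out of four requests.
SAMPLE_LINES = [
    "1700000000.001 5 10.0.0.1 TCP_HIT/200 100 GET http://a.example/ - HIER_NONE/- text/html",
    "1700000001.001 50 10.0.0.1 TCP_MISS/200 900 GET http://b.example/ - HIER_DIRECT/192.0.2.1 text/html",
    "1700000002.001 60 10.0.0.2 TCP_MISS/200 800 GET http://c.example/ - HIER_DIRECT/192.0.2.2 text/html",
    "1700000003.001 70 10.0.0.3 TCP_MISS/404 300 GET http://d.example/ - HIER_DIRECT/192.0.2.3 text/html",
]

if __name__ == "__main__":
    print(check_hit_ratio(SAMPLE_LINES))
```

Keeping the threshold as a parameter makes it easy to use a different limit for proxies that mostly carry uncacheable traffic.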
13.6.2 Bandwidth Monitoring Script
#!/bin/bash
# squid-bandwidth-monitor.sh
LOGFILE="/var/log/squid/access.log"
INTERVAL=60 # seconds
while true; do
    # Current timestamp and the start of the measurement window
    NOW=$(date +%s)
    PAST=$((NOW - INTERVAL))
    # Sum the traffic of the last INTERVAL seconds
    # (field 5 of the native access.log format is the reply size in bytes)
    BYTES=$(awk -v past="$PAST" '$1 > past {sum += $5} END {printf "%d", sum}' "$LOGFILE")
    BYTES=${BYTES:-0}
    MBPS=$(echo "scale=2; $BYTES * 8 / $INTERVAL / 1000000" | bc)
    echo "$(date): Bandwidth: ${MBPS} Mbps ($((BYTES / 1024)) KB)"
    sleep "$INTERVAL"
done
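The conversion in the script is bytes × 8 ÷ interval ÷ 1,000,000, which gives megabits per second. A short Python sketch of the same calculation follows, assuming the native log layout with the timestamp in field 1 and the reply size in field 5; the function names and sample lines are illustrative.

```python
def bytes_in_window(lines, now, interval=60):
    """Sum the size field of entries newer than now - interval seconds."""
    cutoff = now - interval
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        try:
            timestamp, size = float(fields[0]), int(fields[4])
        except ValueError:
            continue  # skip lines with non-numeric fields
        if timestamp > cutoff:
            total += size
    return total


def mbps(total_bytes, interval=60):
    """Convert a byte count over `interval` seconds to megabits per second."""
    return total_bytes * 8 / interval / 1_000_000


# Hypothetical entries: only the first one falls inside the 60-second window.
SAMPLE_LINES = [
    "1700000030.000 12 10.0.0.1 TCP_MISS/200 1000 GET http://a.example/ - HIER_DIRECT/192.0.2.1 text/html",
    "1699999990.000 15 10.0.0.1 TCP_MISS/200 5000 GET http://b.example/ - HIER_DIRECT/192.0.2.1 text/html",
]

if __name__ == "__main__":
    total = bytes_in_window(SAMPLE_LINES, now=1700000060)
    print(total, mbps(total))
```

As a worked example, 7,500,000 bytes in 60 seconds is 60,000,000 bits, or 1 Mbps.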
13.7 Alerting
13.7.1 Prometheus Alertmanager
# alertmanager.yml
# Replace the masked addresses below with real sender and recipient addresses.
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-alerts'
receivers:
  - name: 'email-alerts'
    email_configs:
      - to: '[email protected]'
        from: '[email protected]'
        smarthost: 'smtp.example.com:587'
# alert-rules.yml (loaded by Prometheus through rule_files)
groups:
  - name: squid_alerts
    rules:
      - alert: SquidDown
        expr: up{job="squid"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Squid is down"
      - alert: HighErrorRate
        expr: rate(squid_client_http_errors_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
      - alert: LowCacheHitRate
        expr: rate(squid_client_http_hits_total[5m]) / rate(squid_client_http_requests_total[5m]) * 100 < 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 30%"
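The `for:` clause means a rule only fires once its expression has stayed true for the whole duration; a single bad sample leaves the alert pending. The Python sketch below is a simplified model of that behavior, not Prometheus's actual evaluator. It assumes evenly spaced evaluations, and `firing_times` is a hypothetical helper.

```python
def firing_times(samples, condition, hold_for):
    """Return the evaluation timestamps at which the alert would be firing.

    samples:   list of (timestamp_seconds, value) in time order
    condition: function returning True when the alert expression holds
    hold_for:  seconds the condition must hold continuously (the `for:` value)
    """
    firing = []
    pending_since = None
    for timestamp, value in samples:
        if condition(value):
            if pending_since is None:
                pending_since = timestamp  # alert becomes pending
            if timestamp - pending_since >= hold_for:
                firing.append(timestamp)
        else:
            pending_since = None  # condition cleared, reset the timer
    return firing


if __name__ == "__main__":
    # Hypothetical hit-rate samples (percent), evaluated every 60 seconds.
    hit_rate = [(0, 35), (60, 25), (120, 25), (180, 25), (240, 40), (300, 25)]
    print(firing_times(hit_rate, lambda v: v < 30, hold_for=120))
```

In this made-up series the hit rate drops below 30% at t=60 and the alert fires at t=180; the recovery at t=240 resets the timer, so the dip at t=300 is only pending.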
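13.8 Summary of Monitoring Metrics
| Category | Metrics | Alert threshold |
|---|---|---|
| Availability | Process state, listening port | Process not running |
| Performance | Request rate, response time | Response time > 500 ms |
| Cache | Hit rate, cache size | Hit rate < 30% |
| Resources | CPU, memory, file descriptors | Usage > 80% |
| Network | Bandwidth, connection count | > 80% of capacity |
| Errors | 5xx error rate | > 5% |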
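13.9 Chapter Summary
| Monitoring method | Best suited for | Complexity |
|---|---|---|
| Cache Manager | Quick diagnostics | ★ |
| SNMP | Integration with network management systems | ★★ |
| Prometheus + Grafana | Enterprise-grade monitoring | ★★★ |
| ELK Stack | Log analysis | ★★★★ |
| Custom scripts | Specific needs | ★★ |
Further Reading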