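Best Practices

12.1 Low-Latency Optimization

Low latency is one of the core competitive advantages of live streaming. The typical latency of an RTMP stream from publisher to viewer is 1-3 seconds; with tuning at every stage it can be brought below 1 second.

12.1.1 Latency Analysis

Latency breakdown (typical RTMP live stream):

┌───────────┬────────────┬────────────┬──────────────┬────────────┐
│ Encoding  │ Publishing │ Server     │ Distribution │ Player     │
│ latency   │ latency    │ processing │ latency      │ buffer     │
│ 100-500ms │ 100-300ms  │ 50-200ms   │ 100-500ms    │ 500-2000ms │
└───────────┴────────────┴────────────┴──────────────┴────────────┘

Total latency: 850 ms - 3.5 s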
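
The total is simply the sum of the per-stage lower and upper bounds. The sketch below reproduces that arithmetic and reports which stage dominates the worst case. The stage names and the `LATENCY_BUDGET_MS` table are illustrative; the values are copied from the breakdown above.

```python
# Per-stage latency ranges in milliseconds, copied from the breakdown above.
LATENCY_BUDGET_MS = {
    "encoding": (100, 500),
    "publishing": (100, 300),
    "server": (50, 200),
    "distribution": (100, 500),
    "player_buffer": (500, 2000),
}


def total_latency_ms(budget):
    """Return (best_case, worst_case) by summing each stage's bounds."""
    best = sum(low for low, _ in budget.values())
    worst = sum(high for _, high in budget.values())
    return best, worst


def largest_stage(budget):
    """Return the name of the stage with the largest worst-case contribution."""
    return max(budget, key=lambda name: budget[name][1])


best, worst = total_latency_ms(LATENCY_BUDGET_MS)
print(f"total: {best} ms - {worst} ms")
print(f"largest contributor: {largest_stage(LATENCY_BUDGET_MS)}")
```

In this budget the player buffer is the largest single contributor, which is why player settings usually deserve attention first.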

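12.1.2 Encoder Optimization

# FFmpeg low-latency publishing configuration
# Option notes (a comment cannot follow a trailing "\", so they are listed here):
#   -preset ultrafast     fastest encoding
#   -tune zerolatency     zero-latency tuning
#   -bf 0                 disable B-frames
#   -g 30                 1-second GOP at 30 fps
#   -keyint_min 30        minimum keyframe interval
#   -sc_threshold 0       disable scene-change detection
#   -bufsize 1000k        small rate-control buffer
#   -rtmp_buffer 0        no RTMP client buffering
ffmpeg -re -i input.mp4 \
    -c:v libx264 \
    -preset ultrafast \
    -tune zerolatency \
    -bf 0 \
    -g 30 \
    -keyint_min 30 \
    -sc_threshold 0 \
    -b:v 2000k \
    -maxrate 2000k \
    -bufsize 1000k \
    -c:a aac \
    -b:a 128k \
    -ar 44100 \
    -f flv \
    -rtmp_buffer 0 \
    -rtmp_live live \
    rtmp://localhost:1935/live/stream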

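Parameter	Low-latency value	Standard value	Description
preset	ultrafast	veryfast	Encoding speed
tune	zerolatency	-	Zero-latency mode
bf (B-frames)	0	2	Disable B-frames
g (GOP)	30 (1s)	60 (2s)	Keyframe interval
bufsize	1000k	4000k	Encoder rate-control buffer
sc_threshold	0	-	Disable scene-change detection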
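
Two numbers in this table follow from simple arithmetic: the GOP length in frames is the frame rate times the keyframe interval, and the rate-control buffer holds `bufsize / maxrate` seconds of video. The helper names below are illustrative, and the 4000k case assumes the same 2000k `maxrate` as the low-latency command.

```python
def gop_frames(fps: float, keyframe_interval_s: float) -> int:
    """Value to pass to -g for a keyframe every `keyframe_interval_s` seconds."""
    return round(fps * keyframe_interval_s)


def vbv_buffer_seconds(bufsize_kbit: float, maxrate_kbit: float) -> float:
    """How many seconds of video the rate-control buffer can hold."""
    return bufsize_kbit / maxrate_kbit


print(gop_frames(30, 1))               # low-latency setting: -g 30
print(gop_frames(30, 2))               # standard setting:    -g 60
print(vbv_buffer_seconds(1000, 2000))  # low-latency bufsize
print(vbv_buffer_seconds(4000, 2000))  # standard bufsize (same maxrate assumed)
```

If the source is not 30 fps, recompute `-g` and `-keyint_min` with the real frame rate, otherwise the GOP will not be one second long.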

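12.1.3 Server Optimization

# SRS low-latency configuration
# Directive names and nesting differ between SRS versions; check them against
# the conf/full.conf shipped with your release before deploying.
vhost __defaultVhost__ {
    # Minimum-latency mode
    min_latency on;

    # Absolute timestamps (avoids timestamp jumps)
    atc on;

    # GOP cache: speeds up first-frame display, but a new viewer may start up to one GOP behind live
    gop_cache on;
    gop_cache_max_frames 2500;

    # Message queue
    queue {
        enabled on;
        capacity 2500;
        jitter_algorithm low_latency;
    }

    # Low-latency HLS
    hls {
        enabled on;
        hls_fragment 1;        # 1-second segments
        hls_window 3;          # 3-second playlist window
        hls_td_ratio 1.0;      # EXT-X-TARGETDURATION = hls_td_ratio * hls_fragment
    }

    # TCP tuning
    tcp_nodelay on;
    send_min_interval 10;
}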

12.1.4 Player Optimization

// flv.js low-latency playback configuration
const player = flvjs.createPlayer({
    type: 'flv',
    isLive: true,
    url: 'http://localhost:8080/live/stream.flv',
}, {
    enableWorker: false,
    enableStashBuffer: false,    // disable the IO stash buffer
    stashInitialSize: 128,       // smallest initial stash size
    lazyLoad: false,
    lazyLoadMaxDuration: 0,
    deferLoadAfterSourceOpen: false,
    autoCleanupSourceBuffer: true,
    autoCleanupMaxBackwardDuration: 3,
    autoCleanupMinBackwardDuration: 1,
});

// hls.js low-latency configuration
const hls = new Hls({
    liveSyncDurationCount: 2,     // play 2 segments behind the live edge
    liveMaxLatencyDurationCount: 4,
    lowLatencyMode: true,
    backBufferLength: 0,
});
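
For HLS, the player settings translate into a rough latency estimate. The sketch below assumes hls.js starts playback `liveSyncDurationCount` target durations behind the live edge and seeks forward once it falls more than `liveMaxLatencyDurationCount` target durations behind. It ignores encoding, packaging, and network delay, so treat the result as a lower bound. The function name is illustrative.

```python
def hls_latency_window_s(target_duration_s: float,
                         live_sync_count: int,
                         live_max_latency_count: int) -> tuple:
    """Return (target_latency, max_latency) in seconds behind the live edge."""
    target = target_duration_s * live_sync_count
    maximum = target_duration_s * live_max_latency_count
    return target, maximum


# 1-second segments (hls_fragment 1) with the hls.js settings shown above
target, maximum = hls_latency_window_s(1.0, 2, 4)
print(f"playback sits about {target:.0f} s behind live, resync beyond {maximum:.0f} s")
```

With 1-second segments the player sits about 2 seconds behind the live edge, so HLS stays noticeably slower than RTMP or HTTP-FLV even after tuning.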

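12.1.5 Latency Monitoring

#!/usr/bin/env python3
"""
Latency measurement helpers.
End-to-end latency is the difference between the time a frame is pushed
and the time the same frame is played.
"""

import json
import subprocess


def probe_video_frame_rate(stream_url: str) -> float:
    """
    Return the video frame rate (fps) reported by ffprobe, or 0.0 if unknown.
    ffprobe cannot report latency by itself; the frame rate is only useful
    for converting a frame offset into seconds.
    """
    result = subprocess.run(
        ['ffprobe', '-v', 'quiet', '-print_format', 'json',
         '-show_streams', stream_url],
        capture_output=True, text=True
    )
    info = json.loads(result.stdout or '{}')

    for stream in info.get('streams', []):
        if stream.get('codec_type') == 'video':
            # r_frame_rate is a fraction such as "30000/1001"
            num, _, den = stream.get('r_frame_rate', '0/1').partition('/')
            den = float(den or 1)
            return float(num) / den if den else 0.0
    return 0.0


def calculate_end_to_end_delay(push_time: float, play_time: float) -> float:
    """Compute end-to-end delay in seconds; both clocks must be synchronized (e.g. via NTP)."""
    return play_time - push_time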
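
A single delay sample says little, because latency drifts as buffers fill and drain. The sketch below aggregates many `(push_time, play_time)` pairs into mean, 95th percentile, and maximum. How the pairs are collected (for example a clock burned into the video and read back on the player side) is outside this sketch, and `summarize_delays` is an illustrative name.

```python
import math
import statistics


def summarize_delays(samples):
    """Summarize end-to-end delay from (push_time, play_time) pairs in seconds."""
    delays = sorted(play - push for push, play in samples)
    if not delays:
        raise ValueError("no samples")
    # Nearest-rank 95th percentile
    p95_index = math.ceil(0.95 * len(delays)) - 1
    return {
        "mean": statistics.mean(delays),
        "p95": delays[p95_index],
        "max": delays[-1],
    }


samples = [(0.0, 1.0), (10.0, 11.5), (20.0, 22.0)]
print(summarize_delays(samples))
```

Alerting on the 95th percentile rather than the mean catches viewers who are drifting far behind live.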

12.2 Security Hardening

12.2.1 Publish Authentication

Token-based authentication:

#!/usr/bin/env python3
"""
RTMP token authentication service.
Supports both publish and play authentication.
"""

import hashlib
import time
import hmac
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse, parse_qs
import json


# Secret key (replace it and keep it out of version control)
SECRET_KEY = "your-secret-key-here"

# Token lifetime in seconds
TOKEN_TTL = 86400  # 24 hours


def generate_token(stream_key: str, user_id: str, expires: int = None) -> str:
    """
    Generate a publish token.

    Token format: {expires}-{hmac}
    """
    if expires is None:
        expires = int(time.time()) + TOKEN_TTL

    message = f"{stream_key}:{user_id}:{expires}"
    signature = hmac.new(
        SECRET_KEY.encode(),
        message.encode(),
        hashlib.sha256
    ).hexdigest()

    return f"{expires}-{signature}"


def verify_token(stream_key: str, user_id: str, token: str) -> bool:
    """
    Verify a token.
    """
    try:
        parts = token.split('-', 1)
        if len(parts) != 2:
            return False

        expires = int(parts[0])
        signature = parts[1]

        # Reject expired tokens
        if time.time() > expires:
            return False

        # Verify the signature in constant time
        message = f"{stream_key}:{user_id}:{expires}"
        expected = hmac.new(
            SECRET_KEY.encode(),
            message.encode(),
            hashlib.sha256
        ).hexdigest()

        return hmac.compare_digest(signature, expected)
    except Exception:
        return False

class AuthHandler(BaseHTTPRequestHandler):
    """HTTP handler for SRS http_hooks callbacks.

    SRS delivers each hook as an HTTP POST with a JSON body and treats
    HTTP 200 with {"code": 0} as "allow"; any other response rejects the client.
    """

    def do_POST(self):
        path = urlparse(self.path).path
        length = int(self.headers.get('Content-Length', 0))
        try:
            body = json.loads(self.rfile.read(length) or b'{}')
        except ValueError:
            self._respond(400, {'code': 1, 'reason': 'Invalid JSON'})
            return

        # SRS passes the query string of the stream URL in "param",
        # e.g. "?token=...&uid=..."
        params = parse_qs(body.get('param', '').lstrip('?'))
        stream = body.get('stream', '')

        if path == '/auth/publish':
            # Publish authentication
            token = params.get('token', [''])[0]
            user_id = params.get('uid', [''])[0]

            if verify_token(stream, user_id, token):
                self._respond(200, {'code': 0})
            else:
                self._respond(403, {'code': 1, 'reason': 'Invalid token'})

        elif path == '/auth/play':
            # Play authentication: add viewer-side checks here
            self._respond(200, {'code': 0})

        else:
            self._respond(404, {'code': 1, 'reason': 'not found'})

    def _respond(self, code: int, data: dict):
        self.send_response(code)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

    def log_message(self, format, *args):
        # Silence the default per-request logging
        pass


if __name__ == '__main__':
    # Self-test: generate a token and verify it
    token = generate_token('mystream', 'user123')
    print(f"Generated Token: {token}")
    print(f"Verification: {verify_token('mystream', 'user123', token)}")

    # Start the authentication service
    server = HTTPServer(('0.0.0.0', 8081), AuthHandler)
    print("Auth server running on http://0.0.0.0:8081")
    server.serve_forever()
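
On the publishing side, the token has to travel in the stream URL. The sketch below builds such a URL using the same `{expires}-{hmac}` format as the service above. `build_publish_url`, the `token` and `uid` query parameter names, and the example base URL are assumptions of this sketch; they must match whatever the authentication handler actually reads.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET_KEY = "your-secret-key-here"  # placeholder, must match the auth service


def sign(stream_key: str, user_id: str, expires: int) -> str:
    """HMAC-SHA256 over "stream:user:expires", hex encoded."""
    message = f"{stream_key}:{user_id}:{expires}"
    return hmac.new(SECRET_KEY.encode(), message.encode(),
                    hashlib.sha256).hexdigest()


def build_publish_url(base_url: str, stream_key: str, user_id: str,
                      ttl: int = 3600, now: float = None) -> str:
    """Return base_url/stream_key with token and uid as query parameters."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl
    token = f"{expires}-{sign(stream_key, user_id, expires)}"
    query = urlencode({"token": token, "uid": user_id})
    return f"{base_url}/{stream_key}?{query}"


print(build_publish_url("rtmp://localhost:1935/live", "mystream", "user123"))
```

A short `ttl` limits the damage if a URL leaks, since the token only has to be valid at the moment the publish hook fires.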

12.2.2 SRS Authentication Configuration

vhost __defaultVhost__ {
    # Publish and play authentication hooks
    http_hooks {
        enabled on;
        on_publish http://localhost:8081/auth/publish;
        on_unpublish http://localhost:8081/auth/publish;
        on_play http://localhost:8081/auth/play;
        on_stop http://localhost:8081/auth/play;
    }
}

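12.2.3 IP Whitelist

vhost __defaultVhost__ {
    # SRS reads allow/deny rules from the security block
    security {
        enabled on;

        # Only allow specific IPs to publish
        allow publish 192.168.1.0/24;
        allow publish 10.0.0.0/8;
        deny publish all;

        # Allow all IPs to play
        allow play all;
    }
}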
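
The intent of these rules can be checked offline before they go into the server configuration. The sketch below uses Python's standard `ipaddress` module to test whether a client address falls inside one of the allowed publish networks. It models only the intent (publish allowed from the listed networks, denied otherwise), not SRS's own rule evaluation; `may_publish` and `PUBLISH_ALLOW` are illustrative names.

```python
import ipaddress

# Networks allowed to publish, copied from the configuration above.
PUBLISH_ALLOW = [
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def may_publish(client_ip: str) -> bool:
    """Return True if client_ip belongs to one of the allowed networks."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed address: deny
        return False
    return any(address in network for network in PUBLISH_ALLOW)


for ip in ("192.168.1.20", "10.9.8.7", "203.0.113.5"):
    print(ip, "allowed" if may_publish(ip) else "denied")
```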

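12.2.4 TLS Encryption (RTMPS)

# TLS termination for RTMPS with the Nginx stream module, placed in front of SRS
# (these directives are Nginx syntax, not SRS syntax; SRS keeps listening on 1935)
stream {
    server {
        listen 443 ssl;
        proxy_pass 127.0.0.1:1935;

        ssl_certificate /etc/ssl/certs/server.crt;
        ssl_certificate_key /etc/ssl/private/server.key;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;
    }
}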

Publishing command:

# Publish over RTMPS (quote the URL so the shell does not interpret "?" or "&")
ffmpeg -re -i input.mp4 -c copy -f flv \
    "rtmps://your-server:443/live/stream?token=xxx"

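12.2.5 Access Control Checklist

Security checklist:

[ ] Publish authentication is enabled
[ ] Tokens expire
[ ] IP whitelist is configured
[ ] TLS/SSL is enabled
[ ] Firewall rules are configured
[ ] Sensitive ports are closed
[ ] Audit logging is enabled
[ ] Passwords and keys are rotated regularly
[ ] Maximum connection count is limited
[ ] Abnormal publishing is monitored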

12.3 Monitoring and Alerting

12.3.1 Monitoring via the SRS HTTP API

#!/usr/bin/env python3
"""
SRS monitoring script.
Periodically checks the SRS service status and sends alerts.
"""

import requests
import time
import json
from datetime import datetime


SRS_API = "http://localhost:1985"
CHECK_INTERVAL = 30  # seconds

# Alert thresholds
THRESHOLDS = {
    'cpu_usage': 80,           # CPU usage %
    'memory_usage': 80,        # memory usage %
    'connection_count': 800,   # number of connections
    'stream_count': 100,       # number of streams
    'send_kbps': 1000000,      # outbound bandwidth in kbps
}


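def get_srs_stats() -> dict:
    """Collect SRS metrics keyed like THRESHOLDS; returns None if the API is unreachable.

    The JSON field names below are assumptions about the layout of
    /api/v1/summaries and /api/v1/streams; verify them against your SRS version.
    """
    try:
        summary = requests.get(f"{SRS_API}/api/v1/summaries", timeout=5).json()
        streams = requests.get(f"{SRS_API}/api/v1/streams", timeout=5).json()
    except Exception as e:
        print(f"[ERROR] Failed to fetch SRS statistics: {e}")
        return None

    data = summary.get('data', {})
    proc = data.get('self', {})
    stream_list = streams.get('streams', [])
    return {
        # Assumed to be reported as fractions (0.15 means 15%)
        'cpu_usage': proc.get('cpu_percent', 0) * 100,
        'memory_usage': proc.get('mem_percent', 0) * 100,
        'connection_count': data.get('system', {}).get('conn_srs', 0),
        'stream_count': len(stream_list),
        'send_kbps': sum(s.get('kbps', {}).get('send_30s', 0) for s in stream_list),
    }


# Alert message for each metric in THRESHOLDS
ALERT_MESSAGES = {
    'cpu_usage': "CPU usage too high: {value:.1f}%",
    'memory_usage': "Memory usage too high: {value:.1f}%",
    'connection_count': "Too many connections: {value}",
    'stream_count': "Too many streams: {value}",
    'send_kbps': "Outbound bandwidth too high: {value} kbps",
}


def check_health(stats: dict) -> list:
    """Check health status and return a list of alerts."""
    if not stats:
        return [{"level": "CRITICAL", "message": "SRS service unreachable"}]

    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = stats.get(metric, 0)
        if value > limit:
            alerts.append({
                "level": "WARNING",
                "message": ALERT_MESSAGES[metric].format(value=value)
            })

    return alerts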


def send_alert(alert: dict):
    """Send an alert."""
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    level = alert['level']
    message = alert['message']

    print(f"[{timestamp}] [{level}] {message}")

    # Alert channels that can be integrated here:
    # - Email
    # - DingTalk / Feishu / WeCom
    # - Slack
    # - PagerDuty
    # - Telegram


def monitor_loop():
    """Main monitoring loop."""
    print("🔍 SRS monitor started...")

    while True:
        try:
            stats = get_srs_stats()
            alerts = check_health(stats)

            for alert in alerts:
                send_alert(alert)

            if not alerts:
                print(f"[{datetime.now().strftime('%H:%M:%S')}] ✅ Service healthy")

        except Exception as e:
            print(f"[ERROR] Monitor error: {e}")

        time.sleep(CHECK_INTERVAL)


if __name__ == '__main__':
    monitor_loop()
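
With a 30-second check interval, a condition that stays bad repeats the same alert twice a minute. One common remedy is a cooldown per alert type. The sketch below is one possible shape for it: `AlertThrottle` is a hypothetical helper, and it keys alerts on the level plus the message text before the first colon so that a changing value in the message does not defeat the cooldown.

```python
class AlertThrottle:
    """Suppress repeats of the same alert type within a cooldown window."""

    def __init__(self, cooldown_s: float = 300.0):
        self.cooldown_s = cooldown_s
        self._last_sent = {}

    def should_send(self, alert: dict, now: float) -> bool:
        # "Too many connections: 812" and "...: 930" share one key
        key = (alert["level"], alert["message"].split(":", 1)[0])
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(cooldown_s=300)
alert = {"level": "WARNING", "message": "Too many connections: 812"}
print(throttle.should_send(alert, now=0.0))    # first occurrence is sent
print(throttle.should_send(alert, now=30.0))   # repeat inside the cooldown is dropped
```

In the monitoring loop this would wrap the call to `send_alert`, with `time.time()` passed as `now`.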

12.3.2 Grafana Dashboard

{
  "dashboard": {
    "title": "SRS RTMP Monitor",
    "panels": [
      {
        "title": "Connections",
        "type": "graph",
        "targets": [{"expr": "srs_connections"}]
      },
      {
        "title": "Streams",
        "type": "stat",
        "targets": [{"expr": "srs_streams"}]
      },
      {
        "title": "Outbound bandwidth (Mbps)",
        "type": "graph",
        "targets": [{"expr": "srs_send_kbps / 1000"}]
      },
      {
        "title": "Inbound bandwidth (Mbps)",
        "type": "graph",
        "targets": [{"expr": "srs_recv_kbps / 1000"}]
      }
    ]
  }
}

12.3.3 Alert Rules

# prometheus/alerts.yml
groups:
  - name: srs_alerts
    rules:
      - alert: SRSDown
        expr: up{job="srs"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "SRS service is down"
          description: "SRS has not responded for more than 1 minute"

      - alert: HighConnectionCount
        expr: srs_connections > 800
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Connection count too high"
          description: "Current connections: {{ $value }}"

      - alert: HighBandwidth
        expr: srs_send_kbps > 1000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Bandwidth usage too high"
          description: "Current outbound bandwidth: {{ $value }} kbps"
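
The `for:` clause means the expression must stay true for the whole duration before the alert fires; a single dip below the threshold restarts the timer. The sketch below imitates that behaviour on a list of `(timestamp, value)` samples. It is a simplified model for reasoning about the rules, not Prometheus's implementation, and `alert_is_firing` is an illustrative name.

```python
def alert_is_firing(samples, threshold: float, hold_s: float) -> bool:
    """Return True if value > threshold held continuously for at least hold_s.

    samples: list of (timestamp_s, value) pairs sorted by timestamp.
    """
    if not samples:
        return False
    pending_since = None
    for timestamp, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = timestamp  # condition just became true
        else:
            pending_since = None           # any dip resets the timer
    return pending_since is not None and samples[-1][0] - pending_since >= hold_s


steady = [(t, 900) for t in range(0, 301, 60)]
dipped = [(0, 900), (60, 900), (120, 700), (180, 900), (240, 900), (300, 900)]
print(alert_is_firing(steady, threshold=800, hold_s=300))
print(alert_is_firing(dipped, threshold=800, hold_s=300))
```

The first series fires and the second does not, which is the behaviour that keeps short traffic spikes from paging anyone.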

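12.4 Production Deployment Checklist

12.4.1 Pre-Deployment Checks

□ Server
  □ CPU: 8 cores or more
  □ Memory: 16 GB or more
  □ Bandwidth: 100 Mbps or more (depending on the expected audience)
  □ Disk: SSD with sufficient free space
  □ OS: Ubuntu 22.04 / CentOS 8+

□ Network
  □ Firewall ports open (1935, 80, 443)
  □ Security groups configured correctly
  □ Bandwidth test passed
  □ DNS configured correctly

□ SRS
  □ Maximum connection count is reasonable
  □ HLS settings tuned
  □ Recording path configured
  □ Log level set
  □ HTTP API enabled

□ Security
  □ Publish authentication enabled
  □ TLS certificate configured
  □ IP whitelist set
  □ Default passwords changed
  □ Unnecessary ports closed

12.4.2 Runtime Monitoring

□ Metrics
  □ Server CPU / memory / disk
  □ SRS connection count
  □ Stream count
  □ Publish / play bandwidth
  □ Error rate

□ Alert rules
  □ Service-down alert
  □ Connection-count threshold alert
  □ Bandwidth threshold alert
  □ Disk-space alert
  □ Recording-failure alert

12.4.3 Incident Response Plan

Handling common failures:

1. Publishing fails
   → Check the publish URL and key
   → Check network connectivity
   → Check the authentication service
   → Inspect the SRS logs

2. Playback stutters
   → Check server load
   → Check bandwidth utilization
   → Check the GOP cache configuration
   → Check the HLS segment configuration

3. Latency is too high
   → Check encoder settings (B-frames, GOP)
   → Check server buffer configuration
   → Check player buffer settings
   → Enable low-latency mode

4. Service is down
   → Automatic restart script
   → Failover (primary/standby switch)
   → Alert notification
   → Log analysis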

12.5 Automated Operations Scripts

12.5.1 Health Check Script

#!/bin/bash
# healthcheck.sh - SRS health check

SRS_API="http://localhost:1985"
ALERT_WEBHOOK="https://hooks.example.com/alert"

check_srs() {
    local status=0

    # Check the process (-x matches the exact process name, not any command line containing "srs")
    if ! pgrep -x "srs" > /dev/null; then
        echo "CRITICAL: SRS process not found"
        send_alert "SRS process has stopped"
        return 1
    fi

    # Check the API (-f makes curl fail on HTTP errors)
    if ! curl -sf --max-time 5 "$SRS_API/api/v1/summaries" > /dev/null; then
        echo "CRITICAL: SRS API is not responding"
        send_alert "SRS API is not responding"
        return 1
    fi

    # Check the ports
    for port in 1935 1985 8080; do
        if ! nc -z localhost "$port" 2>/dev/null; then
            echo "WARNING: port $port is not listening"
            send_alert "SRS port $port is not listening"
            status=1
        fi
    done

    if [ "$status" -eq 0 ]; then
        echo "OK: SRS service is healthy"
    fi
    return "$status"
}

send_alert() {
    local message="$1"
    local timestamp
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')

    # Post to the webhook
    curl -s -X POST "$ALERT_WEBHOOK" \
        -H "Content-Type: application/json" \
        -d "{\"text\": \"[$timestamp] SRS alert: $message\"}"
}

# Run the check
check_srs
exit $?
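
Where `nc` is not installed, the port check can be done with Python's standard `socket` module. The sketch below is a rough equivalent of `nc -z`: it attempts a TCP connection and reports whether it succeeded. `port_open` is an illustrative name, and the port list is the one used in the script above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable
        return False


if __name__ == "__main__":
    for port in (1935, 1985, 8080):
        state = "listening" if port_open("localhost", port) else "NOT listening"
        print(f"port {port}: {state}")
```

A successful connection only proves that something accepts TCP on the port; the API check in the script is still needed to confirm that it is SRS and that it responds.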

12.5.2 Automatic Restart Script

#!/bin/bash
# autorestart.sh - restart SRS automatically

SRS_CMD="./objs/srs -c conf/srs.conf"
LOG_FILE="/var/log/srs-restart.log"
MAX_RESTARTS=10
RESTART_INTERVAL=60

restart_count=0

# Append a timestamped line to the log file
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

while true; do
    # Check whether SRS is running (-x matches the exact process name)
    if ! pgrep -x "srs" > /dev/null; then
        restart_count=$((restart_count + 1))

        if [ "$restart_count" -gt "$MAX_RESTARTS" ]; then
            log "ERROR: maximum restart count exceeded, giving up"
            exit 1
        fi

        log "WARNING: SRS is not running, restarting (attempt $restart_count)"

        # Start SRS
        $SRS_CMD &

        sleep 5

        # Verify that it started
        if pgrep -x "srs" > /dev/null; then
            log "INFO: SRS restarted successfully"
        else
            log "ERROR: SRS restart failed"
        fi
    fi

    sleep $RESTART_INTERVAL
done

12.5.3 Log Cleanup Script

#!/bin/bash
# cleanup.sh - periodically remove old files

HLS_DIR="/data/hls"
RECORD_DIR="/data/recordings"
LOG_DIR="/var/log/srs"

# Retention in days
KEEP_DAYS=7

echo "Removing files older than ${KEEP_DAYS} days..."

# Remove HLS segments and playlists
find "$HLS_DIR" -name "*.ts" -mtime +$KEEP_DAYS -delete
find "$HLS_DIR" -name "*.m3u8" -mtime +$KEEP_DAYS -delete

# Remove recordings (optionally archive them instead of deleting)
find "$RECORD_DIR" -name "*.flv" -mtime +$KEEP_DAYS -delete

# Remove logs
find "$LOG_DIR" -name "*.log" -mtime +$KEEP_DAYS -delete

echo "Cleanup finished"
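
Because `-delete` is irreversible, it can help to list what would be removed before running the cleanup for the first time. The sketch below is a dry run in Python: it only collects files whose modification time is older than the retention period and deletes nothing. `find_expired` is an illustrative name, and its cutoff is a plain `keep_days * 86400` seconds, which differs slightly from how `find -mtime +N` rounds file age to whole days.

```python
import time
from pathlib import Path


def find_expired(root, patterns, keep_days: int, now: float = None):
    """Return files under root matching any pattern and older than keep_days."""
    reference = time.time() if now is None else now
    cutoff = reference - keep_days * 86400
    expired = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file() and path.stat().st_mtime < cutoff:
                expired.append(path)
    return sorted(expired)


# Example (paths taken from cleanup.sh):
#   for path in find_expired("/data/hls", ["*.ts", "*.m3u8"], keep_days=7):
#       print("would delete", path)
```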

12.6 Performance Tuning Reference

12.6.1 Kernel Parameter Tuning

# /etc/sysctl.conf
# Maximum number of file descriptors
fs.file-max = 1000000

# TCP tuning
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_max_tw_buckets = 5000

# Network backlog and protection
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_syncookies = 1

# Apply the settings (run this in a shell; it is not part of sysctl.conf)
sysctl -p

12.6.2 ulimit Configuration

# /etc/security/limits.conf
* soft nofile 1000000
* hard nofile 1000000
* soft nproc 65535
* hard nproc 65535
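
Limits in limits.conf only apply to new login sessions, so it is worth confirming what a running process actually gets. The sketch below reads the open-file limit with Python's standard `resource` module (Unix only) and compares it with an expected connection count. `enough_descriptors`, the one-descriptor-per-connection estimate, and the headroom of 1024 are assumptions of this sketch.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def enough_descriptors(expected_connections: int,
                       per_connection: int = 1,
                       headroom: int = 1024) -> bool:
    """Check whether the soft limit covers the expected connection count."""
    soft, _ = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= expected_connections * per_connection + headroom


soft, hard = nofile_limits()
print(f"open files: soft={soft} hard={hard}")
print("enough for 10000 connections:", enough_descriptors(10000))
```

Run it as the same user and through the same service manager that starts SRS, since limits can differ between an interactive shell and a service.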

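12.6.3 SRS Performance Parameters

Parameter	Recommended value	Description
max_connections	1000-10000	Depends on server capacity
chunk_size	60000	Larger chunks reduce header overhead
gop_cache	on	Keyframe (GOP) cache
queue.capacity	2500	Message queue capacity
min_latency	on	Minimum-latency mode
tcp_nodelay	on	Disable Nagle's algorithm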
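
The effect of `chunk_size` can be estimated with a little arithmetic. RTMP splits every message into chunks, and each chunk carries its own header. The sketch below assumes the simplest case: a 12-byte header on the first chunk of a message and a 1-byte header on every following chunk (chunk stream id below 64, no extended timestamp). The function name and the 100000-byte example message are illustrative.

```python
import math


def chunk_header_overhead(message_bytes: int, chunk_size: int) -> int:
    """Header bytes needed to send one RTMP message of message_bytes."""
    chunks = max(1, math.ceil(message_bytes / chunk_size))
    first_header = 12           # basic header (1) + full message header (11)
    continuation_header = 1     # basic header only
    return first_header + (chunks - 1) * continuation_header


keyframe = 100_000  # bytes, a hypothetical keyframe size
for size in (128, 4096, 60000):
    overhead = chunk_header_overhead(keyframe, size)
    print(f"chunk_size={size}: {overhead} header bytes")
```

The byte savings are small relative to the payload; the larger benefit of a big chunk size is fewer chunks for the server and client to split and reassemble per message.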

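12.7 Troubleshooting Common Problems

12.7.1 Troubleshooting Flow

Problem occurs
    │
    ├── Publishing fails
    │   ├── Check network connectivity (ping/telnet)
    │   ├── Check the authentication configuration
    │   ├── Check the publish URL format
    │   └── Inspect the SRS logs
    │
    ├── Playback stutters
    │   ├── Check bandwidth utilization
    │   ├── Check the keyframe interval
    │   ├── Check the GOP cache
    │   └── Check the player buffer
    │
    ├── Latency is too high
    │   ├── Check the B-frame setting
    │   ├── Check the GOP size
    │   ├── Check server buffering
    │   └── Check the HLS segment size
    │
    └── Service misbehaves
        ├── Check process status
        ├── Check system resources
        ├── Check the logs for errors
        └── Check network connections

12.7.2 Quick Reference for Common Problems

Problem	Likely cause	Solution
Publish connection refused	Authentication failed	Check the token/key
Black screen on playback	No keyframe	Check the encoder GOP settings
Audio/video out of sync	Abnormal timestamps	Enable atc mode
High HLS latency	Segments too large	Reduce hls_fragment
High server CPU	Transcoding load	Reduce transcoding tasks
Connection count stops growing	Limit reached	Increase max_connections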

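Notes

Monitoring first: set up monitoring before deployment so that problems are detected promptly
Incremental tuning: change one parameter at a time and observe the effect before continuing
Back up configuration: back up the configuration before every change
Load-test changes: run a load test before rolling out important changes
Keep records: document every configuration change and incident response

Further Reading

Previous chapter: 11 - Docker Deployment
End of the tutorial 🎉
Thank you for reading the RTMP 协议精讲 tutorial. If you have questions, feel free to discuss them in GitHub Issues.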