Chapter 08: Download and Transfer Management

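curl is not only an API testing tool; it is also a capable download manager. This chapter shows how to download files efficiently with curl, including resuming interrupted downloads, rate limiting, and parallel downloads.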

8.1 Basic Downloads

Output control

# Write to stdout (the default)
curl https://example.com/file.txt

# -o: save under a local file name you choose
curl -o report.pdf https://example.com/report.pdf

# -O: save under the remote file name
curl -O https://example.com/documents/report.pdf
# Saved as report.pdf

# Download into a specific directory (--create-dirs creates missing directories)
curl --create-dirs -o /tmp/downloads/report.pdf https://example.com/report.pdf

# Download several files in one command
curl -O https://example.com/file1.txt \
     -O https://example.com/file2.txt \
     -O https://example.com/file3.txt

# Download several files into a specific directory (one curl call per file)
for f in file1.txt file2.txt file3.txt; do
  curl -o "/tmp/$f" "https://example.com/$f"
done

# Silent download (no progress meter, no error messages)
curl -sO https://example.com/file.bin

# Silent download that still reports errors (-S)
curl -sSO https://example.com/file.bin

Naming downloaded files

# -O uses the remote file name, i.e. the last segment of the URL path
curl -O https://cdn.example.com/v2/assets/style.css
# Saved as style.css

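# Use the file name from the Content-Disposition header
# curl supports this natively: -J (--remote-header-name) together with -O
curl -OJ https://example.com/download

# Manual alternative: parse the header yourself (strip \r, stop at the closing quote)
FILENAME=$(curl -sI https://example.com/download \
  | tr -d '\r' \
  | grep -i '^content-disposition' \
  | sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p')
curl -o "$FILENAME" https://example.com/download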

# Add a timestamp to the file name
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
curl -o "backup_${TIMESTAMP}.tar.gz" https://example.com/backup.tar.gz

# Batch download that preserves the directory structure
while read -r url; do
  filepath="${url#https://example.com/}"
  mkdir -p "$(dirname "$filepath")"
  curl -o "$filepath" "$url"
done < urls.txt
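When you parse Content-Disposition yourself rather than relying on -J, treat the server-supplied name as untrusted, since it may contain directory parts such as ../. The following is a minimal POSIX shell sketch that reads response headers on stdin and keeps only the final path segment. The function name cd_filename is hypothetical, and only the plain filename= parameter is handled (not the RFC 6266 filename*= form).

```shell
# Hypothetical helper: print a safe file name taken from the Content-Disposition
# header. Headers are read from stdin (e.g. piped from `curl -sI URL`).
# Only the plain filename= parameter is handled, not filename*=.
cd_filename() {
  local name
  name=$(tr -d '\r' \
    | grep -i '^content-disposition:' \
    | tail -n 1 \
    | sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p')
  # Drop any directory part so a name like ../../etc/passwd cannot escape
  name=${name##*/}
  # Refuse empty names and the special entries . and ..
  case "$name" in
    ''|.|..) return 1 ;;
  esac
  printf '%s\n' "$name"
}
```

Example: `name=$(curl -sI https://example.com/download | cd_filename) && curl -o "$name" https://example.com/download`.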

8.2 Resuming Interrupted Downloads

If a download is interrupted, you can continue from where it stopped instead of starting over.

Automatic resume with -C -

# Resume automatically (curl uses the size of the existing local file as the offset)
curl -C - -O https://example.com/largefile.iso

# Resume from an explicit byte offset
curl -C 1048576 -O https://example.com/largefile.iso
# Continues from the 1 MiB (1048576-byte) mark

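# Complete download script with retries and resume
# -f makes curl exit non-zero on HTTP errors (4xx/5xx) so they are retried too
MAX_RETRIES=5
RETRY_DELAY=5
URL="https://example.com/largefile.iso"

for i in $(seq 1 "$MAX_RETRIES"); do
  echo "Attempt $i/$MAX_RETRIES..."
  if curl -f -C - -O "$URL"; then
    echo "Download succeeded!"
    exit 0
  fi
  echo "Download failed, retrying in ${RETRY_DELAY}s..."
  sleep "$RETRY_DELAY"
done
echo "Download failed: maximum number of retries reached"
exit 1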
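The loop above waits a fixed time between attempts. A common refinement is exponential backoff, doubling the delay after each failure. Below is a minimal sketch of a generic wrapper; retry_backoff is a hypothetical name, not part of curl. curl itself also offers --retry, --retry-delay and --retry-max-time, which by default retry only transient failures such as timeouts and HTTP 408, 429, 500, 502, 503 or 504 responses.

```shell
# Hypothetical helper: run a command up to MAX times, doubling the delay
# after each failure (exponential backoff). Returns the last exit status.
# Usage: retry_backoff MAX INITIAL_DELAY command [args...]
retry_backoff() {
  local max=$1 delay=$2 attempt=1 status
  shift 2
  while :; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status), retrying in ${delay}s..." >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_backoff 5 2 curl -f -C - -O https://example.com/largefile.iso` tries up to five times, waiting 2, 4, 8 and 16 seconds between attempts; because each attempt uses -C -, it resumes rather than restarts.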

How resuming works

Client                                  Server
  |                                       |
  |--- GET /file.bin -------------------->|
  |<-- 200 OK, Content-Length: 1 GiB -----|
  |    [data...]                          |
  |    (interrupted after 100 MiB)        |
  |                                       |
  |  curl -C - reads the local file size: |
  |  104857600 bytes                      |
  |                                       |
  |--- GET /file.bin -------------------->|
  |    Range: bytes=104857600-            |
  |<-- 206 Partial Content ---------------|
  |    Content-Range: bytes 104857600-1073741823/1073741824
  |    Content-Length: 968884224 (remaining bytes)
  |    [transfer continues from 100 MiB]  |

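⚠️ Note: resuming requires the server to support the Range request header. If the server ignores Range and replies 200 with the whole file, curl does not silently start over: it aborts with error 33 ("HTTP server doesn't seem to support byte ranges. Cannot resume."). In that case, delete the partial file and download again from the beginning.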
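To make the -C - step concrete: curl takes the size of the existing local file as the offset and asks for everything after it. The sketch below is a hypothetical helper, not a curl feature; it prints the Range header that results from a given local file. For a missing or empty file it prints nothing, since there is nothing to resume.

```shell
# Hypothetical helper mirroring the offset calculation behind `curl -C -`:
# the size of the existing local file becomes the start of an open-ended range.
resume_range_header() {
  local file=$1 size=0
  if [ -f "$file" ]; then
    # tr strips the padding some wc implementations (e.g. macOS) add
    size=$(wc -c < "$file" | tr -d ' ')
  fi
  if [ "$size" -gt 0 ]; then
    printf 'Range: bytes=%s-\n' "$size"
  fi
}
```

A rough manual equivalent of `curl -C - -o file.iso URL` is `curl -r "$(wc -c < file.iso | tr -d ' ')-" URL >> file.iso`, since -r accepts an open-ended range. Unlike -C -, this does not check that the server actually answered 206, so a server without range support would append the whole file again.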

8.3 Rate Limiting

Limiting download speed

# Limit the download speed to 1 MB/s
curl --limit-rate 1m -O https://example.com/largefile.iso

# Limit to 500 KB/s
curl --limit-rate 500k -O https://example.com/largefile.iso

# Limit to 100 bytes/s (useful for testing)
curl --limit-rate 100 -O https://example.com/smallfile.txt

# Unit suffixes (case-insensitive, 1024-based)
# (no suffix) = bytes per second
# k / K = KiB (1024)
# m / M = MiB (1024*1024)
# g / G = GiB (1024*1024*1024)
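Because the suffixes are 1024-based, you can estimate how long a rate-limited transfer will take. A minimal POSIX shell sketch, assuming integer values only; the helper names rate_to_bytes and transfer_seconds are hypothetical.

```shell
# Hypothetical helper: convert a --limit-rate style value (e.g. 500k, 1M)
# to bytes per second, using 1024-based suffixes. Integers only.
rate_to_bytes() {
  local value=$1 number suffix
  number=${value%[kKmMgG]}
  suffix=${value#"$number"}
  case "$number" in
    ''|*[!0-9]*) return 1 ;;   # reject decimals and unknown suffixes
  esac
  case "$suffix" in
    '')  echo "$number" ;;
    k|K) echo $((number * 1024)) ;;
    m|M) echo $((number * 1024 * 1024)) ;;
    g|G) echo $((number * 1024 * 1024 * 1024)) ;;
  esac
}

# Hypothetical helper: seconds needed to move SIZE bytes at RATE, rounded up
transfer_seconds() {
  local size=$1 rate
  rate=$(rate_to_bytes "$2") || return 1
  [ "$rate" -gt 0 ] || return 1
  echo $(( (size + rate - 1) / rate ))
}
```

For example, `transfer_seconds 104857600 2m` prints 50: a 100 MiB file at --limit-rate 2m takes about 50 seconds, assuming the link can actually sustain that rate.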

Practical uses of rate limiting

# Scenario 1: download in the background without saturating the network
curl --limit-rate 2m -C - -O https://updates.example.com/update.tar.gz &

# Scenario 2: test how an application behaves on a slow connection
curl --limit-rate 50k http://localhost:3000/api/data -o /dev/null

# Scenario 3: keep a batch download from hogging bandwidth (one file at a time)
while read -r url; do
  curl --limit-rate 500k -C - -O "$url"
done < download_urls.txt

# Scenario 4: rate-limited FTP download
curl --limit-rate 1m ftp://ftp.example.com/releases/package.tar.gz -O

8.4 Progress Meters and Statistics

Controlling the progress meter

# Default progress meter (written to stderr when the body is saved to a file)
curl -O https://example.com/file.bin

# Simple progress bar made of # characters
curl -# -O https://example.com/file.bin

# Silent (no progress meter)
curl -s -O https://example.com/file.bin

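# --progress-bar is the long form of -# (same result as the line above)
curl --progress-bar -O https://example.com/file.bin

# Hide only the progress meter but keep error messages (curl 7.67.0+)
curl --no-progress-meter -O https://example.com/file.bin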

Transfer statistics with -w

# Show detailed transfer statistics
curl -o /dev/null -s -w "\
HTTP status:        %{http_code}\n\
Download size:      %{size_download} bytes\n\
Upload size:        %{size_upload} bytes\n\
Avg download speed: %{speed_download} bytes/sec\n\
Avg upload speed:   %{speed_upload} bytes/sec\n\
DNS lookup:         %{time_namelookup}s\n\
TCP connect:        %{time_connect}s\n\
TLS handshake:      %{time_appconnect}s\n\
Time to first byte: %{time_starttransfer}s\n\
Total time:         %{time_total}s\n\
" https://example.com

# Output as JSON
# (curl 7.70.0+ can also print every variable at once with -w '%{json}')
curl -o /dev/null -s -w '{
  "http_code": %{http_code},
  "size_download": %{size_download},
  "speed_download": %{speed_download},
  "time_namelookup": %{time_namelookup},
  "time_connect": %{time_connect},
  "time_appconnect": %{time_appconnect},
  "time_starttransfer": %{time_starttransfer},
  "time_total": %{time_total}
}\n' https://example.com | jq .
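All time_* values are cumulative, measured from the start of the transfer, so the duration of each phase is the difference between adjacent timers. A small sketch that does the subtraction with awk; timing_phases is a hypothetical name, and its "wait" phase lumps together sending the request and server processing.

```shell
# Hypothetical helper: turn curl's cumulative -w timers into per-phase durations.
# Arguments: time_namelookup time_connect time_appconnect time_starttransfer time_total
timing_phases() {
  awk -v dns="$1" -v tcp="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
    # time_appconnect is 0 for plain HTTP: treat the TLS phase as zero length
    if (tls == 0) tls = tcp
    printf "dns      %.3f\n", dns
    printf "tcp      %.3f\n", tcp - dns
    printf "tls      %.3f\n", tls - tcp
    printf "wait     %.3f\n", ttfb - tls
    printf "transfer %.3f\n", total - ttfb
  }'
}
```

Example: `timing_phases $(curl -o /dev/null -s -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}' https://example.com)`.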

Common -w format variables

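Variable	Description	Unit
`%{http_code}`	HTTP response status code	number
`%{size_download}`	Bytes downloaded (body only)	bytes
`%{size_upload}`	Bytes uploaded	bytes
`%{speed_download}`	Average download speed	bytes/sec
`%{speed_upload}`	Average upload speed	bytes/sec
`%{time_namelookup}`	Time from the start until DNS resolution completed	seconds
`%{time_connect}`	Time from the start until the TCP connection was established	seconds
`%{time_appconnect}`	Time from the start until the TLS/SSL handshake completed	seconds
`%{time_pretransfer}`	Time from the start until the transfer was about to begin	seconds
`%{time_starttransfer}`	Time from the start until the first byte arrived (TTFB)	seconds
`%{time_total}`	Total time of the transfer	seconds
`%{url_effective}`	Last URL fetched (after redirects)	string
`%{redirect_url}`	Redirect target when redirects are not followed with -L	string
`%{num_redirects}`	Number of redirects followed	number
`%{ssl_verify_result}`	Certificate verification result (0 = success)	number
`%{content_type}`	Content-Type of the response	string
`%{size_header}`	Total size of the received headers	bytes
`%{request_size}`	Size of the sent request	bytes
`%{local_ip}`	Local IP address	string
`%{local_port}`	Local port	number
`%{remote_ip}`	Remote IP address	string
`%{remote_port}`	Remote port	number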

8.5 Parallel Downloads

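Since version 7.66.0, curl can run several transfers in parallel by itself with -Z (--parallel), for example `curl -Z -O https://example.com/file1.txt -O https://example.com/file2.txt`; --parallel-max sets the concurrency limit (default 50). On older versions, or when you want per-file control, combine curl with shell tools.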

Parallel downloads with xargs

# Download several files in parallel (4 concurrent transfers)
cat urls.txt | xargs -n 1 -P 4 curl -sO

# Parallel downloads with a rate limit (--limit-rate applies to each transfer, so 4 × 500k ≈ 2 MB/s in total)
cat urls.txt | xargs -n 1 -P 4 curl --limit-rate 500k -sO

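# Parallel downloads into a specific directory (--output-dir needs curl 7.73.0+)
cat urls.txt | xargs -P 4 -I {} curl -s --create-dirs --output-dir /tmp/downloads -O {}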

Using GNU Parallel

# Parallel downloads with GNU parallel
cat urls.txt | parallel -j 4 curl -sO {}

# Show progress
cat urls.txt | parallel -j 4 --progress curl -sO {}

# Retry failed downloads (-f makes curl fail on HTTP errors, so parallel can see them)
cat urls.txt | parallel -j 4 --retries 3 curl -fsO {}

# Limit the speed of each download (4 × 250k ≈ 1 MB/s in total)
cat urls.txt | parallel -j 4 curl --limit-rate 250k -sO {}
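curl 7.66.0 and later can also parallelize on its own with -Z (--parallel), reading its URL list from a config file passed with -K. The sketch below converts a plain URL list into that config format, one url/output pair per download. urls_to_curl_config is a hypothetical name; the output name is simply the last path segment, and the sketch assumes URLs are properly percent-encoded (no raw quotes or backslashes) and that no two URLs share a file name.

```shell
# Hypothetical helper: read URLs (one per line) on stdin and print a curl
# config file with a url/output pair for each, for use with curl -Z -K.
urls_to_curl_config() {
  local url name
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    name=${url%%[?#]*}             # drop the query string / fragment
    name=${name##*/}               # keep the last path segment
    [ -n "$name" ] || name=index.html
    printf 'url = "%s"\n' "$url"
    printf 'output = "%s"\n' "$name"
  done
}
```

Usage: `urls_to_curl_config < urls.txt > downloads.cfg && curl -Z --parallel-max 4 -K downloads.cfg`. A single curl process then handles all transfers and can reuse connections, instead of starting one process per file.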

Chunked parallel download

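#!/bin/bash
# Chunked parallel download of a large file (the server must support Range requests)
URL="https://example.com/largefile.iso"
OUTPUT="largefile.iso"
CHUNKS=4

# Get the file size (follow redirects, keep the Content-Length of the last response)
FILE_SIZE=$(curl -sIL "$URL" | grep -i '^content-length' | tail -n 1 \
  | awk '{print $2}' | tr -d '\r')
if [ -z "$FILE_SIZE" ]; then
  echo "Could not determine the file size" >&2
  exit 1
fi
CHUNK_SIZE=$((FILE_SIZE / CHUNKS))

# Download the chunks in parallel, remembering each background PID
PIDS=()
for i in $(seq 0 $((CHUNKS - 1))); do
  START=$((i * CHUNK_SIZE))
  if [ "$i" -eq $((CHUNKS - 1)) ]; then
    END=$((FILE_SIZE - 1))   # the last chunk absorbs the remainder
  else
    END=$(((i + 1) * CHUNK_SIZE - 1))
  fi

  curl -fsL -r "${START}-${END}" -o "${OUTPUT}.part${i}" "$URL" &
  PIDS+=("$!")
done

# Wait for every chunk and stop if any of them failed
for pid in "${PIDS[@]}"; do
  wait "$pid" || { echo "A chunk failed to download" >&2; exit 1; }
done

# Merge the chunks in order (truncate first so an old file is not appended to)
: > "$OUTPUT"
for i in $(seq 0 $((CHUNKS - 1))); do
  cat "${OUTPUT}.part${i}" >> "$OUTPUT"
  rm "${OUTPUT}.part${i}"
done

# A server that ignores Range sends the whole file for every chunk; catch that here
ACTUAL_SIZE=$(wc -c < "$OUTPUT" | tr -d ' ')
if [ "$ACTUAL_SIZE" -ne "$FILE_SIZE" ]; then
  echo "Size mismatch: got $ACTUAL_SIZE bytes, expected $FILE_SIZE" >&2
  exit 1
fi
echo "Download complete: $OUTPUT ($ACTUAL_SIZE bytes)"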
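The range arithmetic in the script is easy to get off by one: curl's -r ranges are inclusive, so start-end covers end - start + 1 bytes. A sketch that isolates the calculation into a testable function; chunk_ranges is a hypothetical name, and it falls back to a single chunk when the file has fewer bytes than the requested chunk count.

```shell
# Hypothetical helper: split SIZE bytes into N inclusive byte ranges for curl -r.
# The last chunk absorbs the remainder of the integer division.
chunk_ranges() {
  local size=$1 chunks=$2 chunk i start end
  [ "$size" -gt 0 ] && [ "$chunks" -gt 0 ] || return 1
  # Fewer bytes than chunks: fall back to a single chunk
  if [ "$size" -lt "$chunks" ]; then
    chunks=1
  fi
  chunk=$((size / chunks))
  i=0
  while [ "$i" -lt "$chunks" ]; do
    start=$((i * chunk))
    if [ "$i" -eq $((chunks - 1)) ]; then
      end=$((size - 1))
    else
      end=$(((i + 1) * chunk - 1))
    fi
    echo "${start}-${end}"
    i=$((i + 1))
  done
}
```

For example, `chunk_ranges 10 3` prints 0-2, 3-5 and 6-9: two 3-byte chunks and a final 4-byte chunk that takes the remainder.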

8.6 Post-Download Processing

Extracting on the fly

# Download and extract a tar.gz
curl -sL https://example.com/archive.tar.gz | tar xzf -

# Download and extract into a specific directory
curl -sL https://example.com/archive.tar.gz | tar xzf - -C /opt/app/

# Download and extract a zip (requires unzip)
curl -sLO https://example.com/archive.zip && unzip archive.zip && rm archive.zip

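# Download a gzip-compressed file and decompress it on the fly
curl -sL https://example.com/data.gz | gunzip -c

# --compressed is different: it requests a compressed transfer (Content-Encoding)
# and curl decompresses the response automatically
curl -sL --compressed https://example.com/data.json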

# Download and execute (dangerous! trusted sources only)
# -f keeps an HTTP error page from being piped into bash
curl -fsSL https://install.example.com/setup.sh | bash

Downloading and verifying

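# Download a file and verify its SHA-256 checksum
curl -fLO https://example.com/package.tar.gz
# Placeholder: use the checksum published by the vendor
# (this value is the SHA-256 of an empty file)
EXPECTED="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
ACTUAL=$(sha256sum package.tar.gz | awk '{print $1}')
if [ "$ACTUAL" = "$EXPECTED" ]; then
  echo "✅ Checksum OK"
else
  echo "❌ Checksum mismatch"
  rm package.tar.gz
  exit 1
fi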

# Verify with a detached signature (the signer's public key must already be in your GPG keyring)
curl -LO https://example.com/package.tar.gz
curl -LO https://example.com/package.tar.gz.asc
gpg --verify package.tar.gz.asc package.tar.gz
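The checksum comparison can be wrapped in a reusable function that also works on macOS, where sha256sum is often missing but shasum is available. A minimal sketch; verify_sha256 is a hypothetical name.

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
# Uses sha256sum (GNU coreutils) and falls back to shasum -a 256 (macOS).
verify_sha256() {
  local file=$1 expected=$2 actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  else
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  # Compare case-insensitively; some vendors publish upper-case hex
  [ -n "$actual" ] && \
    [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}
```

After `curl -fLO https://example.com/package.tar.gz`, run `verify_sha256 package.tar.gz "$EXPECTED" || exit 1`. If the vendor publishes a checksum file in sha256sum format, `sha256sum -c` can check it directly.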

8.7 Recursive Downloads (with wget)

curl does not support recursive downloads directly; use wget for that, or a script that collects links and feeds them to curl.

# Recursive download with wget
wget --mirror --convert-links --no-parent \
  https://example.com/docs/

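# Download linked files with curl (simple script, follows links one level deep only)
# 1. Collect the links (grep -oP requires GNU grep)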
URLS=$(curl -s https://example.com/docs/ \
  | grep -oP 'href="\K[^"]+' \
  | grep -E '\.(html|css|js|png)$')

# 2. Download each one (assumes the hrefs are relative to /docs/)
for url in $URLS; do
  curl -sO "https://example.com/docs/$url"
done

# Collect links with lynx, then download them in parallel with curl
lynx -dump -listonly https://example.com/docs/ \
  | awk '{print $2}' \
  | grep 'example.com' \
  | xargs -n 1 -P 4 curl -sO
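The curl loop above assumes every href is relative to /docs/. Real pages also contain absolute, root-relative (/path) and scheme-relative (//host/path) links. A minimal sketch of resolving them; resolve_href is a hypothetical name, the base URL must be a directory URL ending in a slash, and ".." segments and non-HTTP schemes such as mailto: are not handled.

```shell
# Hypothetical helper: turn an href found on a page into an absolute URL.
# BASE is the page's directory URL and must end in "/".
resolve_href() {
  local base=$1 href=$2 scheme rest
  scheme=${base%%://*}
  case "$href" in
    http://*|https://*)
      echo "$href" ;;                        # already absolute
    //*)
      echo "${scheme}:${href}" ;;            # scheme-relative
    /*)
      rest=${base#*://}
      echo "${scheme}://${rest%%/*}${href}" ;;  # root-relative: keep the host
    *)
      echo "${base}${href}" ;;               # relative to the base directory
  esac
}
```

Inside the download loop, this becomes `curl -sO "$(resolve_href https://example.com/docs/ "$url")"`.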

Notes

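-O needs a file name: the URL path must end in a file name; otherwise name the output with -o
Resuming needs server support: servers that accept byte ranges usually advertise Accept-Ranges: bytes
The rate limit is an average: curl keeps the average speed at or below the limit over several seconds, so short bursts can exceed it
Watch the connection count when downloading in parallel: too many concurrent connections can trigger server-side rate limiting
Verify downloads: always check the checksum of important files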

# Check whether the server supports Range requests
curl -sI https://example.com/largefile.zip | grep -i "accept-ranges"
# Output: Accept-Ranges: bytes → resuming is supported

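# Check whether a file has been completely downloaded
# (stat -c%s is GNU coreutils; on macOS/BSD use stat -f%z)
LOCAL_SIZE=$(stat -c%s file.iso 2>/dev/null || echo 0)
REMOTE_SIZE=$(curl -sIL https://example.com/file.iso \
  | grep -i '^content-length' | tail -n 1 | awk '{print $2}' | tr -d '\r')
if [ "$LOCAL_SIZE" = "$REMOTE_SIZE" ]; then
  echo "File is completely downloaded"
else
  echo "Incomplete: $LOCAL_SIZE of $REMOTE_SIZE bytes"
fi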
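When redirects are followed with -L, curl -sIL prints one header block per response, so a plain grep can return several Content-Length values, including one from a redirect. The sketch below keeps only the value from the last response and fails if that response has none (for example with chunked transfer encoding). remote_size is a hypothetical name.

```shell
# Hypothetical helper: read response headers on stdin (e.g. from `curl -sIL URL`)
# and print the Content-Length of the final response; fail if it has none.
remote_size() {
  tr -d '\r' | awk '
    /^HTTP\// { size = "" }                            # new response: reset
    tolower($1) == "content-length:" { size = $2 }
    END { if (size == "") exit 1; print size }'
}
```

Usage: `REMOTE_SIZE=$(curl -sIL https://example.com/file.iso | remote_size) || echo "size unknown"`.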

Further reading

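📖 Next chapter: Chapter 09: Transfer Options and Network Tuning, covering timeouts, retry strategies, proxy configuration, and connection pooling in depth.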