
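Chapter 22: Performance Optimization

Learn to profile, measure, and optimize the performance of Python programs.

22.1 Principles of Performance Analysis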

“Premature optimization is the root of all evil.” — Donald Knuth

Optimization workflow:

📊 Measure (profile): find the bottleneck
🎯 Identify: pin down the hot code paths
⚡ Optimize: make targeted improvements
✅ Verify: re-measure to confirm the improvement
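
The four steps can be sketched with the standard library alone. This is a minimal illustration, not a prescribed tool: `squares_loop`, `squares_comprehension`, and `compare` are hypothetical names introduced here. The helper first checks that the candidate returns the same result as the baseline, then times both.

```python
import timeit

def squares_loop(n: int) -> list[int]:
    """Baseline: explicit loop with append."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_comprehension(n: int) -> list[int]:
    """Candidate: the same output built with a list comprehension."""
    return [i * i for i in range(n)]

def compare(baseline, candidate, arg, repeat: int = 5, number: int = 20):
    """Verify that both callables agree, then return their best timings in seconds."""
    if baseline(arg) != candidate(arg):
        raise ValueError("candidate changes the result; fix correctness first")
    best_base = min(timeit.repeat(lambda: baseline(arg), repeat=repeat, number=number))
    best_cand = min(timeit.repeat(lambda: candidate(arg), repeat=repeat, number=number))
    return best_base, best_cand

base, cand = compare(squares_loop, squares_comprehension, 50_000)
print(f"baseline:  {base:.4f}s")
print(f"candidate: {cand:.4f}s")
```

Taking the minimum of several repeats reduces the influence of background noise on the comparison.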

22.2 Profiling

22.2.1 cProfile (built-in)

import cProfile
import pstats

def slow_function():
    total = 0
    for i in range(1_000_000):
        total += i ** 2
    return total

# Option 1: from the command line
# $ python -m cProfile -s cumtime script.py

# Option 2: from code
profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative")
stats.print_stats(10)  # show the top 10 entries
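
On Python 3.8 and later, `cProfile.Profile` can also be used as a context manager, and `pstats.Stats` accepts a `stream` argument so the report can be captured as a string. The sketch below assumes a smaller loop than the one above to keep the run short; the file name in the last comment is hypothetical.

```python
import cProfile
import io
import pstats

def slow_function() -> int:
    total = 0
    for i in range(200_000):
        total += i ** 2
    return total

# Profiling starts on entry to the with block and stops on exit.
with cProfile.Profile() as profiler:
    slow_function()

# Send the report to an in-memory buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
report = buffer.getvalue()
print(report)

# To keep the raw data for later inspection:
# stats.dump_stats("profile.out")  # hypothetical file name
```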

22.2.2 timeit (benchmarking)

import timeit

# Compare different implementations
time1 = timeit.timeit("sum(range(1000))", number=10000)
time2 = timeit.timeit(
    "functools.reduce(lambda a, b: a + b, range(1000))",
    setup="import functools",  # keep the import out of the timed statement
    number=10000,
)
print(f"sum(): {time1:.4f}s")
print(f"reduce(): {time2:.4f}s")

# Command line
# $ python -m timeit "sum(range(1000))"
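
A single `timeit.timeit` call can be skewed by whatever else the machine is doing. One common approach is `timeit.repeat`, keeping the minimum of several runs. The sketch below assumes `operator.add` as the reducer and reports time per call; the dictionary names are illustrative.

```python
import timeit

candidates = {
    "sum": "sum(range(1000))",
    "reduce": "functools.reduce(operator.add, range(1000))",
}

number = 2000  # calls per run
best = {}
for name, stmt in candidates.items():
    # setup runs once per run and is excluded from the measured time
    runs = timeit.repeat(stmt, setup="import functools, operator", repeat=5, number=number)
    best[name] = min(runs) / number  # seconds per call, best of 5 runs

for name, seconds in best.items():
    print(f"{name:>6}: {seconds * 1e6:.2f} microseconds per call")
```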

22.2.3 line_profiler (line-by-line profiling)

$ pip install line_profiler
$ kernprof -l -v script.py

# script.py
@profile  # injected by kernprof at run time; no import needed
def process_data(data):
    result = []
    for item in data:
        result.append(item ** 2)
    return sorted(result)
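
Because `profile` only exists when the script runs under `kernprof`, running it with plain `python` raises `NameError`. A possible workaround, assuming kernprof places `profile` in the `builtins` module, is a no-op fallback:

```python
import builtins

# Use kernprof's decorator when present; otherwise fall back to a no-op decorator.
profile = getattr(builtins, "profile", lambda func: func)

@profile
def process_data(data):
    result = []
    for item in data:
        result.append(item ** 2)
    return sorted(result)

print(process_data([3, -1, 2]))  # [1, 4, 9]
```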

22.2.4 memory_profiler (memory profiling)

$ pip install memory_profiler
$ python -m memory_profiler script.py

@profile
def memory_heavy():
    big_list = [i for i in range(1_000_000)]
    filtered = [x for x in big_list if x % 2 == 0]
    return sum(filtered)
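
If installing a third-party package is not an option, the standard library's `tracemalloc` gives a coarser view of the same function. This is a sketch of an alternative, not a replacement for the per-line report of memory_profiler, and it uses a smaller list to keep the run short.

```python
import tracemalloc

def memory_heavy() -> int:
    big_list = [i for i in range(200_000)]
    filtered = [x for x in big_list if x % 2 == 0]
    return sum(filtered)

tracemalloc.start()
result = memory_heavy()
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

print(f"result={result}")
print(f"current={current / 1024:.1f} KiB, peak={peak / 1024:.1f} KiB")
```

The peak is far above the current value here because both lists are freed when the function returns.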

22.3 Common Optimization Techniques

22.3.1 Use built-in functions

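# ❌ Slow
total = 0
for i in range(1_000_000):
    total += i

# ✅ Fast (built-ins are implemented in C)
total = sum(range(1_000_000))

# Other efficient built-ins
min(data)       # instead of comparing in a loop
max(data)
sorted(data)    # instead of a hand-written sort
map(func, data) # lazy iterator; not always faster than a list comprehension, so measure
any(data)
all(data)
len(data)       # O(1) for built-in containers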

22.3.2 Choose the right data structure

# Membership test: O(n) vs O(1)
items = list(range(100_000))
target = 99_999

# ❌ Slow: O(n)
target in items  # scans the list

# ✅ Fast: O(1) on average
items_set = set(items)
target in items_set  # hash lookup

# Dict vs list lookup
data_list = [{"id": i, "name": f"user_{i}"} for i in range(100_000)]
data_dict = {item["id"]: item for item in data_list}

# Looking up ID=99999
# List: O(n), requires a scan
# Dict: O(1) on average, direct hash lookup
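
Building the set is itself an O(n) step, so the conversion pays off only when many lookups follow. A rough way to see both costs, assuming the same `items` and `target` as above:

```python
import timeit

items = list(range(100_000))
target = 99_999  # worst case for the list: the last element

# One-time cost of building the set, averaged over 10 builds.
build_set = timeit.timeit(lambda: set(items), number=10) / 10
items_set = set(items)

# Per-lookup cost for each container, averaged over 100 lookups.
list_lookup = timeit.timeit(lambda: target in items, number=100) / 100
set_lookup = timeit.timeit(lambda: target in items_set, number=100) / 100

print(f"build set:   {build_set * 1e6:.1f} microseconds (paid once)")
print(f"list lookup: {list_lookup * 1e6:.1f} microseconds")
print(f"set lookup:  {set_lookup * 1e6:.3f} microseconds")
```

For a single membership test, scanning the list directly is usually cheaper than converting it first.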

22.3.3 String concatenation

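many_strings = ["alpha", "beta", "gamma"]  # sample input

# ❌ Slow: strings are immutable, so each += may create a new object
result = ""
for s in many_strings:
    result += s

# ✅ Fast: use join
result = "".join(many_strings)

# ✅ For formatting, f-strings are usually faster than % and str.format()
name = "Alice"
f"Hello, {name}"  # usually the fastest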
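
When the pieces arrive one at a time and cannot be collected into a list first, `io.StringIO` is another option. The sketch below uses hypothetical helper names and checks that all three approaches produce the same string. CPython can sometimes extend a string in place during `+=`, so the size of the gap varies and is worth measuring.

```python
import io

many_strings = [f"line {i}\n" for i in range(1000)]

def concat_plus(parts):
    """Repeated += on an immutable string."""
    result = ""
    for s in parts:
        result += s
    return result

def concat_join(parts):
    """Single pass: join computes the final size and copies each piece once."""
    return "".join(parts)

def concat_stringio(parts):
    """Incremental writes into an in-memory text buffer."""
    buffer = io.StringIO()
    for s in parts:
        buffer.write(s)
    return buffer.getvalue()

same = concat_plus(many_strings) == concat_join(many_strings) == concat_stringio(many_strings)
print(same)  # True
```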

22.3.4 List comprehensions vs. loops

# ❌ Slow
result = []
for x in range(10_000):
    if x % 2 == 0:
        result.append(x ** 2)

# ✅ Fast
result = [x**2 for x in range(10_000) if x % 2 == 0]

# ✅ For large data, use a generator expression (saves memory; can be iterated only once)
result = (x**2 for x in range(10_000) if x % 2 == 0)
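
The trade-off of the generator form is that it holds no elements and is exhausted after one pass. A short demonstration, with variable names chosen for illustration:

```python
import sys

squares_list = [x**2 for x in range(10_000) if x % 2 == 0]
squares_gen = (x**2 for x in range(10_000) if x % 2 == 0)

print(sys.getsizeof(squares_list))  # grows with the number of elements
print(sys.getsizeof(squares_gen))   # small and constant: only the generator's state

first_total = sum(squares_gen)   # consumes the generator
second_total = sum(squares_gen)  # already exhausted, so this is 0
print(first_total == sum(squares_list), second_total)
```

Use the list when the values are needed more than once or need indexing; use the generator when a single pass is enough.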

22.3.5 Cache computed results

from functools import lru_cache

# ✅ Automatically caches repeated calls
@lru_cache(maxsize=128)
def fibonacci(n: int) -> int:
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Manual caching (heavy_computation stands for any expensive pure function)
cache = {}
def expensive(n: int) -> int:
    if n not in cache:
        cache[n] = heavy_computation(n)
    return cache[n]
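
To check whether a cache is actually being hit, functions wrapped by `lru_cache` expose `cache_info()` and `cache_clear()`. A small sketch using the same `fibonacci` function:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci(n: int) -> int:
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))  # 832040

info = fibonacci.cache_info()  # named tuple: hits, misses, maxsize, currsize
print(info)

fibonacci.cache_clear()  # drop all cached results, e.g. between benchmark runs
```

A low hit count relative to misses suggests the arguments rarely repeat and the cache is adding overhead without much benefit.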

22.3.6 Lazy imports

# ❌ Top-level imports (slow startup)
import pandas as pd
import numpy as np

# ✅ Lazy import (loaded on first use)
def analyze_data(filepath: str):
    import pandas as pd  # imported only when the function runs
    df = pd.read_csv(filepath)
    ...
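
The effect can be observed through `sys.modules`, which records every module loaded so far. The sketch below uses the small standard-library module `colorsys` as a stand-in for a heavy dependency; `to_hls` is a hypothetical helper. To find which imports dominate startup, CPython 3.7+ also offers `python -X importtime script.py`.

```python
import sys

def to_hls(r: float, g: float, b: float) -> tuple[float, float, float]:
    import colorsys  # loaded on the first call; later calls reuse sys.modules
    return colorsys.rgb_to_hls(r, g, b)

print("colorsys" in sys.modules)  # may be False before the first call
hls = to_hls(1.0, 0.0, 0.0)
print("colorsys" in sys.modules)  # True once the function has run
print(hls)
```

The cost is moved rather than removed: the first call to the function pays the import time instead of program startup.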

22.4 NumPy Vectorization

import numpy as np

# ❌ Slow: Python loop
def dot_python(a, b):
    result = 0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result

# ✅ Fast: NumPy vectorization
a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
result = np.dot(a, b)  # often 100x or more faster than the loop

# Conditional selection
arr = np.random.rand(1_000_000)
# ❌ Loop
result = [x if x > 0.5 else 0 for x in arr]
# ✅ Vectorized
result = np.where(arr > 0.5, arr, 0)

22.5 Cython

# fast_math.pyx
def sum_squares(long long n):
    # 64-bit counters: with a 32-bit C int, i * i overflows once i exceeds 46340
    cdef long long i
    cdef long long total = 0
    for i in range(n):
        total += i * i
    return total

$ pip install cython
$ cythonize -i fast_math.pyx

import fast_math
print(fast_math.sum_squares(1_000_000))  # often 10-50x faster than pure Python; measure to confirm

22.6 PyPy

# Install PyPy
$ brew install pypy3  # macOS
$ sudo apt install pypy3  # Ubuntu

# Run (often several times faster on long-running pure-Python code; measure your own workload)
$ pypy3 script.py

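Feature	CPython	PyPy
JIT compilation	❌	✅
Compatibility	Full	Most (C extensions run through a compatibility layer and may be slow or unsupported)
Startup time	Fast	Slightly slower
Long-running performance	Medium	High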

22.7 Concurrency

# I/O-bound: use asyncio or ThreadPoolExecutor
import asyncio
import httpx

async def fetch_all(urls: list[str]) -> list[str]:
    async with httpx.AsyncClient() as client:
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return [r.text for r in responses]

# CPU-bound: use ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor

def heavy_compute(n: int) -> int:
    return sum(i**2 for i in range(n))

# The guard is required where workers start with "spawn" (Windows, macOS)
if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(heavy_compute, [10**6] * 8))
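
The first comment in this section names ThreadPoolExecutor as an option for I/O-bound work, but only the asyncio variant is shown. A minimal sketch of the thread-based variant follows. `fake_download` is a hypothetical stand-in that sleeps instead of performing real network I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(task_id: int) -> int:
    """Stand-in for a blocking I/O call; sleeping releases the GIL like real I/O does."""
    time.sleep(0.2)
    return task_id * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    # map returns results in input order, regardless of completion order
    results = list(pool.map(fake_download, range(8)))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s (eight sequential calls would take about 1.6s)")
```

Threads suit code that calls blocking libraries. asyncio needs async-aware libraries such as httpx but scales to many more concurrent connections.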

22.8 Memory Optimization

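# __slots__ reduces per-instance memory
class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Generators instead of lists
# ❌ Large list
big = [i for i in range(10_000_000)]
# ✅ Generator
big = (i for i in range(10_000_000))

# sys.getsizeof reports an object's shallow size (elements are not included)
import sys
sys.getsizeof([])         # 56 bytes on 64-bit CPython
sys.getsizeof([1,2,3])    # about 88 bytes; varies by Python version
sys.getsizeof(tuple())    # 40 bytes (smaller than a list)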
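
What `__slots__` removes is the per-instance `__dict__`. The comparison below is a rough sketch: `PointDict` and `PointSlots` are hypothetical names, and the byte counts depend on the Python version, so only the relationship between them is meaningful.

```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

plain = PointDict(1, 2)
slotted = PointSlots(1, 2)

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

# The trade-off: attributes not listed in __slots__ cannot be added later.
try:
    slotted.z = 3
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# Shallow sizes: the plain instance also carries its attribute dict.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print(plain_size, slotted_size)
```

The saving per object is small, so it matters mainly when a program holds very many instances.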

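22.9 Performance Optimization Checklist

Area	Technique	When to use
Algorithm	Reduce complexity	General
Data structure	Use dict/set instead of list for lookups	Lookup-heavy code
Built-ins	Use sum/min/max/any/all	General
Strings	join instead of +=	String concatenation
Caching	lru_cache	Repeated computation
Vectorization	NumPy	Numerical computing
JIT	PyPy	CPU-bound
C extensions	Cython/cffi	Maximum performance
Concurrency	asyncio/multiprocessing	I/O-bound/CPU-bound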
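
The first row, reducing algorithmic complexity, has no example elsewhere in the chapter. As one illustration (the function names are hypothetical), duplicate detection can drop from O(n²) pairwise comparison to O(n) by remembering seen values in a set:

```python
def has_duplicates_quadratic(values: list[int]) -> bool:
    """O(n^2): compare every pair of positions."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicates_linear(values: list[int]) -> bool:
    """O(n) on average: one pass, with O(1) average set membership tests."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False

print(has_duplicates_quadratic([1, 2, 3, 2]))  # True
print(has_duplicates_linear([1, 2, 3, 2]))     # True
print(has_duplicates_linear([1, 2, 3]))        # False
```

The linear version trades extra memory for time, which is a typical cost of this kind of change.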

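22.10 Notes

🔴 Caution:

Do not optimize prematurely: make it work first, then make it fast
Do not guess at bottlenecks; measure them with a profiler
C extensions and Cython reduce portability
PyPy is not compatible with every C extension library

💡 Tips:

Built-in functions are usually faster than hand-written loops
Use set and dict for O(1) average-case lookups
Use generators for large data to avoid running out of memory
Use asyncio for I/O-bound work and multiple processes for CPU-bound work

📌 Real-world scenario: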

# Optimizing log analysis: 60 s → 2 s (the actual gain depends on the file and the machine)
import re
from collections import Counter

# Before: line-by-line Python loop
def parse_log_slow(filepath: str) -> dict:
    counts = {}
    with open(filepath) as f:
        for line in f:
            match = re.search(r"level=(\w+)", line)
            if match:
                level = match.group(1)
                counts[level] = counts.get(level, 0) + 1
    return counts

# After: Counter + batch processing
def parse_log_fast(filepath: str) -> dict:
    pattern = re.compile(r"level=(\w+)")  # precompiled regex
    with open(filepath) as f:
        levels = [m.group(1) for line in f if (m := pattern.search(line))]
    return dict(Counter(levels))
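
Before comparing timings, it is worth confirming that both versions return the same counts. The check below is a self-contained sketch: it restates the two parsers in compact form under hypothetical names and runs them against a small temporary log file whose contents are made up for the test.

```python
import os
import re
import tempfile
from collections import Counter

LEVEL_PATTERN = re.compile(r"level=(\w+)")

def count_levels_loop(filepath: str) -> dict:
    """Explicit loop with a plain dict, as in the 'before' version."""
    counts = {}
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            match = LEVEL_PATTERN.search(line)
            if match:
                counts[match.group(1)] = counts.get(match.group(1), 0) + 1
    return counts

def count_levels_counter(filepath: str) -> dict:
    """Comprehension plus Counter, as in the 'after' version."""
    with open(filepath, encoding="utf-8") as f:
        levels = [m.group(1) for line in f if (m := LEVEL_PATTERN.search(line))]
    return dict(Counter(levels))

sample = (
    "ts=1 level=INFO msg=start\n"
    "ts=2 level=ERROR msg=boom\n"
    "a line without a level field\n"
    "ts=3 level=INFO msg=done\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    slow = count_levels_loop(path)
    fast = count_levels_counter(path)

print(slow == fast)  # True
print(fast)          # {'INFO': 2, 'ERROR': 1}
```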

22.11 Further Reading

Python Speed
cProfile documentation
Cython documentation
PyPy documentation
"High Performance Python" by Micha Gorelick and Ian Ozsvald