强曰为道
与天地相似,故不违。知周乎万物,而道济天下,故不过。旁行而不流,乐天知命,故不忧.
文档目录

Python 编程教程 / 07 - 数据结构

第 07 章:数据结构

掌握 Python 内置数据结构及其高级用法,选择合适的数据结构解决实际问题。


7.1 列表(list)

7.1.1 基本操作

# 创建
fruits = ["apple", "banana", "cherry"]
numbers = list(range(5))  # [0, 1, 2, 3, 4]
empty = []

# 访问
print(fruits[0])     # apple
print(fruits[-1])    # cherry
print(fruits[1:3])   # ['banana', 'cherry']

# 修改
fruits[0] = "avocado"
fruits.append("date")
fruits.insert(1, "blueberry")
fruits.extend(["elderberry", "fig"])
fruits += ["grape"]

# 删除
fruits.remove("banana")     # 按值删除
popped = fruits.pop()       # 弹出末尾元素
popped = fruits.pop(0)      # 弹出指定索引
del fruits[1:3]             # 删除切片
fruits.clear()              # 清空

7.1.2 常用方法

numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5]

# 查找
print(numbers.index(5))       # 4(首次出现)
print(numbers.count(1))       # 2
print(5 in numbers)           # True

# 排序
numbers.sort()                # 原地排序
numbers.sort(reverse=True)    # 降序
numbers.sort(key=lambda x: -x)  # 自定义排序

sorted_nums = sorted(numbers)   # 返回新列表,不修改原列表

# 反转
numbers.reverse()             # 原地反转
reversed_nums = numbers[::-1]  # 返回新列表

# 长度
print(len(numbers))           # 9

7.1.3 列表解包

# 基本解包
a, b, c = [1, 2, 3]

# 星号解包
first, *middle, last = [1, 2, 3, 4, 5]
# first=1, middle=[2, 3, 4], last=5

# 函数参数解包
def add(a, b, c):
    return a + b + c

print(add(*[1, 2, 3]))  # 6

7.2 元组(tuple)

7.2.1 基本操作

# 创建
point = (3, 4)
single = (42,)     # 单元素元组需要逗号
empty = ()
from_list = tuple([1, 2, 3])

# 访问
print(point[0])     # 3
print(point[1:])    # (4,)
print(len(point))   # 2

# 不可变
# point[0] = 10  # ❌ TypeError

# 解包
x, y = point

# 命名元组更实用(见下文)

7.2.2 元组 vs 列表

特性元组列表
可变性不可变可变
语法(1, 2, 3)[1, 2, 3]
性能更快,内存更少较慢
可作为字典键
可作为集合元素
适用场景固定结构数据动态集合

7.3 字典(dict)

7.3.1 基本操作

# 创建
person = {"name": "Alice", "age": 30, "city": "北京"}
from_keys = dict.fromkeys(["a", "b", "c"], 0)  # {'a': 0, 'b': 0, 'c': 0}
from_pairs = dict([("a", 1), ("b", 2)])
empty = {}

# 访问
print(person["name"])              # Alice
print(person.get("email", "N/A")) # N/A(安全访问)

# 修改
person["age"] = 31
person.update({"city": "上海", "phone": "123456"})
person.setdefault("email", "[email protected]")  # 键不存在时设置

# 删除
del person["phone"]
email = person.pop("email", None)  # 安全删除
person.clear()

# 遍历
user = {"name": "Alice", "age": 30, "city": "北京"}

for key in user:
    print(key)

for key, value in user.items():
    print(f"{key}: {value}")

for value in user.values():
    print(value)

# 检查键
print("name" in user)  # True

7.3.2 字典合并

# Python 3.9+ 合并运算符
defaults = {"color": "red", "size": 10, "count": 1}
overrides = {"color": "blue", "count": 5}

# | 运算符(创建新字典)
result = defaults | overrides
# {'color': 'blue', 'size': 10, 'count': 5}

# |= 运算符(原地更新)
defaults |= overrides

# update 方法(更早版本兼容)
merged = {**defaults, **overrides}

7.3.3 字典推导

# 基本推导
squares = {x: x**2 for x in range(6)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# 过滤
prices = {"apple": 5.0, "banana": 3.0, "cherry": 8.0}
expensive = {k: v for k, v in prices.items() if v > 4}
# {'apple': 5.0, 'cherry': 8.0}

# 键值互换
original = {"a": 1, "b": 2}
inverted = {v: k for k, v in original.items()}
# {1: 'a', 2: 'b'}

7.3.4 defaultdict 和 Counter

from collections import defaultdict, Counter

# defaultdict:自动初始化默认值
word_count = defaultdict(int)
for word in ["apple", "banana", "apple", "cherry", "banana", "apple"]:
    word_count[word] += 1
# {'apple': 3, 'banana': 2, 'cherry': 1}

# 分组
students = [
    {"name": "Alice", "grade": "A"},
    {"name": "Bob", "grade": "B"},
    {"name": "Charlie", "grade": "A"},
]
by_grade = defaultdict(list)
for s in students:
    by_grade[s["grade"]].append(s["name"])
# {'A': ['Alice', 'Charlie'], 'B': ['Bob']}

# Counter:计数器
text = "hello world"
counter = Counter(text)
print(counter.most_common(3))  # [('l', 3), ('o', 2), ('h', 1)]

# Counter 运算
c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)
print(c1 + c2)  # Counter({'a': 4, 'b': 3})
print(c1 - c2)  # Counter({'a': 2})

7.4 集合(set)

7.4.1 基本操作

# 创建
fruits = {"apple", "banana", "cherry"}
from_list = set([1, 2, 2, 3, 3])  # {1, 2, 3}
empty = set()  # 注意:{} 创建空字典,不是空集合

# 添加和删除
fruits.add("date")
fruits.discard("banana")  # 不存在也不报错
fruits.remove("apple")    # 不存在会报 KeyError

# 集合运算
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a | b)   # 并集: {1, 2, 3, 4, 5, 6}
print(a & b)   # 交集: {3, 4}
print(a - b)   # 差集: {1, 2}
print(a ^ b)   # 对称差: {1, 2, 5, 6}

# 子集和超集
print({1, 2} <= {1, 2, 3})  # True(子集)
print({1, 2, 3} >= {1, 2})  # True(超集)

7.4.2 集合推导

squares = {x**2 for x in range(-5, 6)}
# {0, 1, 4, 9, 16, 25}

# 去重
words = ["hello", "world", "hello", "python"]
unique = {w.lower() for w in words}

7.5 NamedTuple

from collections import namedtuple

# 创建类型
Point = namedtuple("Point", ["x", "y"])

# 使用
p = Point(3, 4)
print(p.x, p.y)      # 3 4
print(p[0], p[1])     # 3 4(也支持索引)

# 解包
x, y = p

# 不可变
# p.x = 10  # ❌ AttributeError

# 带默认值
Point3D = namedtuple("Point3D", "x y z", defaults=[0, 0, 0])
p = Point3D(1, 2)
print(p)  # Point3D(x=1, y=2, z=0)

typing.NamedTuple(推荐)

from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float
    label: str = ""

p = Point(3.0, 4.0, "origin")
print(p.x, p.y, p.label)

7.6 数据类(dataclass)

7.6.1 基本用法

from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    age: int
    email: str = ""
    tags: list[str] = field(default_factory=list)

# 自动生成 __init__, __repr__, __eq__
user = User("Alice", 30, "[email protected]")
print(user)  # User(name='Alice', age=30, email='[email protected]', tags=[])

# 比较
user2 = User("Alice", 30, "[email protected]")
print(user == user2)  # True(值比较,非引用比较)

7.6.2 dataclass 高级特性

@dataclass(frozen=True)  # 不可变
class Config:
    host: str = "localhost"
    port: int = 8080

config = Config()
# config.port = 9090  # ❌ FrozenInstanceError

@dataclass(order=True)  # 支持排序
class Student:
    name: str
    score: float

students = [Student("Bob", 85), Student("Alice", 92)]
sorted_students = sorted(students, reverse=True)

@dataclass
class Product:
    name: str
    price: float = field(compare=False)  # 排序时不比较该字段
    sku: str = field(repr=False, hash=False)  # 不显示、不参与哈希

7.6.3 继承

@dataclass
class Animal:
    name: str
    sound: str

@dataclass
class Dog(Animal):
    breed: str = "Unknown"

dog = Dog("Rex", "Woof", "German Shepherd")
print(dog)  # Dog(name='Rex', sound='Woof', breed='German Shepherd')

7.7 deque(双端队列)

from collections import deque

# 创建
dq = deque([1, 2, 3])

# 两端操作
dq.append(4)         # 右端添加
dq.appendleft(0)     # 左端添加
dq.pop()             # 右端弹出
dq.popleft()         # 左端弹出

# 旋转
dq = deque([1, 2, 3, 4, 5])
dq.rotate(2)         # 右旋: deque([4, 5, 1, 2, 3])
dq.rotate(-2)        # 左旋: deque([1, 2, 3, 4, 5])

# 有限长度
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
    print(list(recent))
# [0]
# [0, 1]
# [0, 1, 2]
# [1, 2, 3]
# [2, 3, 4]

7.8 数据结构选型

需求推荐数据结构理由
有序集合,频繁增删list灵活的增删操作
不可变序列tuple更快,可哈希
键值映射dictO(1) 查找
默认值字典defaultdict避免 KeyError
计数Counter内置计数功能
去重setO(1) 成员检查
两端操作dequeO(1) 两端操作
不可变命名字段NamedTuple类型安全,内存少
可变命名字段dataclass自动生成方法

7.9 注意事项

🔴 注意

  • {} 创建空字典,set() 创建空集合
  • 字典键必须是可哈希(hashable)的类型
  • dataclass 默认生成的 __hash__ 在有可变字段时不可用
  • defaultdictdefault_factory 每次调用都会创建新的默认值对象

💡 提示

  • 需要频繁查找时,使用 setdict(O(1))而非 list(O(n))
  • 需要维护插入顺序时,Python 3.7+ 的 dict 已经保证有序
  • 处理大量数据时考虑 array 模块或 NumPy 数组
  • dataclass 是 Python 3.7+ 中替代 NamedTuple 的首选

📌 业务场景

from collections import Counter, defaultdict
from dataclasses import dataclass, field

@dataclass
class SalesReport:
    """销售报告。"""
    region: str
    total: float = 0.0
    orders: int = 0
    top_products: list[tuple[str, int]] = field(default_factory=list)

def analyze_sales(sales_data: list[dict]) -> list[SalesReport]:
    """分析销售数据。"""
    # 按区域分组
    by_region = defaultdict(list)
    for sale in sales_data:
        by_region[sale["region"]].append(sale)

    reports = []
    for region, sales in by_region.items():
        # 统计热销产品
        product_counter = Counter(s["product"] for s in sales)
        top_products = product_counter.most_common(3)

        report = SalesReport(
            region=region,
            total=sum(s["amount"] for s in sales),
            orders=len(sales),
            top_products=top_products,
        )
        reports.append(report)

    return sorted(reports, key=lambda r: r.total, reverse=True)

# 示例数据
sales = [
    {"region": "华东", "product": "手机", "amount": 5999},
    {"region": "华东", "product": "手机", "amount": 5999},
    {"region": "华北", "product": "电脑", "amount": 8999},
]

for report in analyze_sales(sales):
    print(f"{report.region}: ¥{report.total:.0f} ({report.orders}单)")
    print(f"  热销: {report.top_products}")

7.10 扩展阅读