Chapter 5: Data Types and Serialization

5.1 Memcached Has Only One Data Type

Unlike Redis, which supports String/Hash/List/Set/ZSet, Memcached supports only binary-safe strings.

┌──────────────────────────────┐
│       Memcached Value        │
│                              │
│   Nature:   a byte sequence  │
│   Max size: 1 MB (default)   │
│   Encoding: client-defined   │
│                              │
└──────────────────────────────┘

This means you must serialize any data before storing it and deserialize it after reading it back.

# Store a raw string (12 bytes)
set greeting 0 0 12
Hello World!
# STORED

# Store a JSON-serialized object (25 bytes)
set user:1001 0 0 25
{"name":"Alice","age":30}
# STORED

# Store binary data (<binary data> stands for 3 raw bytes)
set image:thumb 0 0 3
<binary data>
# STORED
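
The length field in a `set` command must equal the exact byte length of the payload, not its character count. The sketch below frames a value for the text protocol and shows how the length is derived. The helper name `build_set_command` is hypothetical; it only builds the bytes and does not talk to a server.

```python
import json


def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a text-protocol `set` command: header line, payload, trailing CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


# Compact separators avoid the spaces json.dumps inserts by default,
# so the payload matches the JSON text shown in the example above.
payload = json.dumps({"name": "Alice", "age": 30}, separators=(",", ":")).encode("utf-8")
frame = build_set_command("user:1001", payload)

print(len(payload))  # 25
print(frame)
```

Because the length is counted in bytes, non-ASCII text should always be encoded (for example to UTF-8) before measuring it.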

5.2 Comparison of Serialization Formats

Mainstream serialization formats

Format | Readability | Size | Speed | Schema | Cross-language
JSON ★★★★★★★★★★★★★★★★
MessagePack ★★★★★★★★★★★★★★★
Protobuf ★★★★★★★★★★ required ★★★★★
Avro ★★★★★★★★★ required ★★★★
BSON ★★★★★★★★★★★
PHP serialize ★★★★☆ (PHP only)
Java Serializable ★★★★☆ (Java only)
XML ★★★★★★★★ optional ★★★★★

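Detailed comparison

Dimension | JSON | MessagePack | Protobuf
Human-readable | Yes | No | No
Encoded size (relative) | baseline 1x | 0.5-0.7x | 0.3-0.5x
Encoding speed | baseline 1x | 2-5x | 3-10x
Decoding speed | baseline 1x | 2-5x | 3-10x
Field changes | flexible | flexible | requires a schema update
Ease of debugging | very convenient | needs tooling | needs tooling
Typical use | general purpose | high-performance caching | microservice communication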

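5.3 JSON Serialization (Recommended for Getting Started)

JSON is the most widely supported choice; nearly every language can read and write it out of the box.

Python example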

import json
import memcache

mc = memcache.Client(['localhost:11211'])

# Serialize and store
user = {
    "id": 1001,
    "name": "Alice",
    "email": "[email protected]",
    "roles": ["admin", "editor"],
    "profile": {
        "age": 30,
        "city": "Beijing"
    }
}
mc.set(f"user:{user['id']}", json.dumps(user), time=3600)

# Read and deserialize
data = mc.get(f"user:{user['id']}")
if data:
    user_obj = json.loads(data)
    print(user_obj["name"])  # Alice
    print(user_obj["roles"]) # ['admin', 'editor']

PHP example

<?php
$mc = new Memcached();
$mc->addServer('localhost', 11211);

// Serialize and store
$product = [
    'id'    => 5001,
    'name'  => 'Mechanical keyboard',
    'price' => 599.00,
    'tags'  => ['peripherals', 'keyboard', 'mechanical'],
];
$mc->set("product:{$product['id']}", json_encode($product), 3600);

// Read and deserialize
$json = $mc->get("product:5001");
if ($json !== false) {
    $productObj = json_decode($json, true);  // true = decode into an associative array
    echo $productObj['name'];  // Mechanical keyboard
}

Go example

package main

import (
    "encoding/json"
    "fmt"
    "github.com/bradfitz/gomemcache/memcache"
)

type User struct {
    ID    int      `json:"id"`
    Name  string   `json:"name"`
    Email string   `json:"email"`
    Roles []string `json:"roles"`
}

func main() {
    mc := memcache.New("localhost:11211")

    // Serialize and store
    user := User{
        ID:    1001,
        Name:  "Alice",
        Email: "[email protected]",
        Roles: []string{"admin", "editor"},
    }
    data, err := json.Marshal(user)
    if err != nil {
        fmt.Println("marshal failed:", err)
        return
    }
    err = mc.Set(&memcache.Item{
        Key:        fmt.Sprintf("user:%d", user.ID),
        Value:      data,
        Expiration: 3600,
    })
    if err != nil {
        fmt.Println("set failed:", err)
        return
    }

    // Read and deserialize
    item, err := mc.Get("user:1001")
    if err == nil {
        var u User
        if err := json.Unmarshal(item.Value, &u); err == nil {
            fmt.Println(u.Name) // Alice
        }
    }
}

Java example

import com.fasterxml.jackson.databind.ObjectMapper;
import net.spy.memcached.MemcachedClient;
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Map;

public class MemcachedJsonExample {
    private static final ObjectMapper mapper = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        MemcachedClient mc = new MemcachedClient(
            new InetSocketAddress("localhost", 11211)
        );

        // Serialize and store
        Map<String, Object> order = Map.of(
            "id", 10001,
            "userId", 1001,
            "amount", 299.50,
            "items", List.of("keyboard", "mouse")
        );
        String json = mapper.writeValueAsString(order);
        mc.set("order:10001", 3600, json);

        // Read and deserialize
        Object value = mc.get("order:10001");
        if (value != null) {
            Map<String, Object> orderMap = mapper.readValue(
                (String) value, Map.class
            );
            System.out.println(orderMap.get("amount")); // 299.5
        }

        // Shut down the client so its I/O thread does not keep the JVM alive
        mc.shutdown();
    }
}

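5.4 MessagePack Serialization (Recommended for High Performance)

MessagePack is essentially "binary JSON": smaller and faster.

Performance benchmark

Test data: 10,000 User objects, each with id/name/email/roles

JSON:         encode 52 ms  decode 68 ms  avg size 128 B
MessagePack:  encode 18 ms  decode 22 ms  avg size 72 B   (44% smaller)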
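
The relative figures follow directly from the numbers quoted above. The short sketch below recomputes the size saving and the speedups; the helper names are hypothetical and the inputs are simply the benchmark values from this section, not new measurements.

```python
def reduction_percent(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is smaller than `baseline`."""
    return (baseline - candidate) / baseline * 100


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


size_saving = reduction_percent(128, 72)   # average bytes per object
encode_speedup = speedup(52, 18)           # milliseconds per 10,000 objects
decode_speedup = speedup(68, 22)

print(f"size saving:    {size_saving:.2f}%")
print(f"encode speedup: {encode_speedup:.2f}x")
print(f"decode speedup: {decode_speedup:.2f}x")
```

The size saving works out to 43.75%, which the benchmark rounds to 44%.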

Python example

import msgpack
import memcache

mc = memcache.Client(['localhost:11211'])

# Serialize with MessagePack
user = {"id": 1001, "name": "Alice", "tags": ["admin", "editor"]}
packed = msgpack.packb(user, use_bin_type=True)
mc.set("user:1001:msgpack", packed, time=3600)

# Deserialize with MessagePack
raw = mc.get("user:1001:msgpack")
if raw:
    user_obj = msgpack.unpackb(raw, raw=False)
    print(user_obj["name"])  # Alice

Go example

import (
    "fmt"

    "github.com/bradfitz/gomemcache/memcache"
    "github.com/vmihailenco/msgpack/v5"
)

type User struct {
    ID    int      `msgpack:"id"`
    Name  string   `msgpack:"name"`
    Roles []string `msgpack:"roles"`
}

// Store
func setUser(mc *memcache.Client, user User) error {
    data, err := msgpack.Marshal(user)
    if err != nil {
        return err
    }
    return mc.Set(&memcache.Item{
        Key:        fmt.Sprintf("user:%d", user.ID),
        Value:      data,
        Expiration: 3600,
    })
}

// Read
func getUser(mc *memcache.Client, id int) (*User, error) {
    item, err := mc.Get(fmt.Sprintf("user:%d", id))
    if err != nil {
        return nil, err
    }
    var user User
    if err := msgpack.Unmarshal(item.Value, &user); err != nil {
        return nil, err
    }
    return &user, nil
}

5.5 Protobuf Serialization (Recommended for Microservices)

Protocol Buffers requires a predefined schema, which suits data with a fixed structure.

Schema definition

// user.proto
syntax = "proto3";

message User {
    int32  id       = 1;
    string name     = 2;
    string email    = 3;
    repeated string roles = 4;
    Profile profile = 5;
}

message Profile {
    int32 age  = 1;
    string city = 2;
}

Python example

# Generate code: protoc --python_out=. user.proto
import user_pb2
import memcache

mc = memcache.Client(['localhost:11211'])

# Serialize and store
user = user_pb2.User()
user.id = 1001
user.name = "Alice"
user.email = "[email protected]"
user.roles.extend(["admin", "editor"])
user.profile.age = 30
user.profile.city = "Beijing"

mc.set(f"user:{user.id}", user.SerializeToString(), time=3600)

# Read and deserialize
raw = mc.get("user:1001")
if raw:
    user_obj = user_pb2.User()
    user_obj.ParseFromString(raw)
    print(user_obj.name)        # Alice
    print(user_obj.profile.city) # Beijing

5.6 Serialization with a Version Tag

Recommended practice: embed a version number in the value to make data migration easier.

import json

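def serialize(data, version=1):
    """Serialize data wrapped with a version number."""
    wrapper = {
        "v": version,
        "data": data
    }
    return json.dumps(wrapper).encode('utf-8')

def deserialize(raw):
    """Deserialize a versioned value and migrate it to the latest version."""
    wrapper = json.loads(raw.decode('utf-8'))
    version = wrapper.get("v", 1)
    data = wrapper["data"]

    # Migration chain: update `version` after each step so that v1 data
    # also passes through the v2 -> v3 migration.
    # migrate_v1_to_v2 / migrate_v2_to_v3 are application-defined.
    if version == 1:
        data = migrate_v1_to_v2(data)
        version = 2
    if version == 2:
        data = migrate_v2_to_v3(data)
        version = 3
    return data

# Store (mc is a memcache.Client, user_data is the object to cache)
mc.set("user:1001", serialize(user_data, version=2), time=3600)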
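
To make the migration chain concrete, here is a minimal runnable sketch that needs no Memcached server. The two migration functions and the field names (`name`, `first_name`, `last_name`, `roles`) are hypothetical examples of schema changes, not part of any real application; the table-driven loop is one possible way to apply the steps in order.

```python
import json

CURRENT_VERSION = 3


def migrate_v1_to_v2(data: dict) -> dict:
    """Hypothetical v1 -> v2 change: split "name" into first and last name."""
    first, _, last = data["name"].partition(" ")
    return {"first_name": first, "last_name": last}


def migrate_v2_to_v3(data: dict) -> dict:
    """Hypothetical v2 -> v3 change: add a "roles" list with an empty default."""
    return {**data, "roles": data.get("roles", [])}


# Maps a version to the function that upgrades it to the next version.
MIGRATIONS = {1: migrate_v1_to_v2, 2: migrate_v2_to_v3}


def serialize(data: dict, version: int = CURRENT_VERSION) -> bytes:
    return json.dumps({"v": version, "data": data}).encode("utf-8")


def deserialize(raw: bytes) -> dict:
    wrapper = json.loads(raw.decode("utf-8"))
    version = wrapper.get("v", 1)
    data = wrapper["data"]
    # Apply each step in order until the data reaches the current version.
    while version < CURRENT_VERSION:
        data = MIGRATIONS[version](data)
        version += 1
    return data


old_value = serialize({"name": "Alice Smith"}, version=1)
print(deserialize(old_value))
```

A value written at version 1 passes through both migrations on read, while a value already at the current version is returned unchanged.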

5.7 Strategies for Large Values

Memcached's default maximum value size is 1 MB. Larger values must be split into chunks.

Splitting strategy

import json
import memcache

mc = memcache.Client(['localhost:11211'])
MAX_VALUE_SIZE = 950 * 1024  # 950 KB (leaves headroom below the 1 MB limit)

def set_large(key, data, ttl=3600):
    """Store a large value, splitting it into chunks when needed.

    Returns the number of keys used for the payload.
    """
    serialized = json.dumps(data).encode('utf-8')

    if len(serialized) <= MAX_VALUE_SIZE:
        mc.set(key, serialized, time=ttl)
        return 1

    # Split into multiple chunks
    chunks = []
    for i in range(0, len(serialized), MAX_VALUE_SIZE):
        chunks.append(serialized[i:i + MAX_VALUE_SIZE])

    # Remove any stale unsplit value so get_large does not return it
    mc.delete(key)

    # Store each chunk and record its key in the metadata
    meta = {"total": len(chunks), "chunks": []}
    for idx, chunk in enumerate(chunks):
        chunk_key = f"{key}:chunk:{idx}"
        mc.set(chunk_key, chunk, time=ttl)
        meta["chunks"].append(chunk_key)

    mc.set(f"{key}:meta", json.dumps(meta).encode('utf-8'), time=ttl)
    return len(chunks)

def get_large(key):
    """Read a large value, reassembling chunks when needed."""
    raw = mc.get(key)
    if raw:
        return json.loads(raw.decode('utf-8'))

    # Check for chunk metadata
    meta_raw = mc.get(f"{key}:meta")
    if not meta_raw:
        return None

    meta = json.loads(meta_raw.decode('utf-8'))
    # Batch-fetch all chunks
    chunk_keys = meta["chunks"]
    chunks = mc.get_multi(chunk_keys)

    parts = []
    for ck in chunk_keys:
        if ck not in chunks:
            # A chunk was evicted or expired; the value cannot be rebuilt
            raise KeyError(f"Missing chunk: {ck}")
        parts.append(chunks[ck])

    return json.loads(b"".join(parts).decode('utf-8'))
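
The key layout can be tried without a server. In the sketch below a plain `dict` stands in for the Memcached client, which is an assumption made purely for illustration; the function names are hypothetical. Unlike the version above, a missing chunk is treated as an ordinary cache miss instead of raising.

```python
import json


def split_chunks(payload: bytes, chunk_size: int) -> list:
    """Cut payload into consecutive slices of at most chunk_size bytes."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def store_chunked(store: dict, key: str, payload: bytes, chunk_size: int) -> None:
    chunks = split_chunks(payload, chunk_size)
    chunk_keys = [f"{key}:chunk:{i}" for i in range(len(chunks))]
    for chunk_key, chunk in zip(chunk_keys, chunks):
        store[chunk_key] = chunk
    # Write the metadata last so readers never see it before the chunks exist.
    meta = {"total": len(chunks), "chunks": chunk_keys}
    store[f"{key}:meta"] = json.dumps(meta).encode("utf-8")


def load_chunked(store: dict, key: str):
    meta_raw = store.get(f"{key}:meta")
    if meta_raw is None:
        return None
    meta = json.loads(meta_raw.decode("utf-8"))
    parts = [store.get(chunk_key) for chunk_key in meta["chunks"]]
    if any(part is None for part in parts):
        return None  # a chunk is gone: treat the whole value as a miss
    return b"".join(parts)


store = {}  # stand-in for a Memcached client
payload = b"x" * 2500
store_chunked(store, "report:1", payload, chunk_size=1000)
print(sorted(store))
```

Treating a partial value as a miss lets the caller fall back to the source of truth, which is usually the safer default for a cache.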

5.8 Compression Strategy

For data above a certain size threshold, compress it after serialization:

import json
import zlib
import memcache

mc = memcache.Client(['localhost:11211'])
COMPRESS_THRESHOLD = 1024  # compress payloads larger than 1 KB

def set_with_compress(key, data, ttl=3600):
    serialized = json.dumps(data).encode('utf-8')
    if len(serialized) > COMPRESS_THRESHOLD:
        # Prefix a flag byte: 0x01 = compressed, 0x00 = uncompressed
        compressed = zlib.compress(serialized, level=6)
        mc.set(key, b'\x01' + compressed, time=ttl)
    else:
        mc.set(key, b'\x00' + serialized, time=ttl)

def get_with_decompress(key):
    raw = mc.get(key)
    if not raw:
        return None
    flag = raw[0]
    payload = raw[1:]
    if flag == 0x01:
        payload = zlib.decompress(payload)
    return json.loads(payload.decode('utf-8'))
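
The flag-byte scheme can be checked in isolation from the cache. The following sketch separates encoding from storage so the round trip can be verified without a server; `encode_value`, `decode_value`, and the sample data are hypothetical names chosen for illustration.

```python
import json
import zlib

COMPRESS_THRESHOLD = 1024  # bytes
FLAG_RAW = b"\x00"
FLAG_ZLIB = b"\x01"


def encode_value(data) -> bytes:
    """Serialize to JSON and compress only when the payload is large enough."""
    serialized = json.dumps(data).encode("utf-8")
    if len(serialized) > COMPRESS_THRESHOLD:
        return FLAG_ZLIB + zlib.compress(serialized, 6)
    return FLAG_RAW + serialized


def decode_value(raw: bytes):
    """Inspect the leading flag byte and undo compression if it was applied."""
    flag, payload = raw[:1], raw[1:]
    if flag == FLAG_ZLIB:
        payload = zlib.decompress(payload)
    return json.loads(payload.decode("utf-8"))


small = {"id": 1}
large = {"items": ["memcached"] * 500}
for value in (small, large):
    encoded = encode_value(value)
    print(encoded[:1], len(json.dumps(value)), "->", len(encoded))
```

Small values skip compression because the zlib header and CPU cost can outweigh any saving on short payloads.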

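5.9 Serialization Format Selection Guide

What is your scenario?
│
├── Need to debug / inspect data by hand?
│   └── JSON ✓
│
├── Very high performance requirements / large values?
│   └── MessagePack ✓
│
├── Communication between microservices / need a strict schema?
│   └── Protobuf ✓
│
├── PHP-only project?
│   └── PHP igbinary / msgpack extension ✓
│
└── Not sure?
    └── Start with JSON and migrate later as needed ✓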
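
The decision tree above can also be written as a small function. This is only a sketch: the parameter names are hypothetical, and evaluating the branches from top to bottom with the first match winning is an assumption about how the guide is meant to be read.

```python
def choose_format(needs_human_readable: bool = False,
                  performance_critical: bool = False,
                  needs_strict_schema: bool = False,
                  php_only: bool = False) -> str:
    """Walk the branches of the guide from top to bottom; the first match wins."""
    if needs_human_readable:
        return "JSON"
    if performance_critical:
        return "MessagePack"
    if needs_strict_schema:
        return "Protobuf"
    if php_only:
        return "PHP igbinary / msgpack extension"
    return "JSON"  # unsure: start with JSON and migrate later as needed


print(choose_format(performance_critical=True))
```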

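Summary

Key point | Details
Core fact | Memcached stores only raw bytes; serialization is the client's job
General-purpose choice | JSON: readable and easy to debug
High-performance choice | MessagePack: small and fast
Large values | Split across multiple keys
Recommended practice | Embed a version number in the value to ease later migration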