PostgreSQL 完全指南 / 21 - 排错指南
第 21 章 · 排错指南
本章汇总 PostgreSQL 常见错误、连接问题、锁争用和 WAL 堆积的排查方法。
21.1 连接问题
FATAL: too many connections
-- 查看当前连接数
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;
-- 查看连接分布
SELECT datname, usename, client_addr, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, client_addr, state
ORDER BY count(*) DESC;
-- 终止空闲连接
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND query_start < now() - INTERVAL '10 minutes';
-- 终止空闲事务
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND state_change < now() - INTERVAL '5 minutes';
💡 技巧:根本解决方案是使用连接池(PgBouncer)。
FATAL: password authentication failed
# 检查 pg_hba.conf
sudo cat /etc/postgresql/17/main/pg_hba.conf
# 确认认证方式
# host all all 127.0.0.1/32 scram-sha-256
# 重置密码
sudo -u postgres psql
ALTER USER postgres PASSWORD 'newpassword';
# 重新加载配置
SELECT pg_reload_conf();
FATAL: database “xxx” does not exist
-- 检查数据库名
\l
-- 可能是大小写问题(PG 默认小写)
-- CREATE DATABASE "MyDB" 会创建大小写敏感的库名
-- 连接时需要加引号:psql -d "MyDB"
could not connect to server: Connection refused
# 检查 PG 是否运行
sudo systemctl status postgresql
# 检查监听地址
sudo -u postgres psql -c "SHOW listen_addresses;"
# 如果是 localhost,远程连接会失败
# 改为 '*' 或具体 IP
# 检查端口
sudo -u postgres psql -c "SHOW port;"
# 检查防火墙
sudo ufw status
sudo iptables -L -n
21.2 锁问题排查
-- 查看所有锁
SELECT
a.pid,
a.usename,
a.query,
l.locktype,
l.mode,
l.granted,
l.relation::regclass AS table_name
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE l.relation IS NOT NULL
ORDER BY l.pid;
-- 查找阻塞关系
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype
AND gl.relation = bl.relation AND gl.granted
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid;
-- 终止阻塞进程
SELECT pg_terminate_backend(blocking_pid);
21.3 WAL 堆积
-- 查看 WAL 文件数量和大小
SELECT
count(*) AS wal_count,
pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();
-- 检查复制槽是否导致堆积
SELECT
slot_name,
active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;
-- 删除不活跃的复制槽
SELECT pg_drop_replication_slot('slot_name');
-- 检查归档状态
SELECT * FROM pg_stat_archiver;
-- 如果 last_failed_time 比 last_archived_time 新,归档可能有问题
-- 手动触发 checkpoint
CHECKPOINT;
21.4 表膨胀
-- 检查膨胀
SELECT
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
n_live_tup,
n_dead_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- 手动 VACUUM
VACUUM VERBOSE mytable;
-- VACUUM FULL(锁表,完全回收空间)
VACUUM FULL mytable;
-- 使用 pg_repack(不锁表重建)
-- pg_repack -d mydb -t mytable
21.5 慢查询排查
-- 找到当前慢查询
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
-- 分析查询计划
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
-- 常见慢查询原因
-- 1. 缺少索引 → 创建索引
-- 2. 统计信息过时 → ANALYZE table
-- 3. work_mem 不足 → 增加 work_mem
-- 4. 表膨胀 → VACUUM
-- 5. 锁等待 → 检查锁
21.6 磁盘空间不足
-- 查看数据库大小
SELECT
datname,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- 查看表大小 Top 20
SELECT
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
-- 清理方案
-- 1. VACUUM FULL(锁表)
-- 2. pg_repack(在线)
-- 3. 清理旧数据
-- 4. 清理 WAL:删除不活跃的复制槽、检查归档
21.7 事务 ID 回卷预警
-- 检查各数据库的事务 ID 年龄
SELECT
datname,
age(datfrozenxid) AS xid_age,
2147483647 - age(datfrozenxid) AS remaining
FROM pg_database
ORDER BY xid_age DESC;
-- 如果 remaining < 100000000,需要紧急 VACUUM FREEZE
VACUUM FREEZE VERBOSE mytable;
-- 检查表级别
SELECT
relname,
age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 20;
⚠️ 注意事项:当 remaining 接近 0 时,PostgreSQL 会强制关闭数据库!必须确保 autovacuum 正常运行。
业务场景
| 问题 | 排查方向 |
|---|---|
| 连接失败 | pg_hba.conf + 防火墙 + 监听地址 |
| 查询突然变慢 | EXPLAIN ANALYZE + 统计信息 |
| CPU 飙高 | pg_stat_statements 找高频查询 |
| 磁盘满 | 大表/WAL 文件/复制槽 |
| 事务挂起 | 锁等待 + 空闲事务 |