MySQL 完全指南 / 第 15 章:监控体系
第 15 章:监控体系
没有监控就没有运维。本章覆盖 Performance Schema、慢查询日志和 Prometheus 监控方案。
15.1 Performance Schema
15.1.1 概述
Performance Schema 是 MySQL 内置的性能监控框架,收集服务器运行时的各种事件数据。
-- 查看是否开启
SHOW VARIABLES LIKE 'performance_schema';
-- 查看所有消费者(Consumer)
SELECT * FROM performance_schema.setup_consumers;
-- 查看所有检测点(Instrument)
SELECT * FROM performance_schema.setup_instruments
WHERE ENABLED = 'YES' LIMIT 20;
15.1.2 常用监控查询
-- ===================== 连接与线程 =====================
-- 当前连接数
SHOW STATUS LIKE 'Threads_connected';
-- 活跃线程数
SHOW STATUS LIKE 'Threads_running';
-- 连接历史
SHOW STATUS LIKE 'Connections';
-- 连接错误
SHOW STATUS LIKE 'Connection_errors%';
-- ===================== 查询统计 =====================
-- 全局查询统计
SHOW GLOBAL STATUS LIKE 'Com_select';
SHOW GLOBAL STATUS LIKE 'Com_insert';
SHOW GLOBAL STATUS LIKE 'Com_update';
SHOW GLOBAL STATUS LIKE 'Com_delete';
-- 每秒查询数(QPS)
-- 需要两次采样计算差值
-- ===================== InnoDB 监控 =====================
-- Buffer Pool 命中率
SHOW STATUS LIKE 'Innodb_buffer_pool_read%';
-- Innodb_buffer_pool_reads:从磁盘读取次数
-- Innodb_buffer_pool_read_requests:总请求次数
-- 命中率 = 1 - reads/read_requests
-- 行操作
SHOW STATUS LIKE 'Innodb_rows%';
-- Innodb_rows_read
-- Innodb_rows_inserted
-- Innodb_rows_updated
-- Innodb_rows_deleted
-- ===================== 表锁 =====================
SHOW STATUS LIKE 'Table_locks%';
-- Table_locks_immediate:立即获得锁
-- Table_locks_waited:需要等待锁
-- ===================== 临时表 =====================
SHOW STATUS LIKE 'Created_tmp%';
-- Created_tmp_disk_tables:磁盘临时表(应尽量少)
-- Created_tmp_tables:内存临时表
15.1.3 表 I/O 监控
-- 哪些表的 I/O 最多
SELECT
OBJECT_SCHEMA,
OBJECT_NAME,
COUNT_READ,
COUNT_WRITE,
COUNT_FETCH,
SUM_TIMER_READ / 1e12 AS read_time_s,
SUM_TIMER_WRITE / 1e12 AS write_time_s
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema', 'sys')
ORDER BY (SUM_TIMER_READ + SUM_TIMER_WRITE) DESC
LIMIT 20;
15.1.4 语句统计
-- 最耗时的 SQL(按总时间排序)
SELECT
DIGEST_TEXT,
COUNT_STAR AS exec_count,
ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_time_s,
ROUND(AVG_TIMER_WAIT / 1e12, 4) AS avg_time_s,
SUM_ROWS_EXAMINED,
SUM_ROWS_SENT,
FIRST_SEEN,
LAST_SEEN
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'myapp'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;
-- 扫描行数远大于返回行数的查询(可能缺索引)
SELECT
DIGEST_TEXT,
COUNT_STAR,
SUM_ROWS_EXAMINED,
SUM_ROWS_SENT,
ROUND(SUM_ROWS_EXAMINED / NULLIF(SUM_ROWS_SENT, 0), 2) AS examine_ratio
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_ROWS_EXAMINED > 0
AND SUM_ROWS_EXAMINED > SUM_ROWS_SENT * 100
ORDER BY SUM_ROWS_EXAMINED DESC
LIMIT 10;
15.2 sys Schema
sys Schema 是 Performance Schema 的友好视图层。
-- 查看最耗时的语句
SELECT * FROM sys.statements_with_runtimes_in_95th_percentile LIMIT 10;
-- 查看全表扫描的语句
SELECT * FROM sys.statements_with_full_table_scans LIMIT 10;
-- 查看冗余索引
SELECT * FROM sys.schema_redundant_indexes;
-- 查看未使用的索引
SELECT * FROM sys.schema_unused_indexes WHERE object_schema = 'myapp';
-- 查看表大小
SELECT * FROM sys.schema_table_statistics WHERE table_schema = 'myapp';
-- 查看等待事件
SELECT * FROM sys.waits_global_by_latency LIMIT 10;
-- 查看 I/O 最多的文件
SELECT * FROM sys.io_global_by_file_by_bytes LIMIT 10;
-- 查看内存使用
SELECT * FROM sys.memory_global_total;
-- 用户连接统计
SELECT * FROM sys.user_summary;
-- 查看冗余索引和重复索引
SELECT * FROM sys.schema_redundant_indexes\G
15.3 SHOW STATUS 速查
-- 连接相关
SHOW GLOBAL STATUS LIKE 'Threads_connected'; -- 当前连接数
SHOW GLOBAL STATUS LIKE 'Threads_running'; -- 活跃线程数
SHOW GLOBAL STATUS LIKE 'Max_used_connections'; -- 历史最大连接数
SHOW GLOBAL STATUS LIKE 'Aborted_connects'; -- 连接失败次数
SHOW GLOBAL STATUS LIKE 'Aborted_clients'; -- 异常断开的客户端
-- 查询相关
SHOW GLOBAL STATUS LIKE 'Questions'; -- 总查询数
SHOW GLOBAL STATUS LIKE 'Slow_queries'; -- 慢查询数
-- InnoDB 相关
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%'; -- 行锁统计
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool%'; -- Buffer Pool 统计
SHOW GLOBAL STATUS LIKE 'Innodb_deadlocks'; -- 死锁次数
-- 临时表
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables'; -- 磁盘临时表
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables'; -- 总临时表
15.4 Prometheus + Grafana 监控
15.4.1 mysqld_exporter 安装
# 下载
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.16.0/mysqld_exporter-0.16.0.linux-amd64.tar.gz
tar xzf mysqld_exporter-0.16.0.linux-amd64.tar.gz
mv mysqld_exporter-0.16.0.linux-amd64/mysqld_exporter /usr/local/bin/
# 创建监控用户
mysql -uroot -p -e "
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'Export!Pass123';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;"
# 配置
cat > /etc/.mysqld_exporter.cnf << EOF
[client]
user=exporter
password=Export!Pass123
EOF
# 启动
mysqld_exporter --config.my-cnf=/etc/.mysqld_exporter.cnf --web.listen-address=:9104
15.4.2 Prometheus 配置
# prometheus.yml
scrape_configs:
- job_name: 'mysql'
static_configs:
- targets: ['192.168.1.100:9104']
labels:
instance: 'mysql-master'
- targets: ['192.168.1.101:9104']
labels:
instance: 'mysql-slave-1'
15.4.3 关键监控指标
| 指标 | PromQL | 说明 |
|---|---|---|
| 连接数 | mysql_global_status_threads_connected | 当前连接数 |
| QPS | rate(mysql_global_status_queries[5m]) | 每秒查询数 |
| 慢查询 | rate(mysql_global_status_slow_queries[5m]) | 慢查询速率 |
| Buffer Pool 命中率 | 1 - rate(mysql_global_status_innodb_buffer_pool_reads[5m]) / rate(mysql_global_status_innodb_buffer_pool_read_requests[5m]) | 命中率 |
| 复制延迟 | mysql_slave_status_seconds_behind_master | 从库延迟秒数 |
| 死锁次数 | mysql_global_status_innodb_deadlocks | 死锁累计次数 |
| 磁盘临时表比 | rate(mysql_global_status_created_tmp_disk_tables[5m]) / rate(mysql_global_status_created_tmp_tables[5m]) | 磁盘临时表比例 |
15.5 告警规则建议
# Prometheus 告警规则 (mysql_alerts.yml)
groups:
- name: mysql_alerts
rules:
# 连接数过高
- alert: MySQLTooManyConnections
expr: mysql_global_status_threads_connected > 200
for: 5m
labels:
severity: warning
# 慢查询过多
- alert: MySQLSlowQueriesHigh
expr: rate(mysql_global_status_slow_queries[5m]) > 10
for: 5m
labels:
severity: warning
# Buffer Pool 命中率低
- alert: MySQLBufferPoolHitRateLow
expr: (1 - rate(mysql_global_status_innodb_buffer_pool_reads[5m]) / rate(mysql_global_status_innodb_buffer_pool_read_requests[5m])) < 0.99
for: 10m
labels:
severity: critical
# 复制延迟
- alert: MySQLReplicationLag
expr: mysql_slave_status_seconds_behind_master > 30
for: 5m
labels:
severity: critical
# 复制中断
- alert: MySQLReplicationStopped
expr: mysql_slave_status_slave_io_running == 0 or mysql_slave_status_slave_sql_running == 0
for: 1m
labels:
severity: critical
# 死锁
- alert: MySQLDeadlocks
expr: increase(mysql_global_status_innodb_deadlocks[5m]) > 5
for: 5m
labels:
severity: warning
15.6 监控 Dashboard 推荐
Grafana 常用 MySQL Dashboard:
| Dashboard ID | 名称 | 说明 |
|---|---|---|
| 7362 | MySQL Overview | 综合监控 |
| 14057 | MySQL Dashboard | 简洁版 |
| 11323 | MySQL InnoDB Metrics | InnoDB 深入监控 |
| 7991 | MySQL Replication | 复制监控 |
# 在 Grafana 中导入 Dashboard
# 1. 打开 Grafana → + → Import
# 2. 输入 Dashboard ID(如 7362)
# 3. 选择 Prometheus 数据源
业务场景
场景 1:日常巡检脚本
-- 日常巡检 SQL 脚本
-- 1. 连接状态
SELECT 'Connections' AS metric,
VARIABLE_VALUE AS value
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Threads_connected';
-- 2. Buffer Pool 命中率
SELECT 'Buffer Pool Hit Rate' AS metric,
ROUND((1 - (
SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads'
) / NULLIF((
SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests'
), 0)) * 100, 2) AS value;
-- 3. 慢查询数
SELECT 'Slow Queries' AS metric, VARIABLE_VALUE AS value
FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Slow_queries';
-- 4. 复制状态
SHOW REPLICA STATUS\G
-- 5. 磁盘空间
SELECT TABLE_SCHEMA,
ROUND(SUM(DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 0) AS total_mb
FROM information_schema.TABLES
GROUP BY TABLE_SCHEMA ORDER BY total_mb DESC;
-- 6. 长事务
SELECT * FROM information_schema.INNODB_TRX
WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 60;
-- 7. 锁等待
SELECT * FROM performance_schema.data_lock_waits;