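Chapter 05: Advanced Cypher
5.1 Paths
Paths are one of the most powerful concepts in Cypher. A path is an ordered, alternating sequence of vertices and edges in the graph.
5.1.1 Basic Path Concepts
Structure of a path: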
(v₁)─[e₁]─(v₂)─[e₂]─(v₃)─[e₃]─(v₄)
│                                │
└────────────── Path P ──────────┘
Path length = number of edges = 3
Number of vertices in the path = 4
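
To make the edge/vertex relationship concrete, the following is a minimal sketch that counts both on a three-hop path. It assumes the Person/KNOWS sample data used throughout this chapter; the aliases `edge_count` and `vertex_count` are illustrative names only.

```cypher
-- Sketch: compare the number of edges and vertices on each matched path.
-- Assumes Person vertices connected by KNOWS edges, as elsewhere in this chapter.
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->()-[:KNOWS]->(d:Person)
RETURN length(p) AS edge_count,
       size(nodes(p)) AS vertex_count;
```

For every returned row, `vertex_count` should equal `edge_count + 1`, matching the diagram above (3 edges, 4 vertices).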
5.1.2 Fixed-Length Paths
-- Path of length 2 (friends of friends)
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(c:Person)
RETURN c.name;
-- Path of length 3
MATCH (a:Person {name: 'Alice'})-[:KNOWS*3]->(d:Person)
RETURN d.name;
-- Bind the path to a variable
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*2]->(c:Person)
RETURN p, length(p) AS path_length, nodes(p) AS path_nodes;
5.1.3 Variable-Length Paths
-- At least 1 hop, at most 3 hops
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN DISTINCT b.name, b.age;
-- 0 to 5 hops (a lower bound of 0 includes the start vertex itself)
MATCH (a:Person {name: 'Alice'})-[:KNOWS*0..5]->(b:Person)
RETURN DISTINCT b.name;
-- At least 2 hops (no upper bound)
MATCH (a:Person {name: 'Alice'})-[:KNOWS*2..]->(b:Person)
RETURN DISTINCT b.name;
| Syntax | Meaning | Example |
|---|---|---|
| `*N` | Exactly N hops | `[:KNOWS*3]` |
| `*N..M` | N to M hops | `[:KNOWS*1..3]` |
| `*N..` | At least N hops | `[:KNOWS*2..]` |
| `*..M` | At most M hops | `[:KNOWS*..5]` |
| `*` | 1 or more hops | `[:KNOWS*]` |
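
The `*..M` form in the table is not exercised by the queries above, so here is a minimal sketch of it, assuming the same Person/KNOWS sample data. Under openCypher semantics an omitted lower bound defaults to 1, so `*..5` should behave like `*1..5`. The aliases `name` and `min_hops` are illustrative.

```cypher
-- Sketch: upper bound only, which matches paths of 1 to 5 hops.
MATCH (a:Person {name: 'Alice'})-[:KNOWS*..5]->(b:Person)
RETURN DISTINCT b.name;

-- Sketch: report the smallest hop count at which each person is reached.
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*..5]->(b:Person)
RETURN b.name AS name, min(length(p)) AS min_hops
ORDER BY min_hops;
```

Forms without an upper bound (`*` and `*N..`) can become expensive on large or densely connected graphs, so an explicit maximum is usually the safer default.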
5.1.4 Shortest Paths
-- Find a shortest path between two nodes
MATCH p = shortestPath(
(a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Dave'})
)
RETURN p, length(p) AS distance;
-- All shortest paths
MATCH p = allShortestPaths(
(a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Dave'})
)
RETURN p, length(p) AS distance;
-- Shortest path with a maximum search depth
MATCH p = shortestPath(
(a:Person {name: 'Alice'})-[:KNOWS*..6]-(b:Person {name: 'Dave'})
)
RETURN p;
5.1.5 Path Functions
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(c:Person)
RETURN
p AS full_path,
length(p) AS hops,
nodes(p) AS vertices,
relationships(p) AS edges,
startNode(head(relationships(p))) AS start,
endNode(last(relationships(p))) AS destination;
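| Function | Returns | Description |
|---|---|---|
| `length(p)` | Integer | Path length (number of edges) |
| `nodes(p)` | List of vertices | All vertices in the path |
| `relationships(p)` | List of edges | All edges in the path |
| `startNode(r)` | Vertex | Start vertex of an edge |
| `endNode(r)` | Vertex | End vertex of an edge |
| `head(list)` | Element | First element of a list |
| `last(list)` | Element | Last element of a list |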
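5.2 Advanced Aggregation
5.2.1 Aggregate Functions in Detail
-- Headcount and salary statistics per job title in the Technology Department ('技术部')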
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department {name: '技术部'})
RETURN
e.title AS title,
count(e) AS headcount,
avg(e.salary) AS avg_salary,
min(e.salary) AS min_salary,
max(e.salary) AS max_salary,
sum(e.salary) AS total_cost,
stdev(e.salary) AS salary_stddev,
percentileCont(e.salary, 0.5) AS median_salary,
percentileDisc(e.salary, 0.9) AS p90_salary
ORDER BY avg_salary DESC;
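| Aggregate function | Description | Example |
|---|---|---|
| `count(x)` | Count of non-null values | `count(e)` |
| `count(*)` | Total number of rows | `count(*)` |
| `count(DISTINCT x)` | Count of distinct values | `count(DISTINCT e.title)` |
| `avg(x)` | Average | `avg(e.salary)` |
| `sum(x)` | Sum | `sum(e.salary)` |
| `min(x)` | Minimum | `min(e.salary)` |
| `max(x)` | Maximum | `max(e.salary)` |
| `stdev(x)` | Sample standard deviation | `stdev(e.salary)` |
| `stdevp(x)` | Population standard deviation | `stdevp(e.salary)` |
| `percentileCont(x, p)` | Continuous percentile (interpolated) | `percentileCont(e.salary, 0.5)` |
| `percentileDisc(x, p)` | Discrete percentile (an actual value) | `percentileDisc(e.salary, 0.9)` |
| `collect(x)` | Collect values into a list | `collect(e.name)` |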
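5.2.2 The collect Aggregate
collect() gathers the matched values into a list and is used very frequently in graph queries:
-- Collect the employee roster of each department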
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN
d.name AS department,
collect(e.name) AS employees,
count(e) AS headcount;
-- collect combined with DISTINCT
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN
d.name AS department,
collect(DISTINCT e.title) AS unique_titles;
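5.2.3 Grouped Aggregation (Implicit GROUP BY)
-- In Cypher, aggregate functions in RETURN group automatically by the non-aggregated expressions,
-- which is equivalent to GROUP BY in SQL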
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN d.name AS dept, count(e) AS cnt, avg(e.salary) AS avg_sal;
-- Group by multiple fields
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
RETURN d.name AS dept, e.title AS title, count(e) AS cnt;
5.2.4 The HAVING Equivalent
Cypher has no explicit HAVING keyword; apply WHERE after WITH instead:
-- Find departments whose average salary exceeds 20000
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d.name AS dept, avg(e.salary) AS avg_salary, count(e) AS cnt
WHERE avg_salary > 20000
RETURN dept, avg_salary, cnt
ORDER BY avg_salary DESC;
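5.3 WITH: Pipelining
WITH is the "pipe" operator of Cypher. Like the Unix | pipe, it passes the result of one step on to the next.
5.3.1 Basic Usage
-- Step-by-step query: find the high earners first, then their departments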
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WHERE e.salary > 20000
WITH e, d
RETURN e.name, e.salary, d.name AS department;
5.3.2 WITH for Filtering on Aggregates
-- Find departments with at least 2 employees, together with their average salary
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, count(e) AS emp_count, avg(e.salary) AS avg_sal
WHERE emp_count >= 2
RETURN d.name AS department, emp_count, round(avg_sal) AS avg_salary
ORDER BY emp_count DESC;
5.3.3 WITH for Intermediate Sorting
-- Find the 3 highest-paid employees, then look up their departments
MATCH (e:Employee)
WITH e ORDER BY e.salary DESC LIMIT 3
MATCH (e)-[:BELONGS_TO]->(d:Department)
RETURN e.name, e.salary, d.name AS department;
5.3.4 Passing Multiple Variables Through WITH
-- Multi-step pipeline
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e) AS employees, count(e) AS cnt
WHERE cnt > 1
WITH d, employees, cnt,
reduce(total = 0, emp IN employees | total + emp.salary) AS total_salary
RETURN d.name, cnt, total_salary, round(total_salary / cnt) AS avg_salary;
5.4 UNWIND: List Expansion
UNWIND expands a list into multiple rows; it is the inverse of collect():
-- Expand a list into rows
UNWIND [1, 2, 3, 4, 5] AS num
RETURN num;
-- Create nodes in bulk
UNWIND ['Alice', 'Bob', 'Carol', 'Dave'] AS name
CREATE (:Person {name: name, created: datetime()});
-- Bulk import scenario (北京 = Beijing, 上海 = Shanghai, 广州 = Guangzhou)
UNWIND [
{name: 'Alice', age: 30, city: '北京'},
{name: 'Bob', age: 28, city: '上海'},
{name: 'Carol', age: 32, city: '广州'}
] AS data
CREATE (:Person {
name: data.name,
age: data.age,
city: data.city
});
-- Combined with collect (expand the list, then process each element)
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e.name) AS names
UNWIND names AS employee_name
RETURN d.name, employee_name;
5.5 FOREACH: Iteration
-- Add a label to every node on the path
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
FOREACH (n IN nodes(p) | SET n:Visited);
-- Set a property on every edge of the path
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
FOREACH (r IN relationships(p) | SET r.traversed = true);
5.6 Subqueries and CALL
5.6.1 CALL Subqueries
-- Run a subquery with CALL
MATCH (d:Department)
CALL {
WITH d
MATCH (e:Employee)-[:BELONGS_TO]->(d)
RETURN count(e) AS emp_count, avg(e.salary) AS avg_salary
}
RETURN d.name, emp_count, avg_salary;
5.6.2 EXISTS Semantics with Subqueries
-- Find employees who have subordinates (like SQL EXISTS)
MATCH (e:Employee)
WHERE EXISTS {
MATCH (e)<-[:REPORTS_TO]-(sub:Employee)
}
RETURN e.name, e.title;
-- Find employees without subordinates (like SQL NOT EXISTS)
MATCH (e:Employee)
WHERE NOT EXISTS {
MATCH (e)<-[:REPORTS_TO]-(sub:Employee)
}
RETURN e.name, e.title;
5.6.3 OPTIONAL MATCH (Left-Join Semantics)
-- Similar to LEFT JOIN in SQL:
-- the left-hand rows are returned even when there is no match
MATCH (e:Employee)
OPTIONAL MATCH (e)-[:BELONGS_TO]->(d:Department)
RETURN e.name, COALESCE(d.name, 'Unassigned') AS department;
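
Because OPTIONAL MATCH yields null for a missing match and count() skips nulls, the two combine naturally to report zero counts. The following is a minimal sketch, assuming the Employee/Department/BELONGS_TO schema used in this chapter; the aliases are illustrative.

```cypher
-- Sketch: list every department with its headcount, including empty ones.
-- count(e) ignores nulls, so a department without employees gets 0.
MATCH (d:Department)
OPTIONAL MATCH (e:Employee)-[:BELONGS_TO]->(d)
RETURN d.name AS department, count(e) AS headcount
ORDER BY headcount DESC;
```

With a plain MATCH in place of OPTIONAL MATCH, departments without employees would drop out of the result entirely.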
5.7 CASE Conditional Expressions
-- Simple CASE (技术总监 = technical director, 高级工程师 = senior engineer, 工程师 = engineer)
MATCH (e:Employee)
RETURN e.name, e.salary,
CASE e.title
WHEN '技术总监' THEN 'Management'
WHEN '高级工程师' THEN 'Senior'
WHEN '工程师' THEN 'Mid-level'
ELSE 'Other'
END AS level;
-- Searched CASE
MATCH (e:Employee)
RETURN e.name, e.salary,
CASE
WHEN e.salary >= 30000 THEN 'High'
WHEN e.salary >= 20000 THEN 'Medium'
ELSE 'Basic'
END AS salary_grade;
5.8 Advanced List Operations
5.8.1 List Comprehensions
-- Filtering a list
MATCH (d:Department)<-[:BELONGS_TO]-(e:Employee)
RETURN d.name,
[emp IN collect(e) WHERE emp.salary > 20000 | emp.name] AS high_earners;
-- Transforming a list
MATCH (e:Employee)
RETURN collect(e.name) AS names,
[name IN collect(e.name) | toUpper(name)] AS upper_names;
5.8.2 List Functions
| Function | Description | Example |
|---|---|---|
| `size(list)` | Length of the list | `size(collect(n))` |
| `head(list)` | First element | `head([1,2,3])` → `1` |
| `last(list)` | Last element | `last([1,2,3])` → `3` |
| `tail(list)` | Everything except the first element | `tail([1,2,3])` → `[2,3]` |
| `reverse(list)` | Reversed list | `reverse([1,2,3])` → `[3,2,1]` |
| `range(start, end, step)` | Generate a range | `range(1,10,2)` → `[1,3,5,7,9]` |
| `extract(x IN list \| expr)` | Map each element | `extract(n IN [1,2,3] \| n * 2)` → `[2,4,6]` |
| `filter(x IN list WHERE cond)` | Filter elements | `filter(n IN [1,2,3,4] WHERE n > 2)` → `[3,4]` |
| `reduce(accum = init, x IN list \| expr)` | Fold the list into one value | `reduce(s=0, n IN [1,2,3] \| s+n)` → `6` |
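
The functions in the table can be tried on a literal list without any graph data. The following is a minimal sketch; the mapping and filtering steps are written as list comprehensions (the form used in section 5.8.1), which express the same operations as extract() and filter(). The list `xs` and all aliases are illustrative names.

```cypher
-- Sketch: exercise the list functions on a literal list (no graph data needed).
WITH [1, 2, 3, 4] AS xs
RETURN size(xs) AS n,
       head(xs) AS first_item,
       last(xs) AS last_item,
       tail(xs) AS rest,
       [x IN xs | x * 2] AS doubled,
       [x IN xs WHERE x > 2] AS filtered,
       reduce(s = 0, x IN xs | s + x) AS total;
```

With `xs = [1, 2, 3, 4]`, this should return n = 4, first_item = 1, last_item = 4, rest = [2, 3, 4], doubled = [2, 4, 6, 8], filtered = [3, 4], and total = 10.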
5.8.3 The reduce Accumulator
-- Compute the total weight of a path with reduce
MATCH p = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN
[n IN nodes(p) | n.name] AS path_names,
reduce(weight = 0, r IN relationships(p) | weight + r.weight) AS total_weight;
-- Concatenate strings with reduce
MATCH (e:Employee)-[:BELONGS_TO]->(d:Department)
WITH d, collect(e.name) AS names
RETURN d.name,
reduce(s = '', name IN names | s + CASE WHEN s <> '' THEN ', ' ELSE '' END + name) AS employee_list;
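5.9 Advanced Business Scenario: Knowledge Graph Reasoning
Scenario: building a drug interaction knowledge graph
-- Create drug nodes (阿司匹林 = aspirin, 华法林 = warfarin, 布洛芬 = ibuprofen, 氯吡格雷 = clopidogrel)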
CREATE (:Drug {name: '阿司匹林', category: '解热镇痛', dosage: '100mg'});
CREATE (:Drug {name: '华法林', category: '抗凝血', dosage: '5mg'});
CREATE (:Drug {name: '布洛芬', category: '解热镇痛', dosage: '400mg'});
CREATE (:Drug {name: '氯吡格雷', category: '抗血小板', dosage: '75mg'});
-- Create disease nodes (心血管疾病 = cardiovascular disease, 头痛 = headache)
CREATE (:Disease {name: '心血管疾病', severity: 'high'});
CREATE (:Disease {name: '头痛', severity: 'low'});
-- Create the drug relationships (drug-drug interactions and drug-disease treatments)
MATCH (a:Drug {name: '阿司匹林'}), (b:Drug {name: '华法林'})
CREATE (a)-[:INTERACTS_WITH {risk: 'high', effect: '增加出血风险'}]->(b);
MATCH (a:Drug {name: '阿司匹林'}), (b:Drug {name: '布洛芬'})
CREATE (a)-[:INTERACTS_WITH {risk: 'medium', effect: '降低阿司匹林效果'}]->(b);
MATCH (a:Drug {name: '阿司匹林'}), (d:Disease {name: '心血管疾病'})
CREATE (a)-[:TREATS]->(d);
MATCH (a:Drug {name: '布洛芬'}), (d:Disease {name: '头痛'})
CREATE (a)-[:TREATS]->(d);
Query: drug safety check
-- Find all high-risk drug interactions
MATCH (d1:Drug)-[r:INTERACTS_WITH]->(d2:Drug)
WHERE r.risk = 'high'
RETURN d1.name AS drug1, d2.name AS drug2, r.effect AS interaction_effect
ORDER BY r.risk;
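Query: find all available drugs for a disease and their interactions
-- For cardiovascular disease, find the treating drugs and their interactions with other drugs
MATCH (drug:Drug)-[:TREATS]->(disease:Disease {name: '心血管疾病'})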
OPTIONAL MATCH (drug)-[r:INTERACTS_WITH]->(other:Drug)
RETURN drug.name AS treatment,
drug.dosage AS dosage,
COALESCE(other.name, 'No known interactions') AS interacting_drug,
COALESCE(r.effect, '-') AS effect,
COALESCE(r.risk, '-') AS risk;
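
A common follow-up to the queries above is checking a set of drugs prescribed together. The following is a minimal sketch under the Drug/INTERACTS_WITH schema created earlier in this section; the list name `prescription` and its contents are hypothetical and not part of the original data model.

```cypher
-- Sketch: check a hypothetical prescription list for pairwise interactions.
-- 'prescription' is an illustrative name; the drug names come from the sample data above.
WITH ['阿司匹林', '华法林', '氯吡格雷'] AS prescription
MATCH (d1:Drug)-[r:INTERACTS_WITH]->(d2:Drug)
WHERE d1.name IN prescription AND d2.name IN prescription
RETURN d1.name AS drug1, d2.name AS drug2, r.risk AS risk, r.effect AS effect;
```

With the sample data created above, this should surface the aspirin/warfarin pair, since it is the only stored interaction whose two endpoints are both in the list.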
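5.10 Cypher Advanced Operator Quick Reference
| Operation | Syntax | Description |
|---|---|---|
| `UNION` | `query1 UNION query2` | Combine result sets (duplicates removed) |
| `UNION ALL` | `query1 UNION ALL query2` | Combine result sets (duplicates kept) |
| `DISTINCT` | `RETURN DISTINCT n` | Remove duplicates |
| `OPTIONAL MATCH` | `OPTIONAL MATCH pattern` | Optional match (left join) |
| `WITH` | `WITH expr AS alias` | Pipe results to the next step |
| `UNWIND` | `UNWIND list AS item` | Expand a list into rows |
| `FOREACH` | `FOREACH (x IN list \| ops)` | Iterate over a list |
| `CASE WHEN` | `CASE WHEN cond THEN expr END` | Conditional expression |
| `EXISTS {}` | `WHERE EXISTS { MATCH ... }` | Existence check |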
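
UNION is the one operator in this table that the chapter does not demonstrate elsewhere, so here is a minimal sketch. It assumes the Employee/REPORTS_TO schema from section 5.6.2; the salary threshold is an arbitrary illustrative value.

```cypher
-- Sketch: names of employees who are high earners or who have subordinates.
-- UNION removes duplicates; both branches must return the same column names.
MATCH (e:Employee)
WHERE e.salary >= 30000
RETURN e.name AS name
UNION
MATCH (e:Employee)<-[:REPORTS_TO]-(:Employee)
RETURN e.name AS name;
```

Replacing UNION with UNION ALL keeps the duplicates, so an employee who satisfies both branches would then appear more than once.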
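5.11 Chapter Summary
| Topic | Description |
|---|---|
| Paths | Variable-length paths `*N..M`, shortest paths `shortestPath()` |
| Aggregation | `count`, `avg`, `collect`, `reduce`, and others |
| WITH | Pipelining, with support for intermediate filtering and sorting |
| UNWIND | Expands a list into multiple rows |
| Subqueries | `CALL {}` and `EXISTS {}` |
| OPTIONAL MATCH | Left-join semantics |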
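5.12 Exercises
- In the social network graph, use a variable-length path to find all paths from Alice to Dave (length at most 4).
- Use collect() and UNWIND to "list all colleagues of each employee".
- Use reduce() to compute the sum of an edge property along a path.
- Use an EXISTS {} subquery to find everyone who is both a manager and a high earner.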
5.13 Further Reading