Re: SQL Property Graph Queries (SQL/PGQ)

From: Hannu Krosing <hannuk(at)google(dot)com>
To: assam258(at)gmail(dot)com
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Ajay Pal <ajay(dot)pal(dot)k(at)gmail(dot)com>, Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SQL Property Graph Queries (SQL/PGQ)
Date: 2026-02-23 15:46:32
Message-ID: CAMT0RQSd7PyceQ-6krBCoXse=TJeuTkTTb1ZbWYYy=_yOnYWiQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Has there been any progress on this?

Shortest path queries seem to be something that would greatly benefit
from having UNION DISTINCT ON ()

---
Hannu

On Mon, Jan 5, 2026 at 2:58 PM Henson Choi <assam258(at)gmail(dot)com> wrote:
>
> Hi hackers,
>
> I think it may be a bit early for this, but I wanted to share this design
> document now so we can revisit and discuss when the time is right.
>
> =====================================================
> Shortestpath Implementation Strategies for PostgreSQL
> =====================================================
>
> 1. Overview
> -----------
>
> This document describes two implementation strategies for shortest path
> queries in PostgreSQL-based graph databases.
>
> Shortestpath Query Example:
>
> MATCH p = shortestpath((a)-[*]->(b)) RETURN p
> MATCH p = allshortestpaths((a)-[:knows*]->(b)) RETURN p
>
> Two implementation approaches:
>
> 1. Unoptimized: Query rewriting to WITH RECURSIVE (CTE)
> 2. Optimized: Custom executor node with bidirectional BFS
>
>
> 2. Unoptimized Approach: WITH RECURSIVE Rewriting
> -------------------------------------------------
>
> 2.1 Query Transformation
>
> Cypher query:
>
> MATCH p = shortestpath((a:person)-[:knows*]->(b:person))
> WHERE a.name = 'Alice' AND b.name = 'Bob'
> RETURN p
>
> Rewritten SQL using WITH RECURSIVE:
>
> WITH RECURSIVE sp_paths AS (
> -- Base case: start vertex
> SELECT v.id as start_vid,
> v.id as end_vid,
> ARRAY[v.id] as vertex_ids,
> ARRAY[]::bigint[] as edge_ids,
> 0 as depth
> FROM person v
> WHERE name = 'Alice'
>
> UNION ALL
>
> -- Recursive case: expand one hop
> SELECT p.start_vid,
> e.end_id,
> p.vertex_ids || e.end_id,
> p.edge_ids || e.id,
> p.depth + 1
> FROM sp_paths p
> JOIN knows e ON p.end_vid = e.start_id
> WHERE e.end_id <> ALL(p.vertex_ids) -- Cycle prevention
> )
> SELECT vertex_ids, edge_ids, depth
> FROM sp_paths
> WHERE end_vid = (SELECT id FROM person WHERE name = 'Bob')
> ORDER BY depth
> LIMIT 1;
>
> 2.2 Execution Plan
>
> Limit
> -> Sort
> Sort Key: depth
> -> CTE Scan on sp_paths
> Filter: (end_vid = $bob_id)
> CTE sp_paths
> Recursive Union
> Seq Scan on person v
> Filter: (name = 'Alice')
> Hash Join
> Hash Cond: (p.end_vid = e.start_id)
> Filter: (e.end_id <> ALL(p.vertex_ids))
> WorkTable Scan on sp_paths p
> Hash
> Seq Scan on knows e
>
> 2.3 Problems
>
> ┌─────────────────────┬──────────────────────────────────────────┐
> │ Problem │ Description │
> ├─────────────────────┼──────────────────────────────────────────┤
> │ Unidirectional │ Search expands only from start vertex │
> │ │ Misses optimization from end vertex │
> ├─────────────────────┼──────────────────────────────────────────┤
> │ Full Exploration │ Must explore all paths up to min depth │
> │ │ before finding shortest │
> ├─────────────────────┼──────────────────────────────────────────┤
> │ No Early Termination│ Cannot stop when shortest found │
> │ │ CTE runs to completion │
> ├─────────────────────┼──────────────────────────────────────────┤
> │ Memory Usage │ O(V) visited vertices per iteration │
> │ │ All partial paths stored │
> ├─────────────────────┼──────────────────────────────────────────┤
> │ Cycle Prevention │ <> ALL(array) grows with path length │
> │ │ O(path_length) per edge check │
> └─────────────────────┴──────────────────────────────────────────┘
>
> 2.4 Search Space Comparison
>
> Unidirectional BFS (CTE):
>
> Start ──────────────────────────────────────> End
> ╲ ╱
> ╲ Explores entire cone ╱
> ╲ from start ╱
> ╲ ╱
> ╲ Search space: O(B^D) ╱
> ╲ ╱
> ────────────────────────
>
> Bidirectional BFS (Optimized):
>
> Start ────────> <──────── End
> ╲ ╱╲ ╱
> ╲ ╱ ╲ ╱
> ╲ ╱ ╲ ╱
> ╲ ╱ Meet ╲ ╱
> ╲╱ Point ╲
> Search space: O(2 * B^(D/2))
>
>
> 3. Optimized Approach: Custom Executor Node
> -------------------------------------------
>
> 3.1 Design Principle
>
> Instead of query rewriting, implement a dedicated executor node that:
>
> - Performs bidirectional BFS from both start and end vertices
> - Uses hash tables to track visited vertices from each direction
> - Detects meeting point where both searches intersect
> - Terminates early when shortest path is found
>
> 3.2 Plan Node Structure
>
> ┌────────────────────────────────────────────────────────────┐
> │ ShortestpathPlan │
> │ - Bidirectional BFS with dual hash tables │
> │ - Early termination on path found │
> └────────────────────────────────────────────────────────────┘
> │ │
> left │ │ right
> ▼ ▼
> ┌───────────────────────────┐ ┌───────────────────────────┐
> │ Hash2Side (Left) │ │ Hash2Side (Right) │
> │ - KeyTable + HashTable │ │ - KeyTable + HashTable │
> │ - Forward direction │ │ - Reverse direction │
> └───────────────────────────┘ └───────────────────────────┘
> │ │
> ▼ ▼
> ┌───────────────────────────┐ ┌───────────────────────────┐
> │ SubPlan (Forward) │ │ SubPlan (Reverse) │
> │ Index Cond: │ │ Index Cond: │
> │ start_id = $current │ │ end_id = $current │
> └───────────────────────────┘ └───────────────────────────┘
>
> Execution Flow:
>
> Initialize:
> Left HashTable = {start_vertex}
> Right HashTable = {end_vertex}
> │
> ▼
> Expansion loop (smaller side first):
> │
> ├──→ Compare |Left frontier| vs |Right frontier|
> │ │
> │ └──→ Expand smaller side
> │ │
> │ └──→ Check intersection with other side
> │
> ├──→ Repeat until intersection found
> │
> └──→ Return shortest path(s)
>
>
> 3.3 Key Concept: Dual Hash Tables
>
> The executor maintains two hash tables for bidirectional search:
>
> ┌─────────────────────────────────────────────────────────────┐
> │ Left Direction (Start → End) │
> │ ┌─────────────────────────────────────────────────────┐ │
> │ │ KeyTable: Previous hop vertices (expansion base) │ │
> │ │ HashTable: Current hop results (newly discovered) │ │
> │ └─────────────────────────────────────────────────────┘ │
> ├─────────────────────────────────────────────────────────────┤
> │ Right Direction (End → Start) │
> │ ┌─────────────────────────────────────────────────────┐ │
> │ │ KeyTable: Previous hop vertices (expansion base) │ │
> │ │ HashTable: Current hop results (newly discovered) │ │
> │ └─────────────────────────────────────────────────────┘ │
> └─────────────────────────────────────────────────────────────┘
>
> After each hop:
> - Swap KeyTable and HashTable (new base for next hop)
> - Check for intersection between Left and Right HashTables
> - If intersection found, reconstruct path through meeting point
>
> 3.4 Key Concept: Smaller Side Expansion
>
> At each hop, expand from the direction with fewer frontier vertices.
> This minimizes the total number of edges scanned.
>
> Example (highly unbalanced tree):
>
> Graph structure:
> - Forward (Left->Right): branching factor = 100
> - Backward (Right->Left): branching factor = 2
> - Path distance: 6 hops
>
> Simple alternating:
>
> Hop 1: expand Left (1) -> Left=100, Right=1
> Hop 2: expand Right (1) -> Left=100, Right=2
> Hop 3: expand Left (100) -> Left=10,000, Right=2
> Hop 4: expand Right (2) -> Left=10,000, Right=4
> Hop 5: expand Left (10,000) -> Left=1,000,000, Right=4
> Hop 6: expand Right (4) -> meet!
>
> Total vertices expanded: 1 + 1 + 100 + 2 + 10,000 + 4 = 10,108
>
> With smaller side expansion:
>
> Hop 1: Left=1, Right=1 -> expand Left -> Left=100, Right=1
> Hop 2: 1 < 100 -> expand Right -> Left=100, Right=2
> Hop 3: 2 < 100 -> expand Right -> Left=100, Right=4
> Hop 4: 4 < 100 -> expand Right -> Left=100, Right=8
> Hop 5: 8 < 100 -> expand Right -> Left=100, Right=16
> Hop 6: 16 < 100 -> expand Right -> meet!
>
> Total vertices expanded: 1 + 1 + 2 + 4 + 8 + 16 = 32
>
> Speedup: 10,108 / 32 = ~300x fewer vertices explored
>
>
> 3.5 Key Concept: Cycle Detection via Hash Lookup
>
> Hash table insertion automatically prevents revisiting vertices.
>
> Two tuple types in HashTable:
>
> ┌─────────────────────────────────────────────────────────────┐
> │ Long Tuple (path data) │
> │ ┌─────────┬─────────┬─────────────────────────────────┐ │
> │ │ Header │ Graphid │ Edge Rowids (path to vertex) │ │
> │ └─────────┴─────────┴─────────────────────────────────┘ │
> ├─────────────────────────────────────────────────────────────┤
> │ Short Tuple (visited marker) │
> │ ┌─────────┬─────────┐ │
> │ │ Header │ Graphid │ <- No path data, just marker │
> │ └─────────┴─────────┘ │
> └─────────────────────────────────────────────────────────────┘
>
> Insertion logic:
>
> 1. Inserting long tuple (with path):
> - Check if short tuple (marker) exists for same Graphid
> - If marker exists: reject insertion (already visited)
>
> 2. Inserting short tuple (marker):
> - Insert marker first
> - Remove all existing long tuples with same Graphid
> - Vertex is now "locked" - no more paths to this vertex
>
> Example:
>
> Left Hop 1: Expand from A
> HashTable = {B(long), C(long)} <- paths A->B, A->C
>
> Left and Right meet at C -> shortest path found
>
> Insert C(short) marker:
> HashTable = {B(long), C(short)} <- C's long tuple removed
>
> Later, another path tries to reach C (e.g., D->C):
> -> C(short) marker exists -> insertion rejected
> -> Already processed at shortest hop
>
> Benefits:
> - O(1) duplicate detection via hash lookup
> - No separate "visited" set needed
> - Automatic cleanup of superseded paths
> - Memory efficient (markers are minimal size)
>
>
> 3.6 Key Concept: Batch Synchronization
>
> When hash table exceeds memory, it splits into multiple batches.
> Left and Right hash tables must have matching batch structure:
>
> Problem without synchronization:
>
> Left: nbatch=1 [all data in batch 0]
> Right: nbatch=4 [batch0] [batch1] [batch2] [batch3]
>
> -> Cannot find intersection (different batch structures)
>
> Solution:
>
> When Right splits into 4 batches, clone structure to Left:
>
> Left: nbatch=4 [batch0] [batch1] [batch2] [batch3]
> Right: nbatch=4 [batch0] [batch1] [batch2] [batch3]
> | | | |
> Join only matching batches
>
> This ensures correct intersection detection across batch boundaries.
>
>
> 3.7 Execution Plan (Optimized)
>
> EXPLAIN MATCH p = shortestpath((a:person)-[:knows*]->(b:person))
> WHERE a.name='Alice' AND b.name='Bob' RETURN p
>
> Shortestpath (limit=1)
> -> Hash2Side (left direction=FORWARD)
> -> Index Scan on knows e
> Index Cond: (start_id = $current_vertex)
> -> Hash2Side (right direction=REVERSE)
> -> Index Scan on knows e
> Index Cond: (end_id = $current_vertex)
>
>
> 4. Comparison Summary
> ---------------------
>
> ┌─────────────────────┬─────────────────────────┬─────────────────────────┐
> │ Aspect │ Unoptimized (CTE) │ Optimized (Executor) │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ Implementation │ Query rewriting │ Custom executor node │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ Search Direction │ Unidirectional │ Bidirectional │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ Search Space │ O(B^D) │ O(2 * B^(D/2)) │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ Early Termination │ Not possible │ Stop when path found │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ Data Structure │ WorkTable (tuplestore) │ Dual hash tables │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ Cycle Prevention │ Array membership check │ Hash table lookup O(1) │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ Code Complexity │ Low (rewriting only) │ High (new executor) │
> ├─────────────────────┼─────────────────────────┼─────────────────────────┤
> │ PostgreSQL Changes │ None │ New plan/executor node │
> └─────────────────────┴─────────────────────────┴─────────────────────────┘
>
>
> 5. Search Space Example
> -----------------------
>
> Graph: Social network, avg 100 connections per person
> Query: shortestpath((Alice)-[*]->(Bob)), actual distance = 6 hops
>
> Unidirectional BFS (CTE):
> Hop 1: 100 vertices
> Hop 2: 10,000 vertices
> Hop 3: 1,000,000 vertices
> Hop 4: 100,000,000 vertices
> Hop 5: 10,000,000,000 vertices
> Hop 6: Find Bob
>
> Total vertices explored: ~10 billion
>
> Bidirectional BFS with smaller side expansion:
> Hop 1: Left=1, Right=1 -> expand Left -> Left=100
> Hop 2: 1 < 100 -> expand Right -> Right=100
> Hop 3: 100 = 100 -> expand Left -> Left=10,000
> Hop 4: 100 < 10,000 -> expand Right -> Right=10,000
> Hop 5: 10,000 = 10,000 -> expand Left -> Left=1,000,000
> Hop 6: 10,000 < 1,000,000-> expand Right -> Meet!
>
> Total vertices explored: ~1 million (each side explores ~500K)
>
> Speedup: ~10,000x fewer vertices explored
>
>
> 6. Key Techniques Summary
> -------------------------
>
> ┌────┬───────────────────────────┬────────────────────────────────────────┐
> │ # │ Technique │ Benefit │
> ├────┼───────────────────────────┼────────────────────────────────────────┤
> │ 1 │ Bidirectional BFS │ Exponential reduction in search space │
> │ 2 │ Dual Hash Tables │ O(1) visited check, O(1) intersection │
> │ 3 │ Smaller Side Expansion │ Expand from side with fewer vertices │
> │ 4 │ Early Termination │ Stop immediately when path found │
> │ 5 │ Cycle Detection │ Hash lookup prevents revisiting vertex │
> │ 6 │ Batch Synchronization │ Match nbatch between Left/Right hashes │
> │ 7 │ Path Reconstruction │ Merge left and right paths at meeting │
> │ 8 │ Parameterized Subplan │ Reuse edge scan plan per vertex │
> │ 9 │ Batch Processing │ Handle memory overflow with batching │
> │ 10 │ Limit Control │ shortestpath (1) vs allshortestpaths │
> └────┴───────────────────────────┴────────────────────────────────────────┘
>
>
> 7. Conclusion
> -------------
>
> The unoptimized CTE-based approach is simpler to implement but suffers from:
>
> - Unidirectional search with O(B^D) complexity
> - No early termination capability
> - Inefficient cycle prevention with arrays
> - Must explore all paths before finding shortest
>
> The optimized executor-based approach requires more implementation
> effort but provides:
>
> - Bidirectional BFS with O(B^(D/2)) complexity
> - Early termination when shortest path found
> - Efficient hash-based visited tracking
> - Support for both shortestpath and allshortestpaths
>
> For production graph database systems, the bidirectional BFS approach
> is essential for practical shortest path queries on large graphs.
>
> This document focuses on the conceptual design. Implementation details
> such as hash table sizing, batch overflow handling, and path reconstruction
> algorithms are left for follow-up discussion.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2026-02-23 15:50:57 Re: NOT NULL NOT ENFORCED
Previous Message Nathan Bossart 2026-02-23 15:45:50 Re: Moving the vacuum GUCs' docs out of the Client Connection Defaults section