[SQL/PGQ] Early pruning for GRAPH_TABLE path generation

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, assam258 <assam258(at)gmail(dot)com>
Subject: [SQL/PGQ] Early pruning for GRAPH_TABLE path generation
Date: 2026-05-09 07:25:03
Message-ID: CAEG8a3+4tA=e2M_f=dfaMgur+HJQ4R11v0torj2nc_k8XX3U6A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

In [1], Satya proposed a PoC to limit the total number of GRAPH_TABLE
path combinations to prevent memory exhaustion. While I agree that
path explosion is a real concern, I do not think we should impose such
a limit before generate_queries_for_path_pattern_recurse(). In many
cases, most candidate paths can be eliminated later by
generate_query_for_graph_path().

The current implementation has a different problem:
generate_queries_for_path_pattern_recurse() first enumerates complete
path combinations, and only later does generate_query_for_graph_path()
check whether the selected edge elements actually connect the adjacent
vertex elements. We can improve this by pruning infeasible path
prefixes early during DFS.

This patch adds an early feasibility check during DFS path
construction. After appending a new path_element to the current
prefix, we verify whether the prefix can still satisfy edge-vertex
adjacency constraints:
- if the new element is an edge, check it against any
already-selected elements in the current prefix
- if the new element is a vertex, check only the immediately preceding edge

The second case is sufficient because repeated vertex variables are
merged into a single path_factor before DFS begins. So appending a
vertex only adds a new constraint to the edge directly before it in
the factorized path. Any later edge referencing the same factor will
be checked when that edge itself is appended.

This does not change query generation semantics. It only avoids
exploring full-length paths that would later be rejected by
generate_query_for_graph_path().

Using the property graph from [1]

CREATE temp TABLE v1 (id int PRIMARY KEY, val int);
CREATE temp TABLE v2 (id int PRIMARY KEY, val int);
CREATE temp TABLE v3 (id int PRIMARY KEY, val int);
CREATE temp TABLE v4 (id int PRIMARY KEY, val int);
CREATE temp TABLE v5 (id int PRIMARY KEY, val int);
CREATE temp TABLE v6 (id int PRIMARY KEY, val int);
CREATE temp TABLE v7 (id int PRIMARY KEY, val int);
CREATE temp TABLE v8 (id int PRIMARY KEY, val int);
CREATE temp TABLE e1 (id int PRIMARY KEY, src int, dest int);
CREATE temp TABLE e2 (id int PRIMARY KEY, src int, dest int);
CREATE temp TABLE e3 (id int PRIMARY KEY, src int, dest int);
CREATE temp TABLE e4 (id int PRIMARY KEY, src int, dest int);
CREATE temp TABLE e5 (id int PRIMARY KEY, src int, dest int);
CREATE temp TABLE e6 (id int PRIMARY KEY, src int, dest int);
CREATE temp TABLE e7 (id int PRIMARY KEY, src int, dest int);
CREATE temp TABLE e8 (id int PRIMARY KEY, src int, dest int);
CREATE PROPERTY GRAPH g5
VERTEX TABLES (
v1 LABEL vl PROPERTIES (id, val) LABEL vl2 PROPERTIES (id, val),
v2 LABEL vl PROPERTIES (id, val),
v3 LABEL vl PROPERTIES (id, val),
v4 LABEL vl PROPERTIES (id, val),
v5 LABEL vl PROPERTIES (id, val),
v6 LABEL vl PROPERTIES (id, val),
v7 LABEL vl PROPERTIES (id, val),
v8 LABEL vl PROPERTIES (id, val)
)
EDGE TABLES (
e1 SOURCE KEY (src) REFERENCES v1 (id) DESTINATION KEY (dest)
REFERENCES v2 (id) LABEL el PROPERTIES (id),
e2 SOURCE KEY (src) REFERENCES v2 (id) DESTINATION KEY (dest)
REFERENCES v3 (id) LABEL el PROPERTIES (id),
e3 SOURCE KEY (src) REFERENCES v3 (id) DESTINATION KEY (dest)
REFERENCES v4 (id) LABEL el PROPERTIES (id),
e4 SOURCE KEY (src) REFERENCES v4 (id) DESTINATION KEY (dest)
REFERENCES v5 (id) LABEL el PROPERTIES (id),
e5 SOURCE KEY (src) REFERENCES v5 (id) DESTINATION KEY (dest)
REFERENCES v6 (id) LABEL el PROPERTIES (id),
e6 SOURCE KEY (src) REFERENCES v6 (id) DESTINATION KEY (dest)
REFERENCES v7 (id) LABEL el PROPERTIES (id),
e7 SOURCE KEY (src) REFERENCES v7 (id) DESTINATION KEY (dest)
REFERENCES v8 (id) LABEL el PROPERTIES (id),
e8 SOURCE KEY (src) REFERENCES v8 (id) DESTINATION KEY (dest)
REFERENCES v1 (id) LABEL el PROPERTIES (id)
);

SELECT count(*) FROM GRAPH_TABLE (g5 MATCH (a IS
vl2)-[e1]->(b)-[e2]->(c)-[e3]->(d)-[e4]->(e) COLUMNS (a.id AS aid));

head:

# SELECT count(*) FROM GRAPH_TABLE (g5 MATCH (a IS
vl2)-[e1]->(b)-[e2]->(c)-[e3]->(d)-[e4]->(e) COLUMNS (a.id AS aid));
count
-------
0
(1 row)

Time: 5388.629 ms (00:05.389)

# SELECT count(*) FROM GRAPH_TABLE (g5 MATCH (a IS
vl2)-[e1]->(b)-[e2]->(c)-[e3]->(d)-[e4]->(e) COLUMNS (a.id AS aid));
count
-------
0
(1 row)

Time: 5195.827 ms (00:05.196)

patched:

[local] zjw(at)postgres:5432-5924=# SELECT count(*) FROM GRAPH_TABLE (g5
MATCH (a IS vl2)-[e1]->(b)-[e2]->(c)-[e3]->(d)-[e4]->(e) COLUMNS (a.id
AS aid));
count
-------
0
(1 row)

Time: 11.827 ms

[local] zjw(at)postgres:5432-5924=# SELECT count(*) FROM GRAPH_TABLE (g5
MATCH (a IS vl2)-[e1]->(b)-[e2]->(c)-[e3]->(d)-[e4]->(e) COLUMNS (a.id
AS aid));
count
-------
0
(1 row)

Time: 2.924 ms

As you can see, the results are quite promising.

Unfortunately, I cannot attach the patch to this email because of
company security policy restrictions (I will do so once I get my own
Mac). If you are interested, please check [2] on GitHub.

[1] https://www.postgresql.org/message-id/flat/CAHg%2BQDfDwcM4%3DDSiAV6Ly89YQ5EcMhzO1-9x%3DmGG1WJzODcAig%40mail.gmail.com
[2] https://github.com/zhjwpku/postgres/commit/bdc1c7ea9729285f105e650c0f5bd032802a85d8

--
Regards
Junwang Zhao

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Henson Choi 2026-05-09 07:43:18 Re: Row pattern recognition
Previous Message Chao Li 2026-05-09 06:38:03 Re: FOR PORTION OF does not recompute GENERATED STORED columns that depend on the range column