| From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Ajay Pal <ajay(dot)pal(dot)k(at)gmail(dot)com>, Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: SQL Property Graph Queries (SQL/PGQ) |
| Date: | 2026-01-02 08:48:03 |
| Message-ID: | CAExHW5vDB2WRWcT799jpcGcs2piXp8F1jdKZE+a9EGFx2PbjHg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Henson,
Thanks for your systematic review.
On Wed, Dec 31, 2025 at 11:53 AM Henson Choi <assam258(at)gmail(dot)com> wrote:
>
> Overall assessment: GOOD
Glad to know that.
> - Test Coverage: Good (90.5% line coverage, ~180 test cases)
Thanks for the coverage analysis. Looks good right now. But see below.
> 2.1 Critical / Major
> (None)
Great.
>
> 2.2 Minor Issues
>
> #1 [Code] src/backend/commands/propgraphcmds.c:1632
> FIXME: Missing index for plppropid column causes sequential scan.
> Decision needed: (a) add index, or (c) allow seq scan for rare path.
The path is rare enough that I think we can allow the seq scan. Given
that Peter has marked it as FIXME, it seems we will keep it as is for
now, but will revisit if performance becomes an issue. Peter, please
correct me if I'm wrong.
>
> #2 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:286
> Compatibility section incorrectly states "CREATE PROPERTY GRAPH"
> Should be: "ALTER PROPERTY GRAPH"
>
That's right. Fixed.
> #3 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:65
> Synopsis shows VERTEX/NODE and EDGE/RELATIONSHIP as separate clauses,
> but Description merges them as {VERTEX|NODE|EDGE|RELATIONSHIP}.
Their syntax is slightly different hence they are separate in
Synopsis. If we separate them in the Description, we might have to
repeat the same text twice. Having them merged in the Description
makes it more concise without losing any clarity or meaning. I think
we can keep it as is. What do you think?
>
> #4 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:80
> Grammar error: "the graph removed" should be "the graph is removed"
Right. Fixed.
>
> #5 [Doc] doc/src/sgml/queries.sgml:2929,2931
> TODO comments for unimplemented features without clear limitation notes.
Those TODOs are not visible to users. I think they serve as
placeholders to add the documentation when the features are
implemented. We may want to remove them, but I see no harm in keeping
them for now. Peter, what do you think?
>
> #6 [Doc] doc/src/sgml/ddl.sgml:5717
> Typo: "infrastucture" should be "infrastructure"
>
Fixed. Thanks for catching it.
> 2.3 Alternative Approaches for Discussion
>
> #1 Support CREATE PROPERTY GRAPH IF NOT EXISTS
> Rationale: PostgreSQL-style extension, consistent with other DDL.
>
I think this will be good-to-have for consistency across DDLs.
However, I think it is better to discuss and implement it separately
since lack of it does not block the main functionality of property
graphs. Meantime users can DROP and CREATE. There are other DDLs which
do not have this support. What do you think?
> #2 Return 0 rows for non-existent labels instead of error
> Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:NonExistent) ...) raises error
> Alternative: Return empty result set instead
> Rationale: Better for exploratory queries, similar to SQL WHERE on
> non-existent values returning 0 rows rather than error.
I think this makes sense in a proper graph database where the labels
are not predefined. However, in property graphs the labels, the
properties associated with them and data types of those properties are
predefined in the graph schema. If the specified label does not exist
in the graph schema, we can not determine what properties are
associated with it and their data types. In turn we can not determine
the shape of the result set, which is essential for reporting even 0
rows. Hence the error instead of 0 rows. Oracle has similar behaviour.
>
> #3 Return 0 rows when same variable has different labels
> Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:Person)-[e]-(a:Company) ...)
> raises error because variable 'a' has conflicting labels.
> Alternative: Return empty result set instead
> Rationale: Logically, a node cannot be both Person AND Company,
> so 0 rows is the correct result. Consistent with standard SQL
> WHERE semantics (impossible condition = 0 rows, not error).
>
In such a case 0 rows isn't a correct result. According to the
standard, variable a is bound to all the nodes that have both the
labels. This is label conjunction, a feature we don't support right
now. Hence the "unsupported feature" error. I expect that some day we
will implement label conjunction, and then the same query will return
non-zero rows if there are nodes with both labels.
> -------------------------------------------------------------------------------
> TEMP graph CREATE TEMP PROPERTY GRAPH Medium
Done
> TEMP graph TEMP graph with permanent table reference Medium
Done
> TEMP graph ALTER ADD permanent table to TEMP graph Medium
Done
> CASCADE DROP ... CASCADE (dependent view cascade) Low
Done
> RESTRICT DROP ... RESTRICT (dependent object error) Low
Done, but without explicit RESTRICT keyword since it's default behavior.
- Tests specifying NODE/RELATIONSHIP instead of VERTEX/EDGE (Low priority)
Done
> propgraphcmds.c:136,179,254-262 TEMP table in graph creation Auto TEMP conversion
Done
> propgraphcmds.c:1323,1364 ALTER ADD TEMP to perm graph Error branch
Done
> propgraphcmds.c:724-734 CREATE without LABEL clause Default label else
This should be covered. There are several CREATE PROPERTY GRAPH
statements without LABEL clauses.
> propgraphcmds.c:1300,1303 ALTER IF EXISTS non-existent missing_ok branch
This is unreachable since the syntax is not supported. The only IF
EXISTS variant supported is ALTER PROPERTY GRAPH IF EXISTS ... SET
SCHEMA .... Added a test for the same.
> propgraphcmds.c:116 CREATE UNLOGGED attempt UNLOGGED error
Done
> execMain.c:1162-1163 DML on graph RELKIND_PROPGRAPH
This is unreachable since a property graph reference is only allowed
in GRAPH_TABLE().
> rewriteGraphTable.c:120 Multiple path patterns Length check
Done
> rewriteGraphTable.c:205 Quantifier usage Quantifier error
Done
> ruleutils.c:7939-7963 Complex label VIEW deparsing T_BoolExpr branch
>
Done
Can you please check whether the uncovered lines are now covered?
> - Tests executed: run_pgq_tests.sql
What's this test? It's not in the patchset. Also you might want to try
002_pg_upgrade.
>
> +----------------------------------------------------------+
> | SQL/PGQ Implementation: NO MEMORY LEAKS DETECTED |
> +----------------------------------------------------------+
Great. Thanks for confirming.
>>
>> On Fri, Dec 26, 2025 at 6:03 PM Henson Choi <assam258(at)gmail(dot)com> wrote:
>> >
>> > 1. LABELS() function
>> > - Returns text[] of element labels
>> > - Fixed privilege checking from previous version
>> > - Enables optimizer pushdown for branch pruning
>> >
>> > 2. PROPERTY_NAMES() function
>> > - Returns text[] of property names
>> > - Similar approach to LABELS()
>> >
>>
>> I could not find specification of these functions in SQL/PGQ standard.
>> It's a large document and I might be missing something. Can you please
>> point me to the relevant sections?
>>
>
> You're correct - LABELS() and PROPERTY_NAMES() are not in the SQL/PGQ
> standard. I was inspired by similar functions in Cypher (labels(), keys())
> and Oracle's PGQL (which has LABELS(), though Oracle is moving toward
> SQL/PGQ compliance as well).
>
> The attached code demonstrates that through query rewrite, these functions
> can enable planner-level table pruning optimizations. This becomes
> particularly valuable for client applications - GUI tools could use label
> names as host variables for flexible label-based filtering, and
> PROPERTY_NAMES() (similar to Cypher's keys()) would enable dynamic property
> inspection for display or selective querying of elements with specific
> properties.
>
> While there's a distinction between structured (SQL/PGQ) and semi-structured
> (Cypher) query models, I don't think we need to strictly exclude
> semi-structured patterns that work well within the structured framework.
> Whether this should be proposed as a future SQL/PGQ standard extension or
> remain a PostgreSQL-specific extension is worth discussion. Given the
> practical utility demonstrated in graph query deployments, there might be
> value in standardizing such introspection functions.
While these functions are useful with semistructured frameworks like
Cypher, I am not sure how useful they are in structured framework like
SQL/PGQ. In SQL/PGQ, the graph schema is predefined and known to the
users. Hence users know what labels and properties exist in the graph.
However, if they become part of the standard, their chances of getting
accepted in the PostgreSQL core will increase.
Even if we keep them out of core, I think we should be able to
implement them as stable functions in an extension and still benefit
from planner optimizations you mentioned.
>
> Similarly, I'd suggest considering Cypher's SHORTEST PATH for future
> SQL/PGQ standards. If we treat SHORTEST PATH as a specialized join type
> in SQL and handle it through query rewrite, PostgreSQL's existing planner
> infrastructure could support it efficiently. This approach has been proven
> in practice and could be a valuable addition to the next standard revision.
IIRC there is SQL/PGQ standard specification for shortest path. We
should implement that when we get to it.
>
> That was my finding as well - the architecture is well-designed for
> extensibility.
Great. Thanks for the confirmation.
>
> Having more people who can maintain and extend SQL/PGQ reduces the
> risk for both the community and PostgreSQL itself. When I have a revised
> version ready, would you be willing to review it again?
>
We will need to prioritize the features to be implemented next. This
looks like a useful pattern to support. I believe this will appear
higher in the priority list, thus worth reviewing.
The patches are thus
0001 - same as previous except a. addition of new lines in
create_property_graph.sql for consistency with other sections in that
file, b. some minor corrections suggested by Hensen.
0002 - fixes : references and improves documentation. I think we can
merge this into 0001 after a quick review from Peter
0003 - ECPG test. This might need a bit of review. The patch is large
because of the .c and .err files.
0004 - This reverts back ECPG and PSQL lexer changes introduced in
0001. Does not show any failure even in the test case added by 0003.
Those changes seem unnecessary. Peter, can you please confirm.
0005 - Additional test cases for code coverage as pointed out by
Hensen's report.
--
Best Wishes,
Ashutosh Bapat
| Attachment | Content-Type | Size |
|---|---|---|
| v20260102-0001-WIP-SQL-Property-Graph-Queries-SQL-PGQ.patch | text/x-patch | 663.0 KB |
| v20260102-0005-More-test-coverage.patch | text/x-patch | 15.2 KB |
| v20260102-0004-Possibly-unnecessary-changes.patch | text/x-patch | 1.5 KB |
| v20260102-0003-Test-SQL-PGQ-with-ECPG.patch | text/x-patch | 27.2 KB |
| v20260102-0002-Fix-some-more-references-and-edit-document.patch | text/x-patch | 6.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Xuneng Zhou | 2026-01-02 09:17:34 | Re: Implement waiting for wal lsn replay: reloaded |
| Previous Message | jian he | 2026-01-02 08:05:17 | Re: Non-text mode for pg_dumpall |