From: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Amit Langote <amitlangote09(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Ajay Pal <ajay(dot)pal(dot)k(at)gmail(dot)com>, Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: SQL Property Graph Queries (SQL/PGQ) |
Date: | 2025-08-31 11:05:19 |
Message-ID: | CAEG8a3JZUxcyoNDnMBDus1AFwuXmofK52eGZTgrjo6RRVcbNNg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Ashutosh,
Thanks for the new patch set.
On Thu, Aug 7, 2025 at 6:47 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> Hi Amit,
> Thanks for your comments
>
> On Wed, Jul 23, 2025 at 2:08 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> >
> > Hi Ashutosh,
> >
> > On Tue, Jul 22, 2025 at 8:40 PM Ashutosh Bapat
> > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > > Here's the patchset rebased on the latest master.
> >
> > Thanks for the update. I was going through the patch series today and
> > had a few comments on the structure that might help with reviewing and
> > eventually committing, whoever ends up doing that. :)
> >
> > I noticed that some functions, like graph_table_property_reference(),
> > are introduced in one patch and then renamed or removed in a later
> > one. It’s easier to review if such refactoring is avoided across
> > patches -- ideally, each patch should introduce code in its final
> > intended form. That also helps reduce the number of patches and makes
> > each one more self-contained.
> >
> > One of the later patches (0004) collects several follow-up changes.
> > While it fixes a real bug -- a crash when GRAPH_TABLE is used inside
> > plpgsql due to a conflict with the columnref hook -- it also includes
> > incidental cleanups like switching to foreach_node(), updating
> > comments, and adjusting function signatures. Those would be better
> > folded into the patches that introduced the relevant code, rather than
> > grouped into a catch-all at the end. That keeps each patch focused and
> > easier to review -- and easier to merge when committing.
> >
> > A general principle that might help: if you're refactoring existing
> > code, a standalone preliminary patch makes sense. But if you're
> > tweaking something just added in the same series, it’s usually better
> > to squash that into the original patch. The rename from
> > graph_table_property_reference() to transformGraphTablePropertyRef()
> > may be a fair exception since it reflects a shift prompted by the bug
> > fix -- but most other adjustments could be folded in without loss of
> > clarity.
> >
> > I understand the intent to spell out each change, but if the details
> > are incidental to the overall design, they don’t necessarily need to
> > be split out. Explaining the reasoning in the thread is always
> > helpful, but consolidating the patches once the design has settled
> > makes things easier for both reviewers and committers.
>
> 0001 is almost the same patch that Peter posted almost a year ago.
> Each of 0002 onwards patches contains logically grouped changes. I
> have changed 0001 minimally (so that it continues to apply, compile
> and pass regression). I think this arrangement will make it easy for
> Peter to review the changes. It will help him understand what all
> changed since 0001 and each change in its own context. That has led to
> a lot of overlap between 0002 and 0001 which I think should be
> squashed together even now. But I would like to know what Peter thinks
> - what would make his review easier. This arrangement might also help
> Junwang, who has reviewed some patches, to continue his incremental
> review.
>
> I have rearranged the patches 0002 to 0014 so that the same change
> isn't revised in multiple patches. Hope that helps.
>
> I am also attaching a single patch as well to this email, so that
> newer reviewers can review the changes as a whole.
>
> SQL_PGQ_20250807.zip is separate patches zipped together.
> SQL_PGQ_one_patch_20250807.zip is a single diff with changes from all
> the patches squashed together. It's still zipped to avoid the email
> being held for moderation.
>
> >
> > Finally, attaching the patches directly, with versioned names like
> > v8-000x, instead of zipping them helps. Many folks (myself included)
> > will casually skip a zip file because of the small hassle of unzipping
> > just to read a patch. I once postponed reading such a patch and didn’t
> > get back to it for quite a while. :)
>
> I used to attach those as separate patches till [1], but after that
> the total size of the patchset grew so much that my emails used to get
> held back for moderation. So I started attaching zip files. The
> increase in size is mostly due to the 0014 patch. If we think that
> it's not needed or can be trimmed down, we will be able to attach the
> patchset as separate attachments.
>
> Here's what changed between the last patchset and this
>
> 0001 has some TODOs, FIXMEs and XXXs. I and Junwang had fixed some of
> them earlier. This patchset fixes some more. Below is my analysis of
> each of them.
> +
> + <!-- TODO: multiple path patterns in a graph pattern (comma-separated) -->
> +
> + <!-- TODO: quantifiers -->
>
> I think we should remove these TODOs since we are not going to support
> these cases in this patchset. We will add the documentation when we
> implement these features.
>
> +/*
> + * 15.2
> + * PG_DEFINED_LABEL_SETS view
> + */
> +
> +-- TODO
> +
> +
> +/*
> + * 15.3
> + * PG_DEFINED_LABEL_SET_LABELS view
> + */
> +
> +-- TODO
> +
> +
> +/*
> + * 15.4
> + * PG_EDGE_DEFINED_LABEL_SETS view
> + */
> +
> +-- TODO
> +
> +/*
> + * 15.6
> + * PG_EDGE_TRIPLETS view
> + */
> +
> +-- TODO
> +
> +/*
> + * 15.15
> + * PG_VERTEX_DEFINED_LABEL_SETS view
> + */
> +
> +-- TODO
>
> All these views are related to the defined label set. IIUC, defined
> label set is a set of labels which is shared by multiple edges or
> vertices of a property graph. Please note it's the whole set shared;
> not just individual labels in the set. In that sense, I think, the
> labels associated with each edge or vertex table in a property graph
> form a defined label set. That way the property graph element's OID
> becomes the defined label set's identifier. I am not sure if this view
> has any practical use for tabular property graphs. It's not clear to
> me what these views will contain. Since they are not essential for
> SQL/PGQ functionality, I have not implemented them.
>
> + /*
> + * Now check number and names.
> + *
> + * XXX We could provide more detail in the error messages, but that would
> + * probably only be useful for some ALTER commands, because otherwise it's
> + * not really clear which label definition is the wrong one, and so you'd
> + * have to construct a rather verbose report to be of any use. Let's keep
> + * it simple for now.
> + */
>
> Agree with XXX. I think the code is good enough as is. We can leave
> XXX there in case we expect someone to improve it in future. Otherwise
> we should remove XXX as well.
>
> +
> + rel = table_open(PropgraphLabelPropertyRelationId, RowShareLock);
> + ScanKeyInit(&key[0],
> + Anum_pg_propgraph_label_property_plppropid,
> + BTEqualStrategyNumber, F_OIDEQ,
> + ObjectIdGetDatum(propoid));
> + scan = systable_beginscan(rel, InvalidOid /* FIXME */ ,
> + true, NULL, 1, key);
>
> I think the FIXME is to add the correct index OID.
> pg_propgraph_label_property_label_prop_index contains plppropid as a
> key but it's not the first column. So probably some other, yet not
> defined, index is expected here?
>
> +ALTER PROPERTY GRAPH g3 ALTER VERTEX TABLE t3 DROP LABEL t3l3;
> +ALTER PROPERTY GRAPH g3 DROP VERTEX TABLES (t2); -- fail (TODO:
> dubious error message)
> +ERROR: cannot drop vertex t2 of property graph g3 because other
> objects depend on it
> +DETAIL: edge e1 of property graph g3 depends on vertex t2 of property graph g3
> +HINT: Use DROP ... CASCADE to drop the dependent objects too.
>
> I don't see what's dubious about this error message. edge e1 connects
> t2 and t1. Dropping t2 would leave e1 dangling; error is preventing
> it. Looks similar to how we prevent a referenced table being dropped.
> Or is it expected that e1 be dropped without specifying cascade when
> t2 is dropped since e1 has no existence without t2?
>
> Also the patchset fixes the CI failure from the previous patchset.
>
> The patches are reordered slightly. Each patch contains a commit
> message that explains the changes in that patch.
>
> [1] https://www.postgresql.org/message-id/CAExHW5sM+ZGVzR1428FsDHuWc-FU2-6zQr5j91KLh6vZaWY0ow@mail.gmail.com
>
>
>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
I have some review comments, and hope some of them are helpful.
1.
doc/src/sgml/ddl.sgml
+<programlisting>
+CREATE PROPERTY GRAPH myshop
+ VERTEX TABLES (
+ products LABEL product,
+ customers LABEL customer,
+ orders LABEL order
+ )
+ EDGE TABLES (
+ order_items SOURCE orders DESTINATION products LABEL contains,
+ customer_orders SOURCE customers DESTINATION orders LABEL has
+ );
+</programlisting>
order is a reserved keyword, so the following example will fail, we
should not give a wrong example in our document.
2.
doc/src/sgml/information_schema.sgml
+ <title><literal>pg_element_table_key_columns</literal></title>
+
+ <para>
+ The view <literal>pg_element_key_columns</literal> identifies which columns
+ are part of the keys of the element tables of property graphs defined in
+ the current database. Only those property graphs are shown that the
+ current user has access to (by way of being the owner or having some
+ privilege).
+ </para>
+ <title><literal>pg_element_table_properties</literal></title>
+
+ <para>
+ The view <literal>pg_element_table_labels</literal> shows the definitions
+ of the properties for the element tables of property graphs defined in the
+ current database. Only those property graphs are shown that the current
+ user has access to (by way of being the owner or having some privilege).
+ </para>
the <title> and the <para> doesn't match.
3.
doc/src/sgml/queries.sgml
+ <para>
+ A path pattern thus matches a sequence of vertices and edges. The
+ simplest possible path pattern is
+<programlisting>
+()
+</programlisting>
+ which matches a single vertex. The next simplest pattern would be
+<programlisting>
+()-[]->()
+</programlisting>
+ which matches a vertex followed by an edge followed by a vertex. The
+ characters <literal>()</literal> are a vertex pattern and the characters
+ <literal>-[]-></literal> are an edge pattern.
+ </para>
the description seems wrong, when a user writes (), it should match
all vertices in a property graph, for example:
explain SELECT customer_name FROM GRAPH_TABLE (myshop MATCH () COLUMNS
(1 AS customer_name));
[local] zjw(at)postgres:5432-73666=# explain SELECT customer_name FROM
GRAPH_TABLE (myshop MATCH () COLUMNS (1 AS customer_name));
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..89.40 rows=3960 width=4)
-> Seq Scan on customers (cost=0.00..18.50 rows=850 width=4)
-> Seq Scan on orders (cost=0.00..32.60 rows=2260 width=4)
-> Seq Scan on products (cost=0.00..18.50 rows=850 width=4)
(4 rows)
4.
doc/src/sgml/ref/create_property_graph.sgml
+<programlisting>
+CREATE PROPERTY GRAPH g1
+ VERTEX TABLES (
+ v1 LABEL foo PROPERTIES (x AS a * 2) LABEL bar PROPERTIES (x AS a * 2)
+ ) ...
+</programlisting>
+ but this would not:
+<programlisting>
+CREATE PROPERTY GRAPH g1
+ VERTEX TABLES (
+ v1 LABEL foo PROPERTIES (x AS a * 2) LABEL bar PROPERTIES (x AS a * 10)
+ ) ...
+</programlisting></para>
The expr after PROPERTIES should be (a * 2 AS x)
5.
src/backend/catalog/sql_features.txt
+G034 Path concatenation YES SQL/PGQ required
Do we support path concatenation?
+G070 Label expression: label disjunction NO SQL/PGQ required
I think this should be a YES?
6.
src/backend/commands/tablecmds.c
The description RenameRelation and RemoveRelations should adapt
property graph, below are diffs I suggest:
/*
- * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/MATERIALIZED VIEW/FOREIGN TABLE
+ * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/MATERIALIZED VIEW/FOREIGN
TABLE/PROPERTY GRAPH
* RENAME
*/
ObjectAddress
@@ -4210,8 +4210,8 @@ RenameRelation(RenameStmt *stmt)
/*
* Grab an exclusive lock on the target table, index, sequence, view,
- * materialized view, or foreign table, which we will NOT release until
- * end of transaction.
+ * materialized view, foreign table or property graph, which we will NOT
+ * release until end of transaction.
/*
* RemoveRelations
* Implements DROP TABLE, DROP INDEX, DROP SEQUENCE, DROP VIEW,
- * DROP MATERIALIZED VIEW, DROP FOREIGN TABLE
+ * DROP MATERIALIZED VIEW, DROP FOREIGN TABLE, DROP PROPERTY GRAPH
7.
src/backend/nodes/nodeFuncs.c
exprLocation should have an arm for T_GraphPropertyRef? I suggest:
@@ -1811,6 +1811,9 @@ exprLocation(const Node *expr)
case T_PartitionRangeDatum:
loc = ((const PartitionRangeDatum *) expr)->location;
break;
+ case T_GraphPropertyRef:
+ loc = ((const GraphPropertyRef *) expr)->location;
+ break;
8.
src/backend/rewrite/rewriteGraphTable.c
patch 0002
+ /*
+ * Label expressions from two element patterns need to be
+ * conjucted. Label conjuction is not supported right now
typo, should be conjuncted and conjunction.
9.
The line length of some code is quite long, which isn't ideal, but
it's not a major issue. It can be formatted before being committed.
BTW, the patch set seems not to apply to the current master.
I will go another round of review of the test cases, thanks.
--
Regards
Junwang Zhao
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2025-08-31 11:34:26 | Re: PG 18 relnotes and RC1 |
Previous Message | Mikael Kjellström | 2025-08-31 08:19:53 | Re: ecpg test thread/alloc hangs on sidewinder running NetBSD 9.3 |