| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
| Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Ajay Pal <ajay(dot)pal(dot)k(at)gmail(dot)com>, Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: SQL Property Graph Queries (SQL/PGQ) |
| Date: | 2025-12-28 04:43:54 |
| Message-ID: | CAAAe_zD_9yAJUbJ8Gu4Bz91r3dZmSi7BqYsgQz+sp+6qe+OWvQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
I apologize for the confusion caused by my partial patch submission.
The patch was intended to be applied on top of the PGQ base patch,
but I was not aware that CFBot would try to apply it directly to master.
Next time, I will use a .txt extension so that the original authors
can review it before CFBot processes it.
Sorry for the inconvenience.
Regards,
Henson Choi
2025년 12월 25일 (목) AM 1:23, Henson Choi <assam258(at)gmail(dot)com>님이 작성:
> Subject: Re: SQL Property Graph Queries (SQL/PGQ)
>
> Hi hackers,
>
> Apologies for jumping into the SQL/PGQ discussion mid-stream. I've been
> working on implementing the LABELS() graph element function and ran into
> an issue I'd like to get some guidance on.
>
> == What LABELS() does ==
>
> LABELS(element_variable) returns a text[] array containing all labels
> associated with a graph element. For example:
>
> SELECT * FROM GRAPH_TABLE (myshop
> MATCH (n IS customers)
> COLUMNS (n.name, LABELS(n) AS lbls)
> );
>
> name | lbls
> ------------+-------------
> customer1 | {customers}
> customer2 | {customers}
>
> == Implementation approach ==
>
> A naive approach would return LABELS() as a constant (e.g.,
> ARRAY['customers']
> directly). However, this prevents the optimizer from pushing down
> predicates
> involving LABELS() through UNION ALL branches.
>
> To enable optimizer pushdown, I wrap each element table in a subquery that
> adds a virtual __labels__ column containing a constant array. This way,
> LABELS(n) returns a Var referencing that column, allowing the optimizer to
> evaluate it per-branch and prune accordingly:
>
> -- Conceptually, the rewrite produces:
> SELECT name, __labels__ AS lbls FROM (
> SELECT *, ARRAY['customers']::text[] AS __labels__ FROM customers
> ) ...
>
> This allows the optimizer to perform constant folding and prune UNION ALL
> branches when filtering by label:
>
> -- Filter in outer WHERE clause
> EXPLAIN SELECT * FROM GRAPH_TABLE (myshop
> MATCH (n IS lists) -- lists label shared by orders and wishlists
> COLUMNS (LABELS(n) AS lbls, n.node_id)
> ) WHERE 'orders' = ANY(lbls);
>
> QUERY PLAN
> ------------------------------------------------------
> Seq Scan on orders
> Output: '{orders,lists}'::text[], orders.order_id
>
> -- Filter in MATCH WHERE clause without IS (both tables scanned)
> EXPLAIN SELECT * FROM GRAPH_TABLE (myshop
> MATCH (n) WHERE 'lists' = ANY(LABELS(n))
> COLUMNS (LABELS(n) AS lbls, n.node_id)
> );
>
> QUERY PLAN
> ------------------------------------------------------
> Append
> -> Seq Scan on orders
> Output: '{orders,lists}'::text[], orders.order_id
> -> Seq Scan on wishlists
> Output: '{lists,wishlists}'::text[], wishlists.wishlist_id
>
> The first example prunes the wishlists branch since 'orders' is not in its
> label set. The second example uses LABELS() without an IS clause to filter
> across all element tables that have the 'lists' label.
>
> == The problem ==
>
> The subquery wrapping breaks column-level privilege checking.
>
> Test case from privileges.sql:
>
> -- regress_priv_user4 has SELECT on atest5(one, four) only
> -- lttc label maps atest5.three -> lttck (no privilege)
>
> SET ROLE regress_priv_user4;
> SELECT * FROM graph_table (ptg1 MATCH (v IS lttc) COLUMNS (v.lttck));
> -- Expected: ERROR: permission denied for table atest5
> -- Actual: query succeeds (returns rows without error)
>
> I haven't yet identified the exact root cause of this issue. Has anyone
> encountered a similar issue? Any pointers would be appreciated.
>
> Thanks,
> Henson
>
> 2025년 12월 18일 (목) PM 6:16, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>님이
> 작성:
>
>> On Wed, Dec 17, 2025 at 2:28 PM Peter Eisentraut <peter(at)eisentraut(dot)org>
>> wrote:
>> >
>> > On 17.12.25 06:32, Ashutosh Bapat wrote:
>> > > On Mon, Dec 15, 2025 at 6:43 PM Ashutosh Bapat
>> > > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>> > >>
>> > >> Rebased patches on the latest HEAD which required me to move
>> > >> graph_table.sql to another parallel group.
>> > >
>> > > Huh, the movement resulted in losing that test from parallel_schedule.
>> > > Fixed in the attached patches.
>> >
>> > A couple of items:
>> >
>> > 1) I was running some tests that involved properties with mismatching
>> > typmods, and I got an error message like
>> >
>> > ERROR: property "p1" data type modifier mismatch: 14 vs. 19
>> >
>> > but the actual types were varchar(10) and varchar(15). So to improve
>> > that, we need to run these through the typmod formatting routines, not
>> > just print the raw typmod numbers. I actually just combined that with
>> > the check for the type itself. Also, there was no test covering this,
>> > so I added one. See attached patches.
>>
>> +1. The error message is better.
>>
>> >
>> > I did another investigation about whether this level of checking is
>> > necessary. I think according to the letter of the SQL standard, the
>> > typmods must indeed match. It seems Oracle does not check (the example
>> > mentioned above came from an Oracle source). But I think it's okay to
>> > keep the check. In PostgreSQL, it is much less common to write like
>> > varchar(1000). And we can always consider relaxing it later.
>>
>> +1.
>>
>> Attached patch adds a couple more test statements.
>>
>> >
>> > 2) I had it in my notes to consider whether we should support the colon
>> > syntax for label expressions. I think we might have talked about that
>> > before.
>> >
>> > I'm now leaning toward not supporting it in the first iteration. I
>> > don't know that we have fully explored possible conflicts with host
>> > variable syntax in ecpg and psql and the like. Maybe avoid that for
>> now.
>> >
>>
>> I was aware of ecpg and I vaguely remember we fixed something in ECPG
>> to allow : in MATCH statement. Probably following changes in
>> psqlscan.l and pgc.l
>> -self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
>> +self [,()\[\].;\:\|\+\-\*\/\%\^\<\>\=]
>>
>> Those changes add | after : (and not the : itself) so maybe they are
>> not about supporting : . Do you remember what those are?
>>
>> However, I see that : in psql can be a problem
>> #create table t1 (a int, b int);
>> #create property graph g1 vertex tables (t1 key (a));
>> #select * from GRAPH_TABLE (g1 MATCH (a :t1) COLUMNS (a.a));
>> a
>> ---
>> (0 rows)
>>
>> #\set t1 blank
>> #select * from GRAPH_TABLE (g1 MATCH (a :t1) COLUMNS (a.a));
>> ERROR: syntax error at or near "blank"
>> LINE 1: select * from GRAPH_TABLE (g1 MATCH (a blank) COLUMNS (a.a))...
>>
>> > There was also a bit of an inconsistency in the presentation: The
>> > documentation introduced the colon as seemingly the preferred syntax,
>> > but ruleutils.c dumped out the IS syntax.
>> >
>> > (It was also a bit curious that some test code put spaces around the
>> > colon, which is not idiomatic.)
>> >
>>
>> That might have been just me trying to type one letter less and yet
>> keeping the query readable. A column between variable name and the
>> label is not always noticeable.
>>
>> > Attached is a patch that shows how to revert the colon support. It's
>> > pretty simple, and it would be easy to add it back in later, I think.
>>
>> I agree that it's better not to support it now. It requires more work
>> and it's optional. When we come to support it, we will need thorough
>> testing.
>>
>> I spotted some examples that use : in ddl.sgml.
>> <programlisting>
>> SELECT customer_name FROM GRAPH_TABLE (myshop MATCH
>> (c:customer)-[:has]->(o:"order" WHERE o.ordered_when = current_date)
>> COLUMNS (c.name AS customer_name));
>> </programlisting>
>>
>> The query demonstrates that one can use label names in a way that will
>> make the pattern look like an English sentence. Replacing : with IS
>> defeats that purpose.
>>
>> As written in that paragraph, the labels serve the purpose of exposing
>> the table with a different logical view (using different label and
>> property names). So we need that paragraph, but I think we should
>> change the example to use IS instead of :. Attached is suggested
>> minimal change, but I am not happy with it. Another possibility is we
>> completely remove that paragraph; I don't think we need to discuss all
>> possible usages the users will come up with.
>>
>> The patch changes one more instance of : by IS. But that's straight
>> forward.
>>
>> In ddl.sgml I noticed a seemingly incomplete sentence
>> A property graph is a way to represent database contents, instead of
>> using
>> relational structures such as tables.
>>
>> Represent the contents as what? I feel the complete sentence should be
>> one of the following
>> property graph is a way to represent database contents as a graph,
>> instead of representing those as relational structures OR
>> property graph is another way to represent database contents instead
>> of using relational structures such as tables
>>
>> But I can't figure out what was originally intended.
>>
>> I have squashed 0002 from my earlier patchset and your 3 patches into
>> 0001.
>>
>> 0002 has extra tests mentioned above. It also removes "TODO: dubious
>> error message" from a comment. I don't see anything dubious in the
>> error message. I think this patch is safe to be merged into 0001.
>>
>> 0003 is changed to ddl.sgml. Those need a bit more work as described
>> above.
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>>
>
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Japin Li | 2025-12-28 04:12:16 | 回复: cleanup: Split long Makefile lists across lines and sort them |