From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Wrong results with grouping sets |
Date: | 2023-11-16 15:25:50 |
Message-ID: | 263bca57-acfd-44d0-9a96-fcf8dff9c977@yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi! Thank you for your work on the subject.
On 25.09.2023 10:11, Richard Guo wrote:
> I think I've come across a wrong result issue with grouping sets, as
> shown by the query below.
>
> -- result is correct with only grouping sets
> select a, b
> from (values (1, 1), (2, 2)) as t (a, b) where a = b
> group by grouping sets((a, b), (a));
> a | b
> ---+---
> 1 | 1
> 1 |
> 2 | 2
> 2 |
> (4 rows)
>
> -- result is NOT correct with grouping sets and distinct on
> select distinct on (a, b) a, b
> from (values (1, 1), (2, 2)) as t (a, b) where a = b
> group by grouping sets((a, b), (a));
> a | b
> ---+---
> 1 | 1
> 2 | 2
> (2 rows)
>
> The distinct on expressions include both 'a' and 'b', so rows (1, 1) and
> (1, NULL) should not be considered equal. (The same for rows (2, 2) and
> (2, NULL).)
I noticed that this query worked correctly in the main branch with the
inequality operator:
postgres=# select distinct on (a, b) a, b from (values (3, 1), (2, 2))
as t (a, b) where a > b group by grouping sets((a, b), (a)); a | b
---+--- 3 | 1 3 | (2 rows)
So, I think you are right)
> I think the root cause is that when we generate distinct_pathkeys, we
> failed to realize that Var 'b' might be nullable by the grouping sets,
> so it's no longer always equal to Var 'a'. It's not correct to deem
> that the PathKey for 'b' is redundant and thus remove it from the
> pathkeys list.
>
> We have the same issue when generating sort_pathkeys. As a result, we
> may have the final output in the wrong order. There were several
> reports about this issue before, such as [1][2].
>
> To fix this issue, I'm thinking that we mark the grouping expressions
> nullable by grouping sets with a dummy RTE for grouping sets, something
> like attached. In practice we do not need to create a real RTE for
> that, what we need is just a RT index. In the patch I use 0, because
> it's not a valid outer join relid, so it would not conflict with the
> outer-join-aware-Var infrastructure.
>
> If the grouping expression is a Var or PHV, we can just set its
> nullingrels, very straightforward. For an expression that is neither a
> Var nor a PHV, I'm not quite sure how to set the nullingrels. I tried
> the idea of wrapping it in a new PHV to carry the nullingrels, but that
> may cause some unnecessary plan diffs. In the patch for such an
> expression I just set the nullingrels of Vars or PHVs that are contained
> in it. This is not really 'correct' in theory, because it is the whole
> expression that can be nullable by grouping sets, not its individual
> vars. But it works in practice, because what we need is that the
> expression can be somehow distinguished from the same expression in ECs,
> and marking its vars is sufficient for this purpose. But what if the
> expression is variable-free? This is the point that needs more work.
> Fow now the patch just handles variable-free expressions of type Const,
> by wrapping it in a new PHV.
>
> There are some places where we need to artificially remove the RT index
> of grouping sets from the nullingrels, such as when we generate
> PathTarget for initial input to grouping nodes, or when we generate
> PathKeys for the grouping clauses, because the expressions there are
> logically below the grouping sets. We also need to do that in
> set_upper_references when we update the targetlist of an Agg node, so
> that we can perform exact match for nullingrels, rather than superset
> match.
>
> Since the fix depends on the nullingrels stuff, it seems not easy for
> back-patching. I'm not sure what we should do in back branches.
>
> Any thoughts?
>
> [1]
> https://www.postgresql.org/message-id/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw@mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/17071-24dc13fbfa29672d@postgresql.org
>
I looked at your patch and noticed a few things:
1. I think you should add a test with the cube operator, because I
noticed that the order of the query in the result has also changed:
master:
postgres=# select a,b from (values(1,1),(2,2)) as t (a,b) where a=b
group by cube (a, (a,b)) order by b,a; a | b ---+--- 1 | 1 1 | 1 1 | 2 |
2 2 | 2 2 | | (7 rows)
with patch:
postgres=# select a, b from (values (1, 1), (2, 2)) as t (a, b) where a
= b group by cube(a, (a, b)) order by b,a; a | b ---+--- 1 | 1 1 | 1 2 |
2 2 | 2 1 | 2 | | (7 rows)
--
Regards,
Alena Rybakina
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2023-11-16 15:46:22 | Re: Do away with a few backwards compatibility macros |
Previous Message | Tom Lane | 2023-11-16 15:06:38 | Re: BUG #18097: Immutable expression not allowed in generated at |