Multiple grouping set specs referencing duplicate alias

From: David Kimura <david(dot)g(dot)kimura(at)gmail(dot)com>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Multiple grouping set specs referencing duplicate alias
Date: 2022-10-21 17:11:38
Message-ID: CAHnPFjSdFx_TtNpQturPMkRSJMYaD5rGP2=8iFH9V24-OjHGiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all!

I think I may have stumbled across a case of wrong results on HEAD (same
through version 9.6, though interestingly 9.5 produces different results
altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

ai1 | ai2
-----+-----
1 | 1
| 1
2 | 2
| 2
3 | 3
| 3
(6 rows)

I had expected:

ai1 | ai2
-----+-----
1 | 1
2 | 2
3 | 3
| 1
| 2
| 3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the
Aggregate node.

QUERY PLAN
------------------------------------------------
GroupAggregate
Group Key: i, i
Group Key: i
-> Sort
Sort Key: i
-> Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased
column will produce the same column values and hence isn't included in the
range table, nor subsequently the pathkeys. However, that assumption does not
seem to be true for queries with multiple grouping set specifications:

test=# SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP
BY ai1, ROLLUP(ai2);
ai1 | ai2
-----+-----
1 | 1
2 | 2
3 | 3
1 |
2 |
3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading
to a case where the group order is incorrectly determined to satisfy the sort
order. Thus create_ordered_paths() decides against applying an explicit sort
node. But simply forcing an explicit sort still seems wrong since we've
effectively lost a relevant column for the sort.

I tinkered a bit and hacked together an admittedly ugly patch that triggers an
explicit sort constructed from the parse tree. An alternative approach I had
considered was to update the rewriteHandler to explicitly force the existence of
the duplicate alias column in the range tables. But that also felt meh.

Does this seem like a legit issue? And if so, any thoughts on alternative
approaches?

Thanks,
David Kimura

Attachment Content-Type Size
fix-multiple-grouping-set-spec-wrong-results.patch application/x-patch 13.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anton Voloshin 2022-10-21 17:23:33 patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider
Previous Message Tomas Vondra 2022-10-21 16:44:17 Re: Missing update of all_hasnulls in BRIN opclasses