Removing redundant grouping columns

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Removing redundant grouping columns
Date: 2022-12-27 22:18:09
Message-ID: 185315.1672179489@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This patch is aimed at being smarter about cases where we have
redundant GROUP BY entries, for example

SELECT ... WHERE a.x = b.y GROUP BY a.x, b.y;

It's clearly not necessary to perform grouping using both columns.
Grouping by either one alone would produce the same results,
assuming compatible equality semantics. I'm not sure how often
such cases arise in the wild; but we have about ten of them in our
regression tests, which makes me think it's worth the trouble to
de-duplicate as long as it doesn't cost too much. And it doesn't,
because PathKey construction already detects exactly this sort of
redundancy. We need only do something with the knowledge.

We can't simply make the planner replace parse->groupClause with
a shortened list of non-redundant columns, because it's possible
that we prove all the columns redundant, as in

SELECT ... WHERE a.x = 1 GROUP BY a.x;

If we make parse->groupClause empty then some subsequent tests
will think no grouping was requested, leading to incorrect results.
So what I've done in the attached is to invent a new PlannerInfo
field processed_groupClause to hold the shortened list, and then run
around and use that instead of parse->groupClause where appropriate.

(Another way could be to invent a bool hasGrouping to remember whether
groupClause was initially nonempty, analogously to hasHavingQual.
I rejected that idea after finding that there were still a few
places where it's advantageous to use the original full list.)

Beyond that, there's not too much to this patch. I had to fix
nodeAgg.c to not crash when grouping on zero columns, and I spent
some effort on refactoring the grouping-clause preprocessing
logic in planner.c because it seemed to me to have gotten rather
unintelligible. I didn't add any new test cases, because the changes
in existing results seem to sufficiently prove that it works.

I'll stick this in the January CF.

regards, tom lane

Attachment Content-Type Size
remove-redundant-GROUP-BY-1.patch text/x-diff 37.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zheng Li 2022-12-27 22:32:59 Re: Support logical replication of DDLs
Previous Message Andres Freund 2022-12-27 22:17:23 Re: build gcc warning