|From:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Cc:||David Rowley <dgrowleyml(at)gmail(dot)com>|
|Subject:||remove_useless_groupby_columns is too enthusiastic|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
I looked into the complaint at  about the planner being much
stupider when one side of a JOIN USING is referenced than the other
side. It seemed to me that that shouldn't be happening, because
the relevant decisions are made on the basis of EquivalenceClasses
and both USING columns should be in the same EquivalenceClass.
I was right about that ... but the damage is being done far upstream
of any EquivalenceClass work. It turns out that the core of the
issue is that the query looks like
SELECT ... t1 JOIN t2 USING (x)
GROUP BY x, t2.othercol ORDER BY t2.othercol LIMIT n
In the "okay" case, we resolve "GROUP BY x" as GROUP BY t1.x.
Later on, we are able to realize that ordering by t1.x is
equivalent to ordering by t2.x (because same EquivalenceClass),
and that it's equally good to consider the GROUP BY items in
either order, and that ordering by t2.othercol, t2.x would
allow us to perform the grouping using a GroupAggregate on
data that's already sorted to meet the ORDER BY requirement.
Since there happens to be an index on t2.othercol, t2.x,
we can implement the query with no explicit sort, which wins
big thanks to the small LIMIT.
In the not-okay case, we resolve "GROUP BY x" as GROUP BY t2.x.
Then remove_useless_groupby_columns notices that x is t2's
primary key, so it figures that grouping by t2.othercol is
redundant and throws away that element of GROUP BY. Now there
is no apparent connection between the GROUP BY and ORDER BY
lists, defeating the optimizations that would lead to a good
choice of plan --- in fact, we conclude early on that that
index's sort ordering is entirely useless to the query :-(
I tried the attached quick-hack patch that just prevents
remove_useless_groupby_columns from removing anything that
appears in ORDER BY. That successfully fixes the complained-of
case, and it doesn't change any existing regression test results.
I'm not sure whether there are any cases that it makes significantly
(I also kind of wonder if the fundamental problem here is that
remove_useless_groupby_columns is being done at the wrong time,
and we ought to do it later when we have more semantic info.
But I'm not volunteering to rewrite it completely.)
Anyway, remove_useless_groupby_columns has been there since 9.6
and we've not previously seen reports of cases that it makes worse,
so this seems like a corner-case problem. Hence I wouldn't risk
back-patching this change. It seems worth considering for HEAD
though, so I'll stick it in the September CF.
regards, tom lane
|Next Message||Andres Freund||2022-07-12 17:33:52||Re: Cleaning up historical portability baggage|
|Previous Message||Yura Sokolov||2022-07-12 17:22:57||Re: Reducing the chunk header sizes on all memory context types|