Re: remove_useless_groupby_columns is too enthusiastic

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: remove_useless_groupby_columns is too enthusiastic
Date: 2022-07-14 09:57:15
Message-ID: CAMbWs496jLH9725OTWpC1-tnra=VkFY3hQf24DxzwAkC4kO20A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 13, 2022 at 1:31 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I tried the attached quick-hack patch that just prevents
> remove_useless_groupby_columns from removing anything that
> appears in ORDER BY. That successfully fixes the complained-of
> case, and it doesn't change any existing regression test results.
> I'm not sure whether there are any cases that it makes significantly
> worse.

If there happens to be an index on t2.othercol, t2.x, the patch would
definitely win since we can perform a GroupAggregate with no explicit
sort. If there is no such index, considering the redundant sorting work
due to the excess columns, do we still win?

I'm testing with the query below:

create table t (a int primary key, b int, c int);
insert into t select i, i%1000, i from generate_series(1,1000000)i;
analyze t;
create index t_b_a_idx on t (b, a);

select sum(c) from t group by a, b order by b limit 10;

If we have index 't_b_a_idx', there would be big performance
improvement. Without the index, I can see some performance drop (I'm not
using parallel query mechanism).

> (I also kind of wonder if the fundamental problem here is that
> remove_useless_groupby_columns is being done at the wrong time,
> and we ought to do it later when we have more semantic info.

Concur with that.

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Geier 2022-07-14 10:13:38 Improving scalability of Parallel Bitmap Heap/Index Scan
Previous Message Bharath Rupireddy 2022-07-14 09:43:37 Re: Patch proposal: New hooks in the connection path