Side effect of remove_useless_groupby_columns

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Side effect of remove_useless_groupby_columns
Date: 2021-02-28 07:52:24
Message-ID: CAMbWs49Mzis1vkya0W=BYkaoS7yYT-+RRm+R7-eaKHSJtDgzoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

When looking at [1], I realized we may have a side effect when removing
redundant columns in the GROUP BY clause. Suppose we have a query with
ORDER BY 'b', and meanwhile column 'b' is also a group key. If we decide
that 'b' is redundant due to being functionally dependent on other GROUP
BY columns, we would remove it from group keys. This will make us lose
the opportunity to leverage the index on 'b'.

Here is an example for illustration.

# create table t (a int primary key, b int);
# insert into t select i, i%1000 from generate_series(1,1000000)i;
# create index on t(b);

By default, we remove 'b' from group keys and generate a plan as below:

# explain (costs off) select b from t group by a, b order by b limit 10;
QUERY PLAN
------------------------------------------------
Limit
-> Sort
Sort Key: b
-> Group
Group Key: a
-> Index Scan using t_pkey on t
(6 rows)

The index on 'b' is not being used and we'll have to retrieve all the
data underneath to perform the sort work.

On the other hand, if we keep 'b' as a group column, we can get such a
plan as:

# explain (costs off) select b from t group by a, b order by b limit 10;
QUERY PLAN
-------------------------------------------------
Limit
-> Group
Group Key: b, a
-> Incremental Sort
Sort Key: b, a
Presorted Key: b
-> Index Scan using t_b_idx on t
(7 rows)

With the help of 't_b_idx', we can avoid the full scan on 't' and it
would run much faster.

Any thoughts?

[1]
https://www.postgresql.org/message-id/flat/16869-26346b77d6ccaeec%40postgresql.org

Thanks
Richard

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-02-28 09:14:59 Re: Side effect of remove_useless_groupby_columns
Previous Message japin 2021-02-28 06:36:01 Re: NOT VALID for Unique Indexes