Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2021-07-19 06:32:34
Message-ID: 4285468.TQqHX9kgJM@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Looks like I did a sloppy job of that. I messed up the condition in
> standard_qp_callback() that sets the ORDER BY aggregate pathkeys so
> that it accidentally set them when there was an unsortable GROUP BY
> clause, as highlighted by the postgres_fdw tests failing. I've now
> added a comment to explain why the condition is the way it is so that
> I don't forget again.
>
> Here's a cleaned-up version that passes make check-world.
>

I've noticed that when the ORDER BY is a grouping key (which to be honest
doesn't seem to make much sense to me), the sort key is duplicated, as
demonstrated by one of the modified tests (partition_aggregate.sql).

This leads to additional sort nodes being added when there is no necessity to
do so. In the case of sort and index pathes, the duplicate keys are not
considered, I think the same should apply here.

It means the logic for appending the order by pathkeys to the existing group
by pathkeys would ideally need to remove the redundant group by keys from the
order by keys, considering this example:

regression=# explain select sum(unique1 order by ten, two), sum(unique1 order
by four), sum(unique1 order by two, four) from tenk2 group by ten;
QUERY PLAN
------------------------------------------------------------------------
GroupAggregate (cost=1109.39..1234.49 rows=10 width=28)
Group Key: ten
-> Sort (cost=1109.39..1134.39 rows=10000 width=16)
Sort Key: ten, ten, two
-> Seq Scan on tenk2 (cost=0.00..445.00 rows=10000 width=16)

We would ideally like to sort on ten, two, four to satisfy the first and last
aggref at once. Stripping the common prefix (ten) would eliminate this problem.

Also, existing regression tests cover the first problem (order by a grouping
key) but I feel like they should be extended with a case similar as the above
to check which pathkeys are used in the "multiple ordered aggregates + group
by" cases.

--
Ronan Dunklau

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-07-19 06:35:47 Re: Skipping logical replication transactions on subscriber side
Previous Message Peter Eisentraut 2021-07-19 06:29:42 Re: Replace remaining castNode(…, lfirst(…)) and friends calls with l*_node()