Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2021-07-20 11:06:40
Message-ID: CAApHDvrQf_YhWvMjij2q-PCjN0PMLudHmm1aObwd8BEjzwu3nA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 19 Jul 2021 at 18:32, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
> It means the logic for appending the order by pathkeys to the existing group
> by pathkeys would ideally need to remove the redundant group by keys from the
> order by keys, considering this example:
>
> regression=# explain select sum(unique1 order by ten, two), sum(unique1 order
> by four), sum(unique1 order by two, four) from tenk2 group by ten;
> QUERY PLAN
> ------------------------------------------------------------------------
> GroupAggregate (cost=1109.39..1234.49 rows=10 width=28)
> Group Key: ten
> -> Sort (cost=1109.39..1134.39 rows=10000 width=16)
> Sort Key: ten, ten, two
> -> Seq Scan on tenk2 (cost=0.00..445.00 rows=10000 width=16)
>
>
> We would ideally like to sort on ten, two, four to satisfy the first and last
> aggref at once. Stripping the common prefix (ten) would eliminate this problem.

hmm, yeah. That's not great. This comes from the way I'm doing
list_concat on the pathkeys from the GROUP BY with the ones from the
ordered aggregates. If it were possible to use
make_pathkeys_for_sortclauses() to make these in one go, that would
fix the problem since pathkey_is_redundant() would skip the 2nd "ten".
Unfortunately, it's not possible to pass the combined list of
SortGroupClauses to make_pathkeys_for_sortclauses since they're not
from the same targetlist. Aggrefs have their own targetlist and the
SortGroupClauses for the Aggref reference that tlist.

I think to do this we'd need something like pathkeys_append() in
pathkeys.c which had a loop and appended the pathkey only if
pathkey_is_redundant returns false.

> Also, existing regression tests cover the first problem (order by a grouping
> key) but I feel like they should be extended with a case similar as the above
> to check which pathkeys are used in the "multiple ordered aggregates + group
> by" cases.

It does seem like a bit of a weird case to go to a lot of effort to
make work, but it would be nice if it did work without having to
contort the code too much.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Denis Hirn 2021-07-20 11:15:30 Re: [PATCH] Allow multiple recursive self-references
Previous Message Dilip Kumar 2021-07-20 11:03:13 Re: row filtering for logical replication