Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2022-07-26 07:39:25
Message-ID: CAMbWs4_hAK6+0Gk=vZX+ikSFBx=6981iFLGkgLObDkvvGpjogg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 26, 2022 at 7:38 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Fri, 22 Jul 2022 at 21:33, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > I can see this problem with
> > the query below:
> >
> > select max(b order by b), max(a order by a) from t group by a;
> >
> > When processing the first aggregate, we compose the 'currpathkeys' as
> > {a, b} and mark this aggregate in 'aggindexes'. When it comes to the
> > second aggregate, we compose its pathkeys as {a} and decide that it is
> > not stronger than 'currpathkeys'. So the second aggregate is not
> > recorded in 'aggindexes'. As a result, we fail to mark aggpresorted for
> > the second aggregate.
>
> Yeah, you're right. I have a missing check to see if currpathkeys are
> better than the pathkeys for the current aggregate. In your example
> case we'd have still processed the 2nd aggregate the old way instead
> of realising we could take the new pre-sorted path for faster
> processing.
>
> I've adjusted that in the attached to make it properly include the
> case where currpathkeys are better.

Thanks. I've verified that the problem is solved in the v8 patch.
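
To spell out the check as I understand it, here is a toy model in
Python, not the patch's C code. The names compare_prefix and
covered_aggregates are made up for illustration, pathkeys are modelled
as plain tuples of column names, and the single pass seeded by the
first aggregate is a simplifying assumption:

```python
def compare_prefix(cur, new):
    """Compare two orderings; one is 'better' if the other is its prefix."""
    n = min(len(cur), len(new))
    if cur[:n] != new[:n]:
        return "different"
    if len(cur) == len(new):
        return "equal"
    return "cur_better" if len(cur) > len(new) else "new_better"


def covered_aggregates(group_keys, agg_orderings):
    """Return (chosen ordering, indexes of aggregates it pre-sorts)."""
    cur = None
    covered = []
    for i, order_by in enumerate(agg_orderings):
        # Group keys come first; ORDER BY columns already in the
        # group keys are redundant and add nothing.
        keys = tuple(group_keys) + tuple(
            c for c in order_by if c not in group_keys
        )
        if cur is None:
            cur = keys
            covered.append(i)
            continue
        verdict = compare_prefix(cur, keys)
        if verdict == "new_better":
            cur = keys  # the new ordering is stronger, adopt it
        if verdict != "different":
            # Includes "cur_better": the case that was previously missed.
            covered.append(i)
    return cur, covered
```

For the query above, covered_aggregates(["a"], [["b"], ["a"]]) gives
the ordering ("a", "b") with both aggregates covered, since {a} is a
prefix of {a, b}.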

I'm also wondering whether we could take the ordering provided by
existing indexes into account when determining the pathkeys. For the
query below, given an index on t(a, c), choosing {a, c} rather than
{a, b} would let us avoid the Incremental Sort node:

# explain (costs off) select max(b order by b), max(c order by c) from t
group by a;
                 QUERY PLAN
---------------------------------------------
 GroupAggregate
   Group Key: a
   ->  Incremental Sort
         Sort Key: a, b
         Presorted Key: a
         ->  Index Scan using t_a_c_idx on t
(6 rows)
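
A rough sketch of the tie-break I have in mind, again as a Python toy
model and not planner code. pick_pathkeys and its argument shapes are
hypothetical: each candidate is an (ordering, covered aggregate
indexes) pair, and an index is just the tuple of its columns:

```python
def pick_pathkeys(candidates, index_orderings):
    """Pick the candidate covering the most aggregates, preferring
    an ordering that some index already provides on ties."""

    def provided_by_index(keys):
        # True when the ordering is a leading prefix of an index.
        return any(
            tuple(idx[: len(keys)]) == tuple(keys)
            for idx in index_orderings
        )

    return max(
        candidates,
        key=lambda cand: (len(cand[1]), provided_by_index(cand[0])),
    )
```

With candidates [(("a", "b"), [0]), (("a", "c"), [1])] and the single
index ("a", "c"), both candidates cover one aggregate, so the index
breaks the tie and (("a", "c"), [1]) is picked.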

Thanks
Richard
