Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2021-07-02 07:53:52
Message-ID: 5348877.kVHUhDTtjH@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> This allows us to give presorted input to both aggregates in the following
> case:
>
> SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...
>
> but just the first agg in this one:
>
> SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...

I don't know if it's acceptable, but in the case where you add both an
aggregate with an ORDER BY clause, and another aggregate without the clause,
the output for the unordered one will change and use the same ordering, maybe
suprising the unsuspecting user. Would that be acceptable ?

> When testing the performance of all this I found that when a suitable
> index exists to provide pre-sorted input for the aggregation that the
> performance does improve. Unfortunately, it looks like things get more
> complex when no index exists. In this case, since we're setting
> pathkeys to tell the planner we need a plan that provides pre-sorted
> input to the aggregates, the planner will add a sort below the
> aggregate node. I initially didn't see any problem with that as it
> just moves the sort to a Sort node rather than having it done
> implicitly inside nodeAgg.c. The problem is, it just does not perform
> as well. I guess this is because when the sort is done inside
> nodeAgg.c that the transition function is called in a tight loop while
> reading records back from the tuplestore. In the patched version,
> there's an additional node transition in between nodeAgg and nodeSort
> and that causes slower performance. For now, I'm not quite sure what
> to do about that. We set the plan pathkeys well before we could
> possibly decide if asking for pre-sorted input for the aggregates
> would be a good idea or not.

I was curious about the performance implication of that additional transition,
and could not reproduce a signifcant difference. I may be doing something
wrong: how did you highlight it ?

Regards,

--
Ronan Dunklau

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2021-07-02 08:31:41 Re: rand48 replacement
Previous Message gkokolatos 2021-07-02 07:35:00 Re: Teach pg_receivewal to use lz4 compression