Re: Aggregate node doesn't include cost for sorting

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Aggregate node doesn't include cost for sorting
Date: 2022-12-08 10:40:58
Message-ID: CAApHDvpoAo+AW1+n094USVzOgaSwdrytGPYtpVVghBbLbJ+-ew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 8 Dec 2022 at 22:06, David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
> The cost of an Aggregate node seems to be the same regardless of the
> input being pre-sorted or not. If however the input is not sorted, the
> Aggregate node must additionally perform a sort which can impact runtime
> significantly. Here is an example:
>
> -- Unsorted input. Aggregate node must additionally sort all rows.
> EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=2584.00..2584.01 rows=1 width=8) (actual
> time=1684.705..1684.809 rows=1 loops=1)
> -> Seq Scan on foo (cost=0.00..2334.00 rows=100000 width=71)

This output surely must be from a version of PostgreSQL prior to
1349d279? I can't quite figure out why you've added a "SET
enable_seqscan = FALSE;". That makes it look like you've used the same
version of PostgreSQL to produce both of these plans. The 2nd plan
you've shown must be post 1349d279.

So, with the assumption that you've used 2 different versions to show
this output, for post 1349d279, there does not seem to be much choice
on how the aggregate is executed. What's your concern about the
costings having to be accurate given there's no other plan choice?

The new pre-sorted aggregate code will always request the sort order
that suits the largest number of ORDER BY / DISTINCT aggregates. When
there are multiple ORDER BY / DISTINCT aggregates and they have
different sort requirements then there certainly are completing ways
that the aggregation portion of the plan can be executed. I opted to
make the choice just based on the number of aggregates that could
become presorted. nodeAgg.c is currently not very smart about sharing
sorts between multiple aggregates with the same sort requirements. If
that was made better, there might be more motivation to have better
costing code in make_pathkeys_for_groupagg(). However, the reasons
for the reversion of db0d67db seemed to be mainly around the lack of
ability to accurately cost multiple competing sort orders. We'd need
to come up with some better way to do that if we were to want to give
make_pathkeys_for_groupagg() similar abilities.

> Also, why does the Aggregate node sort itself? Why don't we instead emit
> an explicit Sort node in the plan so that it's visible to the user what
> is happening? As soon as there's also a GROUP BY in the query, a Sort
> node occurs in the plan. This seems inconsistent.

Post 1349d279 we do that, but it can only do it for 1 sort order.
There can be any number of aggregate functions which require a sort
and they don't all have to have the same sort order requirements. We
can't do the same as WindowAgg does by chaining nodes together either
because the aggregate node aggregates the results and we'd need all
the same input rows to be available at each step.

The only other way would be to have it so an Aggregate node could be
fed by multiple different input nodes and then it would only work on
the aggregates that suit the given input before reading the next input
and doing the other aggregates. Making it work like that would cause
quite a bit of additional effort during planning (not to mention the
executor). We'd have to run the join search once per required order,
which is one of the slowest parts of planning. Right now, you could
probably make that work by just writing the SQL to have a subquery per
sort requirement.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2022-12-08 11:01:02 Re: Allow batched insert during cross-partition updates
Previous Message Dean Rasheed 2022-12-08 10:03:29 Supporting MERGE on updatable views