Re: Aggregate node doesn't include cost for sorting

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Aggregate node doesn't include cost for sorting
Date: 2022-12-08 20:56:43
Message-ID: CAApHDvqb4Wm0FKuJqGksEoirKHwC5nUvFP3TQr3VU93jdDWNkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 9 Dec 2022 at 01:12, David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
> Both plans were captured on 14.5, which is indeed prior to 1349d279.
>
> I disabled sequential scan to show that there's an alternative plan
> which is superior to the chosen plan: Index Only Scan is more expensive
> and takes longer than the Seq Scan, but the subsequent Aggregate runs
> much faster as it doesn't have to sort, making the plan overall superior.

Aha, 14.5. What's going on there is that it's still doing the sort.
The aggregate code in that version does not skip the sort because of
the presorted input. A likely explanation for the performance increase
is due to the presorted check in our qsort implementation. The
successful presort check is O(N), whereas an actual sort is O(N *
logN).

It's true that if we had been doing proper costing on these ORDER BY /
DISTINCT aggregates that we could have noticed that the input path's
pathkeys indicate that no sort is required and costed accordingly, but
if we'd gone to the trouble of factoring that into the costs, then it
would also have made sense to make nodeAgg.c not sort on presorted
input. We got the latter in 1349d279. It's just we didn't do anything
about the costings in that commit.

Anyway, in the next version of Postgres, the planner is highly likely
to choose the 2nd plan in your original email. It'll also be even
faster than you've shown due to the aggregate code not having to store
and read tuples in the tuplesort object. Also, no O(N) presort check
either. The performance should be much closer to what it would be if
you disabled seqscan and dropped the DISTINCT out of your aggregate.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-12-08 21:00:10 Re: Error-safe user functions
Previous Message Antonin Houska 2022-12-08 19:44:05 sendFileWithContent() does not advance the source pointer