Re: Consider explicit incremental sort for Append and MergeAppend

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Consider explicit incremental sort for Append and MergeAppend
Date: 2025-05-19 13:21:07
Message-ID: CA+Tgmob-7XM5xsB+Vco5XmRkMYKgFi4xhCTfL2N6WJuqYS93CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 15, 2025 at 9:03 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> 2. IncrementalSort is not always more effective - it depends on the
> column's number of groups. In my experience, a non-cost-based decision
> one day meets the problematic case, and the people who stick with it are
> much more confused than in the case when planner decision connected to
> the costings - they trust the cost model or the cost model tuned by GUCs.

I wonder if this could be fixed in nodeIncrementalSort.c. I think it's
a problem to rely on planner estimates because planner estimates of
the # of groups are quite unreliable. But at runtime, we can notice
whether the groups are small or large and adjust the algorithm
accordingly. In fact, it looks like we already have some logic for
that:

/*
* Sorting many small groups with tuplesort is inefficient. In order to
* cope with this problem we don't start a new group until the current one
* contains at least DEFAULT_MIN_GROUP_SIZE tuples (unfortunately this also
* means we can't assume small groups of tuples all have the same prefix keys.)
* When we have a bound that's less than DEFAULT_MIN_GROUP_SIZE we start looking
* for the new group as soon as we've met our bound to avoid fetching more
* tuples than we absolutely have to fetch.
*/
#define DEFAULT_MIN_GROUP_SIZE 32

But maybe this logic needs to be further refined somehow. I can't
shake the feeling that it's going to be really hard to have every
place that uses incremental sort decide whether to use an incremental
sort or a regular sort -- we should try to get to a place where it's
safe to use an incremental sort when it's possible without worrying
that it might actually be slower.

Or maybe that's not possible for some reason, but it is not clear to
me what that reason might be.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2025-05-19 13:22:43 Re: Proposal: Make cfbot fail on patches not created by "git format-patch"
Previous Message Isaac Morland 2025-05-19 12:41:06 Re: Violation of principle that plan trees are read-only