From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Consider explicit incremental sort for Append and MergeAppend |
Date: | 2025-07-07 13:38:58 |
Message-ID: | a74d2648-c93e-4686-a8d4-2843515b8702@gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 19/5/2025 15:21, Robert Haas wrote:
> On Thu, May 15, 2025 at 9:03 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> 2. IncrementalSort is not always more effective - it depends on the
>> column's number of groups. In my experience, a non-cost-based decision
>> one day meets the problematic case, and the people who stick with it are
>> much more confused than in the case when planner decision connected to
>> the costings - they trust the cost model or the cost model tuned by GUCs.
>
> I wonder if this could be fixed in nodeIncrementalSort.c. I think it's
> a problem to rely on planner estimates because planner estimates of
> the # of groups are quite unreliable. But at runtime, we can notice
> whether the groups are small or large and adjust the algorithm
> accordingly. In fact, it looks like we already have some logic for
> that:
Yes, the cost estimation issue is the source of my concern. Postgres
estimation of the number of groups is still not ideal [1]. The cost sort
model doesn't take into account the number of columns [2] - perhaps
something else.
Therefore, if IncrementalSort performs worse in some cases, this change
may result in an increase in user reports.
However, comparing Sort and IncrementalSort (see the attachment), I
recall that IncrementalSort surpasses plain Sort in both corner cases:
tiny groups (beating Sort in terms of memory consumption and smaller
sorting sets) and massive groups (abbreviated keys reduce the number of
compared columns). And it is not easy to invent the worst case where the
Sort node wins.
The only aspect I can't stop thinking about is the cost balance:
sometimes the planner prefers Sort+SeqScan to IncrementalSort+IndexScan
(See explains in the attachment). A non-cost-based choice of
IncrementalSort may result in an Append path with a higher cost, which
can displace it from the path list and lead to another, less effective
choice, such as a less effective IndexScan.
However, I can't support this opinion with any real examples.
In toto, IMO it makes sense to commit this feature now and see how it
behaves.
[1]
https://www.postgresql.org/message-id/ba0edc53-4b1f-4c67-92d1-29aeddb36a18@gmail.com
[2]
https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com
--
regards, Andrei Lepikhov
Attachment | Content-Type | Size |
---|---|---|
incremental-sort-bench.sql | application/sql | 4.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2025-07-07 13:42:50 | Re: Changing shared_buffers without restart |
Previous Message | Ashutosh Bapat | 2025-07-07 13:30:08 | Re: Using failover slots for PG-non_PG logical replication |