Re: MergeAppend could consider sorting cheapest child path

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Andy Fan <zhihuifan1213(at)163(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikita Malakhov <HukuToc(at)gmail(dot)com>
Subject: Re: MergeAppend could consider sorting cheapest child path
Date: 2025-10-15 12:35:44
Message-ID: CAApHDvqx2VsjUr-ho+szxjM-=7DBp9HnR3+_SzSTT2Fr+xYKcA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 15 Oct 2025 at 22:26, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> This patch originated from the practice of how table partitioning can
> severely impact query execution. It is a rare case in our experience
> when all partitions have symmetrical indexes, especially those located
> remotely. People adopt a set of indexes according to the current load
> profile on hot partitions. In fact, it is a typical case when a
> timestamp orders partitions, and most of them are rarely updated.
>
> So, the goal is to use MergeAppend when only a few partitions lack a
> proper index.

hmm... doesn't that already work?

create table hp (a int ) partition by hash(a);
create table hp0 partition of hp for values with(modulus 2, remainder 0);
create table hp1 partition of hp for values with(modulus 2, remainder 1);
create index on hp0(a);

explain (costs off) select * from hp order by a;

Merge Append
Sort Key: hp.a
-> Index Only Scan using hp0_a_idx on hp0 hp_1
-> Sort
Sort Key: hp_2.a
-> Seq Scan on hp1 hp_2

Or is this a case of that you want to also consider Seq Scan on hp0 ->
Sort if it's cheaper than Index Scan on hp0_a_idx just in case that's
enough to make Merge Append cheap enough to beat Append -> Sort?

> The concern about memory consumption makes sense, of course. However, we
> choose to sort based on cost estimations that usually work when the
> optimiser decides between fetching many tuples during an Index Scan,
> compared to only a few tuples to fetch with subsequent sorting.
> Additionally, scan estimation typically yields good predictions
> (compared to JOIN), and I personally estimate the OOM risk to be low.
>
> Additionally, this patch revealed an issue with the cost model: there is
> no significant difference between a single massive Sort and multiple
> sorts followed by MergeAppend. Our experiments show that it is incorrect
> (one Sort operator demonstrates more efficacy) and may be corrected.

Do you mean "no significant difference [in the costings] between"?

Not sure if I follow you here. You've said "one Sort operator
demonstrates more efficacy", do you mean Sort atop of Append is
better? If so, why does the patch try to encourage plans with Merge
Append with many Sorts?

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sadhuprasad Patro 2025-10-15 12:36:47 Re: Improved TAP tests by replacing sub-optimal uses of ok() with better Test::More functions
Previous Message jian he 2025-10-15 11:34:37 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands