Re: MergeAppend could consider sorting cheapest child path

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikita Malakhov <HukuToc(at)gmail(dot)com>
Subject: Re: MergeAppend could consider sorting cheapest child path
Date: 2025-06-03 13:23:47
Message-ID: 2bc323df-0662-406a-8abf-dade726b31d8@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/6/2025 20:21, Alexander Korotkov wrote:
> I have the following question. I see patch changes some existing
> plans from Sort(Append(...)) to MergeAppend(Sort(), ..., Sort(...)) or
> even Materialize(MergeAppend(Sort(), ..., Sort(...))). This should be
> some problem in cost_sort(). Otherwise, that would mean that Sort
> node doesn't know how to do its job: explicit splitting dataset into
> pieces then merging sorting result appears to be cheaper, but Sort
> node contains merge-sort algorithm inside and it's supposed to be more
> efficient. Could you, please, revise the patch to avoid these
> unwanted changes?
I think, this issue is related to corner-cases of the
compare_path_costs_fuzzily.

Let's glance into one of the problematic queries:
EXPLAIN (COSTS ON)
SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C"
ORDER BY 1;

if you play with the plan, you can find that total_cost of the
Sort->Append path is cheaper:

Sort (cost=2.40..2.41 rows=4 width=40)
-> Append (cost=1.15..2.36 rows=4 width=40)
Merge Append (cost=2.37..2.42 rows=4 width=40)

But the difference is less than fuzz_factor. In this case, Postgres
probes startup_cost, which is obviously less for the MergeAppend strategy.
This is a good decision, and I think it should stay as is.
What can we do here? We might change the test to increase the cost gap.
However, while designing this patch, I skimmed through each broken query
and didn't find a reason to specifically shift to the Sort->Append
strategy, as it tested things that were not dependent on Append or Sort.

To establish a stable foundation for discussion, I conducted simple
tests - see, for example, a couple of queries in the attachment. As I
see it, Sort->Append works faster: in my test bench, it takes 1250ms on
average versus 1430ms, and it also has lower costs - the same for data
with and without massive numbers of duplicates. Playing with sizes of
inputs, I see the same behaviour.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
merge_sort.sql application/sql 6.1 KB
v6-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patch text/plain 29.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2025-06-03 13:38:47 Re: MergeAppend could consider sorting cheapest child path
Previous Message Alexander Korotkov 2025-06-03 13:21:19 Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly