From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Cc: | Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Andy Fan <zhihuifan1213(at)163(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikita Malakhov <HukuToc(at)gmail(dot)com> |
Subject: | Re: MergeAppend could consider sorting cheapest child path |
Date: | 2025-09-01 20:26:03 |
Message-ID: | CAPpHfduc2trU_0-750dhdLrpEVEiO1=M-aCJh4kDnrujC0-iCg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jul 31, 2025 at 5:20 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 27/7/2025 00:51, Alexander Korotkov wrote:
> > On Tue, Jul 22, 2025 at 2:13 PM Andrei Lepikhov <lepihov(at)gmail(dot)com
> > I've another idea. cost_tuplesort() puts 2.0 under logarithm to prefer
> > tuplesort over heapsort. I think we can adjust cost_gather_merge() and
> > cost_merge_append() to do the same. 0001 patch implements that. I
> > think the plan changes of 0001 might be reasonable since most cases deal
> > with small rowsets. One thing concerns me: 0002 still affects one of
> > the postgres_fdw checks. Could you, please, take a look?
> Thanks for the idea!
> I analysed your approach a little bit.
> Initially, I ran the test script I had created previously [1] and
> discovered that on a large scale (1e6 - 1e7 tuples), the plan still
> defaults to MergeAppend, which deviates from the execution time (7190 ms
> for Sort+Append and 8450 ms for MergeAppend+Sort).
>
> Attempting to find out the reason, I combined all the costs into a
> single formula for each strategy:
>
> MergeAppend+Sort:
> total_cost =CO*ntuples*(1+2*log(ntuples)) + Ccput * 0.5 * ntuples+
> 2*CO*N*log(N) + A
> Sort+Append:
> total_cost = CO*ntuples*(1+2*log(ntuples))+ Ccput * 0.5 * ntuples + A
>
> Terms:
> - A - sum of total costs of underlying subtrees
> - CO - cpu_operator_cost
> - Ccput - cpu_tuple_cost
> - N - number of subpaths (streams)
>
> Given the significant gap in total execution time between these
> strategies, I believe it would be reasonable to introduce a coefficient
> to the equation's 'ntuples' variable component that will keep the gap
> between big quicksort and MergeAppend's heapsort out of the fuzzy factor
> gap.
>
> Discovering papers on the value of constant in quicksort [2] and
> heapsort [3], I realised that there is a difference. The constant's
> value varies in a wide range: 1.3-1.5 for quicksort and 2-3 for
> heapsort. Considering that we should change the current cost model as
> little as possible, not to break the balance, we may just increase the
> constant value for the heap sort to maintain a bare minimum gap between
> strategies out of the fuzzy factor. In this case, the merge append
> constant should be around 3.8 - 4.0.
>
> With this minor change, we see a shift in the regression tests. Most of
> these changes were introduced by the new append strategy. Although I
> haven't analysed these changes in depth yet, I believe they are all
> related to the small data sets and should fade out on a larger scale.
>
> See this minor correction in the attachment. postgres_fdw tests are
> stable now.
I have another idea. What if we allow MergeAppend paths only when at
least one subpath is preordered. This trick also allow us to exclude
MergeAppend(Sort) dominating Sort(Append). I see the regression tests
changes now have much less volume and looks more reasonable. What do
you think?
Also, do you think get_cheapest_fractional_path_for_pathkeys_ext() and
get_cheapest_path_for_pathkeys_ext() should consider incremental sort?
The revised patch teaches them to do so.
------
Regards,
Alexander Korotkov
Supabase
Attachment | Content-Type | Size |
---|---|---|
v10-0001-Consider-an-explicit-sort-of-the-MergeAppend-sub.patch | application/octet-stream | 29.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jelte Fennema-Nio | 2025-09-01 20:27:51 | Re: Extension security improvement: Add support for extensions with an owned schema |
Previous Message | Peter Geoghegan | 2025-09-01 19:18:48 | Re: Orphan page in _bt_split |