Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-05 13:01:10
Message-ID: 20200405130110.2bdxaz4ed5jx2eae@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 02, 2020 at 09:40:45PM -0400, James Coleman wrote:
>On Thu, Apr 2, 2020 at 8:46 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>>
>> On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> > ...
>> > 5) Overall, I think the costing is OK. I'm sure we'll find cases that
>> > will need improvements, but that's fine. However, we now have
>> >
>> > - cost_tuplesort (used to be cost_sort)
>> > - cost_full_sort
>> > - cost_incremental_sort
>> > - cost_sort
>> >
>> > I find it a bit confusing that we have cost_sort and cost_full_sort. Why
>> > don't we just keep using the dummy path in label_sort_with_costsize?
>> > That seems to be the only external caller outside costsize.c. Then we
>> > could either make cost_full_sort static or get rid of it entirely.
>>
>> This another area of the patch I haven't really modified.
>
>See attached for a cleanup of this; it removed cost_fullsort so
>label_sort_with_costsize is back to how it was.
>
>I've directly merged this into the patch series; if you'd like to see
>the diff I can send that along.
>

Thanks. Attached is v54 of the patch, with some minor changes. The main
two changes are in add_partial_path_precheck(), firstly to also consider
startup_cost, as discussed before. The second change (in 0003) is a bit
of an experiment to make add_partial_precheck() cheaper by only calling
compare_pathkeys after checking the costs first (which should be cheaper
than the function call). add_path_precheck already does it in that order
anyway.

I noticed is that compare_path_costs_fuzzily and add_path_precheck both
check consider_startup/consider_param_startup to decide whether to look
at startup_cost. add_partial_path_precheck probably should od that too.

Right now I'm running a battery of benchmarks to see if/how this affects
planner performance. Initially the results were rather noisy, but after
pinning the processes to processes (using taskset) and fixing frequency
(using cpupower) it's much better. The intermediate results seem pretty
fine (the results are withing 0.5% of the master, in both directions).
I'll share the final results.

Overall, I think this is pretty close to committable, and I'm planning
to get it committed on Monday unless someone objects.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v54-0001-Consider-low-startup-cost-when-adding-partial-path.patch text/plain 4.6 KB
v54-0002-rework-add_partial_path_precheck-too.patch text/plain 5.2 KB
v54-0003-rework-add_partial_path_precheck-check-costs-first.patch text/plain 1.7 KB
v54-0004-Implement-incremental-sort.patch text/plain 167.8 KB
v54-0005-Consider-incremental-sort-paths-in-additional-places.patch text/plain 25.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mikael Kjellström 2020-04-05 13:10:15 Re: backup manifests and contemporaneous buildfarm failures
Previous Message Fabien COELHO 2020-04-05 12:45:42 Re: pgbench - add pseudo-random permutation function