Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-31 01:14:07
Message-ID: 20200331011407.vwz3ektatgqyi6vy@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 30, 2020 at 06:53:47PM -0400, James Coleman wrote:
>On Mon, Mar 30, 2020 at 8:24 AM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On Sun, Mar 29, 2020 at 10:16:53PM -0400, James Coleman wrote:
>> >On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra
>> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Attached is a slightly reorganized patch series. I've merged the fixes
>> >> into the appropriate matches, and I've also combined the two patches
>> >> adding incremental sort paths to additional places in planner.
>> >>
>> >> A couple more comments:
>> >>
>> >>
>> >> 1) I think the GUC documentation in src/sgml/config.sgml is a bit too
>> >> detailed, compared to the other enable_* GUCs. I wonder if there's a
>> >> better place where to move the details. What about adding some examples
>> >> and explanation to perform.sgml?
>> >
>> >I'll take a look at that and include in a patch series tomorrow.
>> >> 2) Looking at the explain output, the verbose mode looks like this:
>> >>
>> >> test=# explain (verbose, analyze) select a from t order by a, b, c;
>> >> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >> Gather Merge (cost=66.31..816072.71 rows=8333226 width=24) (actual time=4.787..20092.555 rows=10000000 loops=1)
>> >> Output: a, b, c
>> >> Workers Planned: 2
>> >> Workers Launched: 2
>> >> -> Incremental Sort (cost=66.28..729200.36 rows=4166613 width=24) (actual time=1.308..14021.575 rows=3333333 loops=3)
>> >> Output: a, b, c
>> >> Sort Key: t.a, t.b, t.c
>> >> Presorted Key: t.a, t.b
>> >> Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB peak=30kB
>> >> Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB peak=138kB
>> >> Worker 0: actual time=0.766..16122.368 rows=3841573 loops=1
>> >> Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB
>> >> Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB peak=141kB
>> >> Worker 1: actual time=1.986..16189.831 rows=3845490 loops=1
>> >> Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB
>> >> Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB peak=139kB
>> >> -> Parallel Index Scan using t_a_b_idx on public.t (cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 rows=3333333 loops=3)
>> >> Output: a, b, c
>> >> Worker 0: actual time=0.048..11275.178 rows=3841573 loops=1
>> >> Worker 1: actual time=0.041..11314.133 rows=3845490 loops=1
>> >> Planning Time: 0.166 ms
>> >> Execution Time: 25135.029 ms
>> >> (22 rows)
>> >>
>> >> There seems to be missing indentation for the first line of worker info.
>> >
>> >Working on that too.
>See attached. I've folded in the original "explain fixes" patch into
>the main series, and the "explain fixes" patch in this series contains
>only the changes for the above.

Thanks. I'll take a look at those changes tomorrow.

>> >> I'm still not quite convinced we should be printing two lines - I know
>> >> you mentioned the lines might be too long, but see how long the other
>> >> lines may get ...
>> >
>> >All right, I give in :)
>> >
>> >Do you think non-workers (both the leader and non-parallel plans)
>> >should also move to one line?
>> >
>> I think we should use the same formatting for both cases, so yes.
>> FWIW I forgot to mention I tweaked the INSTRUMENT_SORT_GROUP macro a
>> bit, by moving the if condition in it. That makes the calls easier.
>Ah, that actually fixed some of the compile warnings. The other is
>fixed in my explain fixes patch.
>> >> 3) I see the new nodes (plan state, ...) have "presortedCols" which does
>> >> not indicate it's a "number of". I think we usually prefix names of such
>> >> fields "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.)
>> >
>> >I can fix this too.
>Changed everywhere we used this var name. I'm tempted to change to
>nPresortedKeys, but a cursory glance suggests some cases might
>actually be consistent with other var names reference columns, so I'm
>not sure if we want to go down that path (and change more than just

Not sure. We use "sort keys" and "path keys" for this, but I think
"columns" is good enough.

The main thing I've been working on today is benchmarking how this
affects planning. And I'm seeing a regression that worries me a bit,

The test I'm doing is pretty simple - build a small table with a bunch
of columns:

create table t (a int, b int, c int, d int, e int, f int, g int);

insert into t select 100*random(), 100*random(), 100*random(),
100*random(), 100*random(), 100*random(), 100*random()
from generate_series(1,100000) s(i);

and then a number of indexes on subsets of up to 3 columns, as generated
using the attached script. And then run a bunch of
explains (so no actual execution) sorting the data by at least 4 columns
(to trigger incremental sort paths), measuring timing of the script.

I did a bunch of runs on current master and v46 with incremental sort
disabled and enabled, and the results look like this:

master off on
34.609 37.463 37.729

which means about 8-9% regression with incremental sort. Of course, this
is only for planning time, for execution the impact is going to be much
smaller. But it's still a bit annoying.

I've suspected this might be either due to the add_partial_path changes
or the patch adding incremental sort to additional places, so I tested
those parts individually and the answer is no - add_partial_path changes
have very small impact (~1%, which might be noise). The regression comes
mostly from the 0002 part that adds incremental sort. At least in this
particular test - different tests might behave differently, of course.

The annoying bit is that the overhead does not disappear after disabling
incremental sort. That suggests this is not merely due to considering
and creating higher number of paths, but due to something that happens
before we even look at the GUC ...

I think I've found one such place - if you look at compare_pathkeys, it
has this check right before the forboth() loop:

if (keys1 == keys2)

But with incremental sort we don't call pathkeys_contained_in, we call
pathkeys_common_contained_in instead. And that does not call
compare_pathkeys and does not have the simple equality check. Adding
the following check seems to cut the overhead in half, which is nice:

if (keys1 == keys2)
*n_common = list_length(keys1);
return true;

Not sure where the rest of the regression comes from yet.

Also, while looking at pathkeys_common_contained_in(), I've been a bit
puzzled why does is this correct:

return (key1 == NULL);

It's easy to not notice it's key1 and not keys1, so I suggest we add a
comment 'key1 == NULL' means we've processed whole keys1 list.


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size text/plain 347 bytes text/plain 425 bytes

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-03-31 01:16:03 Re: pgsql: Improve handling of parameter differences in physical replicatio
Previous Message Cary Huang 2020-03-31 00:36:24 Re: Internal key management system