Re: [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort
Date: 2017-09-30 20:20:19
Message-ID: CAPpHfdtOiP8tQ8xUxo+YZTSJR65JqFYo63nxDJBS7y_mpW9agw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>> Patch rebased to current master is attached. I'm going to improve my
>> testing script and post new results.
>>
>
> New benchmarking script and results are attached. There new dataset
> parameter is introduced: skew factor. Skew factor defines skew in
> distribution of groups sizes.
> My idea of generating is just usage of power function where power is from
> 0 to 1. Following formula is used to get group number for particular item
> number i.
> [((i / number_of_indexes) ^ power) * number_of_groups]
> For example, power = 1/6 gives following distribution of groups sizes:
> group number group size
> 0 2
> 1 63
> 2 665
> 3 3367
> 4 11529
> 5 31031
> 6 70993
> 7 144495
> 8 269297
> 9 468558
>
> For convenience, instead of power itself, I use skew factor where power =
> 1.0 / (1.0 + skew). Therefore, with skew = 0.0, distribution of groups
> sizes is uniform. Larger skew gives more skewed distribution (and that
> seems to be quite intuitive). For, negative skew, group sizes are mirrored
> as for corresponding positive skew. For example, skew factor = -5.0 gives
> following groups sizes distribution:
> group number group size
> 0 468558
> 1 269297
> 2 144495
> 3 70993
> 4 31031
> 5 11529
> 6 3367
> 7 665
> 8 63
> 9 2
>
> Results shows that between 2172 test cases, in 2113 incremental sort gives
> speedup while in 59 it causes slowdown. The following 4 test cases show
> most significant slowdown (>10% of time).
>
> Table GroupedCols GroupCount Skew PreorderedFrac
> FullSortMedian IncSortMedian TimeChangePercent
> int4|int4|numeric 1 100 -10 0
> 1.5688240528 2.0607631207 31.36
> text|int8|text|int4 1 1 0 0
> 1.7785198689 <(778)%20519-8689> 2.1816160679 22.66
> int8|int8|int4 1 10 -10 0
> 1.136412859 1.3166360855 15.86
> numeric|text|int4|int8 2 10 -10 1
> 0.4403841496 0.5070910454 15.15
>
> As you can see, 3 of this 4 test cases have skewed distribution while one
> of them is related to costly location-aware comparison of text. I've no
> particular idea of how to cope these slowdowns. Probably, it's OK to have
> slowdown in some cases while have speedup in majority of cases (assuming
> there is an option to turn off new behavior). Probably, we should teach
> optimizer more about skewed distributions of groups, but that doesn't seem
> feasible for me.
>
> Any thoughts?
>

BTW, replacement selection sort was removed by 8b304b8b. I think it worth
to rerun benchmarks after that, because results might be changed. Will do.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nico Williams 2017-09-30 20:26:54 Re: alter server for foreign table
Previous Message Nikolay Shaplov 2017-09-30 20:18:35 Re: [PATCH] Tests for reloptions