Re: [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort
Date: 2017-10-02 16:37:03
Message-ID: CAPpHfduS7SBC_J0frWX1tToZkKyyAjLVDQH1wcS7Rbvjg6nV-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 30, 2017 at 11:20 PM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>> On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <
>> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>>
>>> Patch rebased to current master is attached. I'm going to improve my
>>> testing script and post new results.
>>>
>>
>> New benchmarking script and results are attached. There new dataset
>> parameter is introduced: skew factor. Skew factor defines skew in
>> distribution of groups sizes.
>> My idea of generating is just usage of power function where power is from
>> 0 to 1. Following formula is used to get group number for particular item
>> number i.
>> [((i / number_of_indexes) ^ power) * number_of_groups]
>> For example, power = 1/6 gives following distribution of groups sizes:
>> group number group size
>> 0 2
>> 1 63
>> 2 665
>> 3 3367
>> 4 11529
>> 5 31031
>> 6 70993
>> 7 144495
>> 8 269297
>> 9 468558
>>
>> For convenience, instead of power itself, I use skew factor where power =
>> 1.0 / (1.0 + skew). Therefore, with skew = 0.0, distribution of groups
>> sizes is uniform. Larger skew gives more skewed distribution (and that
>> seems to be quite intuitive). For, negative skew, group sizes are mirrored
>> as for corresponding positive skew. For example, skew factor = -5.0 gives
>> following groups sizes distribution:
>> group number group size
>> 0 468558
>> 1 269297
>> 2 144495
>> 3 70993
>> 4 31031
>> 5 11529
>> 6 3367
>> 7 665
>> 8 63
>> 9 2
>>
>> Results shows that between 2172 test cases, in 2113 incremental sort
>> gives speedup while in 59 it causes slowdown. The following 4 test cases
>> show most significant slowdown (>10% of time).
>>
>> Table GroupedCols GroupCount Skew PreorderedFrac
>> FullSortMedian IncSortMedian TimeChangePercent
>> int4|int4|numeric 1 100 -10 0
>> 1.5688240528 2.0607631207 31.36
>> text|int8|text|int4 1 1 0 0
>> 1.7785198689 <(778)%20519-8689> 2.1816160679 22.66
>> int8|int8|int4 1 10 -10 0
>> 1.136412859 1.3166360855 15.86
>> numeric|text|int4|int8 2 10 -10 1
>> 0.4403841496 0.5070910454 15.15
>>
>> As you can see, 3 of this 4 test cases have skewed distribution while one
>> of them is related to costly location-aware comparison of text. I've no
>> particular idea of how to cope these slowdowns. Probably, it's OK to have
>> slowdown in some cases while have speedup in majority of cases (assuming
>> there is an option to turn off new behavior). Probably, we should teach
>> optimizer more about skewed distributions of groups, but that doesn't seem
>> feasible for me.
>>
>> Any thoughts?
>>
>
> BTW, replacement selection sort was removed by 8b304b8b. I think it worth
> to rerun benchmarks after that, because results might be changed. Will do.
>

I've applied patch on top of c12d570f and rerun the same benchmarks.
CSV-file with results is attached. There is no dramatical changes. There
is still minority of performance regression cases while majority of cases
has improvement.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
results2.csv text/csv 151.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-10-02 16:42:36 Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Previous Message Robert Haas 2017-10-02 16:36:02 Re: issue: record or row variable cannot be part of multiple-item INTO list