Re: [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort
Date: 2017-05-05 15:13:42
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Thu, Apr 27, 2017 at 5:23 PM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> On Thu, Apr 27, 2017 at 5:06 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
>> On Wed, Apr 26, 2017 at 11:39 AM, Alexander Korotkov
>> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> > But I'd like to make incremental sort not slower than quicksort in case
>> of
>> > presorted data. New idea about it comes to my mind. Since cause of
>> > incremental sort slowness in this case is too frequent reset of
>> tuplesort,
>> > then what if we would artificially put data in larger groups. Attached
>> > revision of patch implements this: it doesn't stop to accumulate tuples
>> to
>> > tuplesort until we have MIN_GROUP_SIZE tuples.
>> >
>> > Now, incremental sort is not slower than quicksort. And this seems to
>> be
>> > cool.
>> > However, in the LIMIT case we will pay the price of fetching some extra
>> > tuples from outer node. But, that doesn't seem to hurt us too much.
>> >
>> > Any thoughts?
>> Nice idea.
> Cool.
> Than I'm going to make a set of synthetic performance tests in order to
> ensure that there is no regression.

Next revision of patch is attached.
This revision contains one important optimization. I found that it's not
necessary to make every tuple go through prevTuple slot. It's enough to
save single sample tuple per sort group in order to compare skip columns
with it. This optimization allows to evade regression on large sort groups
which I have observed.

I'm also attaching python script ( which I use for
synthetic performance benchmarking. This script runs benchmarks which are
similar to one posted by Heikki, but with some variations. These
benchmarks are aimed to check if there are cases when incremental sort is
slower than plain sort.

This script generates tables with structure described in 'tables' array.
For generation of texts, md5 function is used. For first GroupedCols
number of table columns, groups of GroupSize equal values are generated.
Then there are columns which values are just sequential. In the last column
have PreorderedFrac fraction of sequential values and rest of values are
random. Therefore, we can measure influence of presorted optimization in
qsort with various fractions of presorted data. Also there is btree index
which covers all the columns of that table.

The benchmark query select contents of generated table order by grouped
columns and by last column. Index only scan outputs tuples ordered by
grouped columns, and incremental sort have to perform sorting inside those
groups. Plain sort case is forced to also use index only scans, in order
to compare sort methods not scan methods.

Results are also attached (results.csv). Last column contains difference
between incremental and plain sort time in percents. Negative value mean
that incremental sort is faster in this case.

Incremental sort is faster in vast majority of cases. It appears to be
slower only when whose dataset is one sort group. In this case incremental
sort is useless, and it should be considered as misuse of incremental
sort. Slowdown is related to the fact that we anyway have to do extra
comparisons, unless we somehow push our comparison result into qsort itself
and save some cpu cycles (but that would be unreasonable break of
encapsulation). Thus, in such cases regression seems to be inevitable
anyway. I think we could evade this regression during query planning. If
we see that there would be only few groups, we should choose plain sort
instead of incremental sort.

Any thoughts?

Alexander Korotkov
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
incremental-sort-7.patch application/octet-stream 112.6 KB
results.csv text/csv 24.2 KB text/x-python-script 2.8 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-05-05 15:15:04 Re: statement_timeout is not working as expected with postgres_fdw
Previous Message Peter Eisentraut 2017-05-05 15:01:57 Re: SUBSCRIPTIONS and pg_upgrade