Re: [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort
Date: 2017-04-26 15:39:44
Message-ID: CAPpHfdu1UQmqioJt-UKt_o_MuXdAGovDgqRiUB8Sa092fVRvXg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 29, 2017 at 5:14 PM, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> I added to cost_sort() extra costing for incremental sort: cost of extra
> tuple copying and comparing as well as cost of tuplesort reset.
> The only problem is that I made the following estimate for tuplesort reset:
>
> run_cost += 10.0 * cpu_tuple_cost * num_groups;
>
>
> It makes ordinary sort be selected in your example, but it contains the
> constant 10, which is quite arbitrary. It would be nice to avoid such
> hard-coded constants, but I don't know how we could calculate this cost
> realistically.
>

That appears to be wrong. I intended to make cost_sort() prefer plain sort
over incremental sort for this dataset size, but that turns out not to be
the right choice in every case: quicksort is only that fast on presorted
data. On my laptop I get the following numbers for the test case provided
by Heikki.

Presorted data – very fast.

# explain select count(*) from (select * from sorttest order by a, c) as t;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=147154.34..147154.35 rows=1 width=8)
   ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
         Sort Key: sorttest.a, sorttest.c
         ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000 width=12)
(4 rows)

# select count(*) from (select * from sorttest order by a, c) as t;
count
---------
1000000
(1 row)

Time: 260,752 ms

Non-presorted data – not so fast. It is actually slower than incremental
sort was.

# explain select count(*) from (select * from sorttest order by a desc, c desc) as t;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=130063.84..130063.85 rows=1 width=8)
   ->  Sort  (cost=115063.84..117563.84 rows=1000000 width=12)
         Sort Key: sorttest.a DESC, sorttest.c DESC
         ->  Seq Scan on sorttest  (cost=0.00..15406.00 rows=1000000 width=12)
(4 rows)

# select count(*) from (select * from sorttest order by a desc, c desc) as t;
count
---------
1000000
(1 row)

Time: 416,207 ms

Thus, it would be nice to reflect the fact that our quicksort
implementation is very fast on presorted data. Fortunately, we have a
corresponding statistic: STATISTIC_KIND_CORRELATION. However, that should
probably be the subject of a separate patch.

But I would like incremental sort to be no slower than quicksort on
presorted data, and a new idea came to mind. Since the cause of incremental
sort's slowness in this case is too-frequent resets of the tuplesort, what
if we artificially put the data into larger groups? The attached revision
of the patch implements this: it keeps accumulating tuples into the
tuplesort until it holds at least MIN_GROUP_SIZE tuples.

# explain select count(*) from (select * from sorttest order by a, c) as t;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85412.43..85412.43 rows=1 width=8)
   ->  Incremental Sort  (cost=0.46..72912.43 rows=1000000 width=12)
         Sort Key: sorttest.a, sorttest.c
         Presorted Key: sorttest.a
         ->  Index Only Scan using i_sorttest on sorttest  (cost=0.42..30412.42 rows=1000000 width=12)
(5 rows)

# select count(*) from (select * from sorttest order by a, c) as t;
count
---------
1000000
(1 row)

Time: 251,227 ms

# explain select count(*) from (select * from sorttest order by a desc, c desc) as t;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85412.43..85412.43 rows=1 width=8)
   ->  Incremental Sort  (cost=0.46..72912.43 rows=1000000 width=12)
         Sort Key: sorttest.a DESC, sorttest.c DESC
         Presorted Key: sorttest.a
         ->  Index Only Scan Backward using i_sorttest on sorttest  (cost=0.42..30412.42 rows=1000000 width=12)
(5 rows)

# select count(*) from (select * from sorttest order by a desc, c desc) as t;
count
---------
1000000
(1 row)

Time: 253,270 ms

Now incremental sort is no slower than quicksort, which seems rather nice.
However, in the LIMIT case we pay the price of fetching some extra tuples
from the outer node. But that doesn't appear to hurt much.

# explain select * from sorttest order by a, c limit 10;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Limit  (cost=0.46..0.84 rows=10 width=12)
   ->  Incremental Sort  (cost=0.46..37500.78 rows=1000000 width=12)
         Sort Key: a, c
         Presorted Key: a
         ->  Index Only Scan using i_sorttest on sorttest  (cost=0.42..30412.42 rows=1000000 width=12)
(5 rows)

# select * from sorttest order by a, c limit 10;
a | b | c
----+----+----
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
6 | 6 | 6
7 | 7 | 7
8 | 8 | 8
9 | 9 | 9
10 | 10 | 10
(10 rows)

Time: 0,903 ms

Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: incremental-sort-6.patch (application/octet-stream, 113.0 KB)
