Re: [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort
Date: 2017-03-29 14:14:51
Message-ID: CAPpHfdsnMTL4V57eutAt68ZeA7d4N-eWfthcXY=2PK0P5iU+kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 20, 2017 at 5:19 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 03/20/2017 11:33 AM, Alexander Korotkov wrote:
>
>> Please, find rebased patch in the attachment.
>>
>
> I had a quick look at this.
>
> * I'd love to have an explanation of what an Incremental Sort is, in the
> file header comment for nodeIncrementalSort.c.
>

Done.

* I didn't understand the maxMem stuff in tuplesort.c. The comments there
> use the phrase "on-disk memory", which seems like an oxymoron. Also,
> "maximum status" seems weird, as it assumes that there's a natural order to
> the states.
>

Variables were renamed.

* In the below example, the incremental sort is significantly slower than
> the Seq Scan + Sort you get otherwise:
>
> create table foo (a int4, b int4, c int4);
> insert into sorttest select g, g, g from generate_series(1, 1000000) g;
> vacuum foo;
> create index i_sorttest on sorttest (a, b, c);
> set work_mem='100MB';
>
>
> postgres=# explain select count(*) from (select * from sorttest order by
> a, c) as t;
> QUERY PLAN
> ------------------------------------------------------------
> -------------------------------------------
> Aggregate (cost=138655.68..138655.69 rows=1 width=8)
> -> Incremental Sort (cost=610.99..124870.38 rows=1102824 width=12)
> Sort Key: sorttest.a, sorttest.c
> Presorted Key: sorttest.a
> -> Index Only Scan using i_sorttest on sorttest
> (cost=0.43..53578.79 rows=1102824 width=12)
> (5 rows)
>
> Time: 0.409 ms
> postgres=# select count(*) from (select * from sorttest order by a, c) as
> t;
> count
> ---------
> 1000000
> (1 row)
>
> Time: 387.091 ms
>
>
> postgres=# explain select count(*) from (select * from sorttest order by
> a, c) as t;
> QUERY PLAN
> ------------------------------------------------------------
> -------------------
> Aggregate (cost=130063.84..130063.85 rows=1 width=8)
> -> Sort (cost=115063.84..117563.84 rows=1000000 width=12)
> Sort Key: sorttest.a, sorttest.c
> -> Seq Scan on sorttest (cost=0.00..15406.00 rows=1000000
> width=12)
> (4 rows)
>
> Time: 0.345 ms
> postgres=# select count(*) from (select * from sorttest order by a, c) as
> t;
> count
> ---------
> 1000000
> (1 row)
>
> Time: 231.668 ms
>
> According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
> alleviate that, it might be worthwhile to add a special case for when the
> group contains exactly one group, and not put the tuple to the tuplesort in
> that case.

I'm not sure we should do such optimization for one tuple per group, since
it's similar situation with 2 or 3 tuples per group.

> Or if we cannot ensure that the Incremental Sort is actually faster, the
> cost model should probably be smarter, to avoid picking an incremental sort
> when it's not a win.

I added to cost_sort() extra costing for incremental sort: cost of extra
tuple copying and comparing as well as cost of tuplesort reset.
The only problem is that I made following estimate for tuplesort reset:

run_cost += 10.0 * cpu_tuple_cost * num_groups;

It makes ordinal sort to be selected in your example, but it contains
constant 10 which is quite arbitrary. It would be nice to evade such hard
coded constants, but I don't know how could we calculate such cost
realistically.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
incremental-sort-4.patch application/octet-stream 112.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-03-29 14:14:53 Re: Transactions involving multiple postgres foreign servers
Previous Message Peter Eisentraut 2017-03-29 13:38:12 Re: [PATCH] Reduce src/test/recovery verbosity