Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-06-25 23:22:05
Message-ID: 20190625232205.ssecrwplyb7faxbb@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote:
>
>Unrelated: if you or someone else you know that's more familiar with
>the parallel code, I'd be interested in their looking at the patch at
>some point, because I have a suspicion it might not be operating in
>parallel ever (either that or I don't know how to trigger it), but I'm
>not really familiar with that stuff at all currently. :)
>

That's an interesting question. I don't think plans like this would be
very interesting:

Limit
-> Incremental Sort
-> Gather Merge
-> Index Scan

because most of the extra cost would be paid in the leader anyway. So
I'm not all that surprised those paths are not generated (I might be
wrong and those plans would be interesting, though).

But I think something like this would be quite beneficial:

Limit
-> Gather Merge
-> Incremental Sort
-> Index Scan

So I've looked into that, and the reason seems fairly simple - when
generating the Gather Merge paths, we only look at paths that are in
partial_pathlist. See generate_gather_paths().

And we only have sequential + index paths in partial_pathlist, not
incremental sort paths.

IMHO we can do two things:

1) modify generate_gather_paths to also consider incremental sort for
each sorted path, similarly to what create_ordered_paths does

2) modify build_index_paths to also generate an incremental sort path
for each index path

IMHO (1) is the right choice here, because it automatically does the
trick for all other types of ordered paths, not just index scans. So,
something like the attached patch, which gives me plans like this:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.86..2.85 rows=100000 width=12) (actual time=3.726..233.249 rows=100000 loops=1)
-> Gather Merge (cost=0.86..120.00 rows=5999991 width=12) (actual time=3.724..156.802 rows=100000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Incremental Sort (cost=0.84..100.00 rows=2499996 width=12) (actual time=0.563..164.438 rows=33910 loops=3)
Sort Key: a, b
Presorted Key: a
Sort Method: quicksort Memory: 27kB
Sort Groups: 389
Worker 0: Sort Method: quicksort Memory: 27kB Groups: 1295
Worker 1: Sort Method: quicksort Memory: 27kB Groups: 1241
-> Parallel Index Scan using t_a_idx on t (cost=0.43..250612.29 rows=2499996 width=12) (actual time=0.027..128.518 rows=33926 loops=3)
Planning Time: 68559.695 ms
Execution Time: 285.245 ms
(14 rows)

This is not the whole story, though - there seems to be some costing
issue, because even with the parallel costs set to 0, I only get such
plans after I tweak the cost in the patch like this:

subpath->total_cost = 100.0;
path->path.total_cost = 120.0;

When I don't do that, the gather merge gets with total cost 1037485, and
it gets beaten by this plan:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.05..152.75 rows=100000 width=12) (actual time=0.234..374.492 rows=100000 loops=1)
-> Incremental Sort (cost=1.05..9103.09 rows=5999991 width=12) (actual time=0.232..316.210 rows=100000 loops=1)
Sort Key: a, b
Presorted Key: a
Sort Method: quicksort Memory: 27kB
Sort Groups: 2863
-> Index Scan using t_a_idx on t (cost=0.43..285612.24 rows=5999991 width=12) (actual time=0.063..240.248 rows=100003 loops=1)
Planning Time: 53743.858 ms
Execution Time: 403.379 ms
(9 rows)

I suspect it's related to the fact that for the Gather Merge plan we
don't have the information about the number of rows, while for the
incremental sort we have it.

But clearly 9103.09 is not total cost for all 6M rows the incremental
sort is expected to produce (because that has to be higher than 285612,
which is the cost of the index scan). So it seems like the total cost of
the incremental sort is ~546000, because

(100000 / 6000000) * 546000 = 9133

so close to the total cost of the incremental sort. But then

100000.0 / 6000000 * 9133 = 152

so it seems we actually do the linear approximation twice. That seems
pretty bogus, IMO. And indeed, if I remove this part from
cost_incremental_sort:

if (limit_tuples > 0 && limit_tuples < input_tuples)
{
output_tuples = limit_tuples;
output_groups = floor(output_tuples / group_tuples) + 1;
}

then it behaves kinda reasonable:

explain select * from t order by a, b limit 100000;

QUERY PLAN
-----------------------------------------------------------------------------------------
Limit (cost=1.05..9103.12 rows=100000 width=12)
-> Incremental Sort (cost=1.05..546124.41 rows=5999991 width=12)
Sort Key: a, b
Presorted Key: a
-> Index Scan using t_a_idx on t (cost=0.43..285612.24 rows=5999991 width=12)
(5 rows)

set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;

explain select * from t order by a, b limit 100000;

QUERY PLAN
--------------------------------------------------------------------------------------------------------
Limit (cost=0.86..6775.63 rows=100000 width=12)
-> Gather Merge (cost=0.86..406486.25 rows=5999991 width=12)
Workers Planned: 2
-> Incremental Sort (cost=0.84..343937.44 rows=2499996 width=12)
Sort Key: a, b
Presorted Key: a
-> Parallel Index Scan using t_a_idx on t (cost=0.43..250612.29 rows=2499996 width=12)
(7 rows)

But I'm not going to claim those are total fixes, it's the minimum I
needed to do to make this particular type of plan work.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
parallel-incremental-sort.patch text/plain 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-06-25 23:57:56 Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Previous Message Andres Freund 2019-06-25 22:55:43 Re: Don't allocate IndexAmRoutine dynamically?