Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-07-20 13:22:44
Message-ID: 20190720132244.3vgg2uynfpxh3me5@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 19, 2019 at 04:59:21PM -0400, James Coleman wrote:
>On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Now, consider this example:
>>
>> create table t (a int, b int, c int);
>> insert into t select mod(i,100),mod(i,100),i from generate_series(1,10000000) s(i);
>> create index on t (a);
>> analyze t;
>> explain select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;
>>
>> With 0001+0002+0003 pathes, I get a plan like this:
>>
>> QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------
>> Limit (cost=10375.39..10594.72 rows=1 width=16)
>> -> Incremental Sort (cost=10375.39..2203675.71 rows=10000 width=16)
>> Sort Key: a, b, (sum(c))
>> Presorted Key: a, b
>> -> GroupAggregate (cost=10156.07..2203225.71 rows=10000 width=16)
>> Group Key: a, b
>> -> Gather Merge (cost=10156.07..2128124.39 rows=10000175 width=12)
>> Workers Planned: 2
>> -> Incremental Sort (cost=9156.04..972856.05 rows=4166740 width=12)
>> Sort Key: a, b
>> Presorted Key: a
>> -> Parallel Index Scan using t_a_idx on t (cost=0.43..417690.30 rows=4166740 width=12)
>> (12 rows)
>>
>>
>> and with 0004, I get this:
>>
>> QUERY PLAN
>> ------------------------------------------------------------------------------------------------------
>> Limit (cost=20443.84..20665.32 rows=1 width=16)
>> -> Incremental Sort (cost=20443.84..2235250.05 rows=10000 width=16)
>> Sort Key: a, b, (sum(c))
>> Presorted Key: a, b
>> -> GroupAggregate (cost=20222.37..2234800.05 rows=10000 width=16)
>> Group Key: a, b
>> -> Incremental Sort (cost=20222.37..2159698.74 rows=10000175 width=12)
>> Sort Key: a, b
>> Presorted Key: a
>> -> Index Scan using t_a_idx on t (cost=0.43..476024.65 rows=10000175 width=12)
>> (10 rows)
>>
>> Notice that cost of the second plan is almost double the first one. That
>> means 0004 does not even generate the first plan, i.e. there are cases
>> where we don't try to add the explicit sort before passing the path to
>> generate_gather_paths().
>>
>> And I think I know why is that - while gather_grouping_paths() tries to
>> add explicit sort below the gather merge, there are other places that
>> call generate_gather_paths() that don't do that. In this case it's
>> probably apply_scanjoin_target_to_paths() which simply builds
>>
>> parallel (seq|index) scan + gather merge
>>
>> and that's it. The problem is likely the same - the code does not know
>> which pathkeys are "interesting" at that point. We probably need to
>> teach planner to do this.
>
>I've been working on figuring out sample queries for each of the
>places we're looking at adding create_increment_sort() (starting with
>the cases enabled by gather-merge nodes). The
>generate_useful_gather_paths() call in
>apply_scanjoin_target_to_paths() is required to generate the above
>preferred plan.
>
>But I found that if I set enable_sort=off (with our without the
>_useful_ variant of generate_gather_paths()) I get a very similar plan
>that's actually lower cost yet:
>
> QUERY PLAN
>--------------------------------------------------------------------------------------------------------------------------
> Limit (cost=10255.98..10355.77 rows=1 width=16)
> -> Incremental Sort (cost=10255.98..1008228.67 rows=10000 width=16)
> Sort Key: a, b, (sum(c))
> Presorted Key: a, b
> -> Finalize GroupAggregate (cost=10156.20..1007778.67
>rows=10000 width=16)
> Group Key: a, b
> -> Gather Merge (cost=10156.20..1007528.67 rows=20000 width=16)
> Workers Planned: 2
> -> Partial GroupAggregate
>(cost=9156.18..1004220.15 rows=10000 width=16)
> Group Key: a, b
> -> Incremental Sort
>(cost=9156.18..972869.60 rows=4166740 width=12)
> Sort Key: a, b
> Presorted Key: a
> -> Parallel Index Scan using t_a_idx
>on t (cost=0.43..417703.85 rows=4166740 width=12)
>
>Is that something we should consider a bug at this stage? It's also
>not clear to mean (costing aside) which plan is intuitively
>preferable.
>

This seems like a thinko in add_partial_path() - it only looks at total
cost of the paths, and ignores the startup cost entirely. I've debugged
it a bit, and what's happening for the partially-grouped relation is
roughly this:

1) We add a partial path with startup/total costs 696263 / 738029

2) We attempt to add the "Partial GroupAggregate" path, but it loses the
fight because the total cost (1004207) is higher than the first path.
Which entirely misses that the startup cost is way lower.

3) We however use the startup cost later when computing the LIMIT cost
(because that's linear approximation between startup and total cost),
and we reject the first path too, because we happen to find something
cheaper (but more expensive than what we'd get the path rejected in 2).

Attached is a debug patch which makes this clear - it only prints info
about first step of partial aggregates, because that's what matters in
this example. You should see something like this:

WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9448 startup 696263.839993 total 738029.919993
WARNING: rel 0x2aa8e00 path 0x2aa9448 adding new = 1
WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9710 startup 9156.084995 total 1004207.854495
WARNING: A: new path 0x2aa9710 rejected because of 0x2aa9448
WARNING: rel 0x2aa8e00 path 0x2aa9710 adding new = 0

which essentially says "path rejected because of total cost" (and you
know it's the interesting partial aggregate from the second plan). And
if you disable sort, you get this:

WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9448 startup 10000696263.839994 total 10000738029.919996
WARNING: rel 0x2aa8e00 path 0x2aa9448 adding new = 1
WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9710 startup 9156.084995 total 1004207.854495
WARNING: rel 0x2aa8e00 path 0x2aa9710 adding new = 1
...

So in this case we decided the path is interesting, thanks to the
increased cost of sort.

The comment for add_partial_path says this:

* Neither do we need to consider startup costs: parallelism is only
* used for plans that will be run to completion. Therefore, this
* routine is much simpler than add_path: it needs to consider only
* pathkeys and total cost.

I think this may be a thinko, as this plan demonstrates - but I'm not
sure about it. I wonder if this might be penalizing some other types of
plans (essentially anything with limit + gather).

Attached is a WIP patch fixing this by considering both startup and
total cost (by calling compare_path_costs_fuzzily).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
incremental-sort-cost-debug.patch text/plain 2.5 KB
add_partial_path-fix.patch text/plain 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message JVM . 2019-07-20 13:56:07 Queries on QMF to POSTGRE
Previous Message Michail Nikolaev 2019-07-20 12:35:57 thoughts on "prevent wraparound" vacuum