Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-07-20 14:33:02
Message-ID: CAAaqYe-8pxV1UfxbbN7WTXUPhs9g1_swZEomoEp4J0BGpnv_Sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 20, 2019 at 9:22 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Fri, Jul 19, 2019 at 04:59:21PM -0400, James Coleman wrote:
> >On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> Now, consider this example:
> >>
> >> create table t (a int, b int, c int);
> >> insert into t select mod(i,100),mod(i,100),i from generate_series(1,10000000) s(i);
> >> create index on t (a);
> >> analyze t;
> >> explain select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;
> >>
> >> With 0001+0002+0003 pathes, I get a plan like this:
> >>
> >> QUERY PLAN
> >> --------------------------------------------------------------------------------------------------------------------
> >> Limit (cost=10375.39..10594.72 rows=1 width=16)
> >> -> Incremental Sort (cost=10375.39..2203675.71 rows=10000 width=16)
> >> Sort Key: a, b, (sum(c))
> >> Presorted Key: a, b
> >> -> GroupAggregate (cost=10156.07..2203225.71 rows=10000 width=16)
> >> Group Key: a, b
> >> -> Gather Merge (cost=10156.07..2128124.39 rows=10000175 width=12)
> >> Workers Planned: 2
> >> -> Incremental Sort (cost=9156.04..972856.05 rows=4166740 width=12)
> >> Sort Key: a, b
> >> Presorted Key: a
> >> -> Parallel Index Scan using t_a_idx on t (cost=0.43..417690.30 rows=4166740 width=12)
> >> (12 rows)
> >>
> >>
> >> and with 0004, I get this:
> >>
> >> QUERY PLAN
> >> ------------------------------------------------------------------------------------------------------
> >> Limit (cost=20443.84..20665.32 rows=1 width=16)
> >> -> Incremental Sort (cost=20443.84..2235250.05 rows=10000 width=16)
> >> Sort Key: a, b, (sum(c))
> >> Presorted Key: a, b
> >> -> GroupAggregate (cost=20222.37..2234800.05 rows=10000 width=16)
> >> Group Key: a, b
> >> -> Incremental Sort (cost=20222.37..2159698.74 rows=10000175 width=12)
> >> Sort Key: a, b
> >> Presorted Key: a
> >> -> Index Scan using t_a_idx on t (cost=0.43..476024.65 rows=10000175 width=12)
> >> (10 rows)
> >>
> >> Notice that cost of the second plan is almost double the first one. That
> >> means 0004 does not even generate the first plan, i.e. there are cases
> >> where we don't try to add the explicit sort before passing the path to
> >> generate_gather_paths().
> >>
> >> And I think I know why is that - while gather_grouping_paths() tries to
> >> add explicit sort below the gather merge, there are other places that
> >> call generate_gather_paths() that don't do that. In this case it's
> >> probably apply_scanjoin_target_to_paths() which simply builds
> >>
> >> parallel (seq|index) scan + gather merge
> >>
> >> and that's it. The problem is likely the same - the code does not know
> >> which pathkeys are "interesting" at that point. We probably need to
> >> teach planner to do this.
> >
> >I've been working on figuring out sample queries for each of the
> >places we're looking at adding create_increment_sort() (starting with
> >the cases enabled by gather-merge nodes). The
> >generate_useful_gather_paths() call in
> >apply_scanjoin_target_to_paths() is required to generate the above
> >preferred plan.

As I continue this, I've added a couple of test cases (notably for
generate_useful_gather_paths() in both standard_join_search() and
apply_scanjoin_target_to_paths()). Those, plus the current WIP state
of my hacking on your patch adding generate_useful_gather_paths() is
attached as 0001-parallel-and-more-paths.patch.

My current line of investigation is whether we need to do anything in
the parallel portion of create_ordered_paths(). I noticed that the
first-pass patch adding generate_useful_gather_paths() modified that
section but wasn't actually adding any new gather-merge paths (just
bare incremental sort paths). That seems pretty clearly just a
prototype miss, so I modified the prototype to build gather-merge
paths instead (as a side note that change seems to fix an oddity I was
seeing where plans would include a parallel index scan node even
though they weren't parallel plans). While the resulting plan for
something like:

explain analyze select * from t where t.a in (1,2,3,4,5,6) order by
t.a, t.b limit 50;

changes cost (to be cheaper) ever so slightly with the gather-merge
addition to create_ordered_paths(), the plan itself is otherwise
identical (including row estimates):

Limit
-> Gather Merge
-> Incremental Sort
-> Parallel Index Scan

(Note: I'm forcing parallel plans here with: set
max_parallel_workers_per_gather=4; set min_parallel_table_scan_size=0;
set parallel_tuple_cost=0; set parallel_setup_cost=0; set
min_parallel_index_scan_size=0;)

I can't seem to come up with a case where adding these gather-merge
paths in create_ordered_paths() isn't entirely duplicative of paths
already created by generate_useful_gather_paths() as called from
apply_scanjoin_target_to_paths() -- which I _think_ makes sense given
that both apply_scanjoin_target_to_paths() and create_ordered_paths()
are called by grouping_planner().

Can you think of a case I'm missing here that would make it valuable
to generate new parallel plans in create_ordered_paths()?

> >But I found that if I set enable_sort=off (with our without the
> >_useful_ variant of generate_gather_paths()) I get a very similar plan
> >that's actually lower cost yet:
> >
> > QUERY PLAN
> >--------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=10255.98..10355.77 rows=1 width=16)
> > -> Incremental Sort (cost=10255.98..1008228.67 rows=10000 width=16)
> > Sort Key: a, b, (sum(c))
> > Presorted Key: a, b
> > -> Finalize GroupAggregate (cost=10156.20..1007778.67
> >rows=10000 width=16)
> > Group Key: a, b
> > -> Gather Merge (cost=10156.20..1007528.67 rows=20000 width=16)
> > Workers Planned: 2
> > -> Partial GroupAggregate
> >(cost=9156.18..1004220.15 rows=10000 width=16)
> > Group Key: a, b
> > -> Incremental Sort
> >(cost=9156.18..972869.60 rows=4166740 width=12)
> > Sort Key: a, b
> > Presorted Key: a
> > -> Parallel Index Scan using t_a_idx
> >on t (cost=0.43..417703.85 rows=4166740 width=12)
> >
> >Is that something we should consider a bug at this stage? It's also
> >not clear to mean (costing aside) which plan is intuitively
> >preferable.
> >
>
> This seems like a thinko in add_partial_path() - it only looks at total
> cost of the paths, and ignores the startup cost entirely. I've debugged
> it a bit, and what's happening for the partially-grouped relation is
> roughly this:
>
> 1) We add a partial path with startup/total costs 696263 / 738029
>
> 2) We attempt to add the "Partial GroupAggregate" path, but it loses the
> fight because the total cost (1004207) is higher than the first path.
> Which entirely misses that the startup cost is way lower.
>
> 3) We however use the startup cost later when computing the LIMIT cost
> (because that's linear approximation between startup and total cost),
> and we reject the first path too, because we happen to find something
> cheaper (but more expensive than what we'd get the path rejected in 2).
>
> Attached is a debug patch which makes this clear - it only prints info
> about first step of partial aggregates, because that's what matters in
> this example. You should see something like this:
>
> WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9448 startup 696263.839993 total 738029.919993
> WARNING: rel 0x2aa8e00 path 0x2aa9448 adding new = 1
> WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9710 startup 9156.084995 total 1004207.854495
> WARNING: A: new path 0x2aa9710 rejected because of 0x2aa9448
> WARNING: rel 0x2aa8e00 path 0x2aa9710 adding new = 0
>
> which essentially says "path rejected because of total cost" (and you
> know it's the interesting partial aggregate from the second plan). And
> if you disable sort, you get this:
>
> WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9448 startup 10000696263.839994 total 10000738029.919996
> WARNING: rel 0x2aa8e00 path 0x2aa9448 adding new = 1
> WARNING: rel 0x2aa8e00 adding partial agg path 0x2aa9710 startup 9156.084995 total 1004207.854495
> WARNING: rel 0x2aa8e00 path 0x2aa9710 adding new = 1
> ...
>
> So in this case we decided the path is interesting, thanks to the
> increased cost of sort.
>
> The comment for add_partial_path says this:
>
> * Neither do we need to consider startup costs: parallelism is only
> * used for plans that will be run to completion. Therefore, this
> * routine is much simpler than add_path: it needs to consider only
> * pathkeys and total cost.
>
> I think this may be a thinko, as this plan demonstrates - but I'm not
> sure about it. I wonder if this might be penalizing some other types of
> plans (essentially anything with limit + gather).
>
> Attached is a WIP patch fixing this by considering both startup and
> total cost (by calling compare_path_costs_fuzzily).

It seems to me that this is likely a bug, and not just a changed
needed for this. Do you think it's better addressed in a separate
thread? Or retain it as part of this patch for now (and possibly break
it out later)? On the other hand, it's entirely possible that someone
more familiar with parallel plan limitations could explain why the
above comment holds true. That makes me lean towards asking in a new
thread.

I've also attached a new base patch (incremental-sort-30.patch) which
includes some of the other obvious fixes (costing, etc.) that you'd
previously proposed.

James Coleman

Attachment Content-Type Size
incremental-sort-30.patch application/octet-stream 135.9 KB
0001-parallel-and-more-paths.patch application/octet-stream 23.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-07-20 14:38:16 Re: [sqlsmith] Crash in mcv_get_match_bitmap
Previous Message JVM . 2019-07-20 13:56:07 Queries on QMF to POSTGRE