Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-07-20 16:21:01
Message-ID: CAAaqYe9VPucfA80sazxaUCsVovcfv1CYTddzB14yHJWcSe7wrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 20, 2019 at 12:12 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Sat, Jul 20, 2019 at 11:25 AM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> ...
> > >My current line of investigation is whether we need to do anything in
> > >the parallel portion of create_ordered_paths(). I noticed that the
> > >first-pass patch adding generate_useful_gather_paths() modified that
> > >section but wasn't actually adding any new gather-merge paths (just
> > >bare incremental sort paths). That seems pretty clearly just a
> > >prototype miss, so I modified the prototype to build gather-merge
> > >paths instead (as a side note that change seems to fix an oddity I was
> > >seeing where plans would include a parallel index scan node even
> > >though they weren't parallel plans). While the resulting plan for
> > >something like:
> > >
> >
> > Yes, that seems to be a bug. The block above it clealy has a gather
> > merge nodes, so this one should too.
> >
> > >explain analyze select * from t where t.a in (1,2,3,4,5,6) order by
> > >t.a, t.b limit 50;
> > >
> > >changes cost (to be cheaper) ever so slightly with the gather-merge
> > >addition to create_ordered_paths(), the plan itself is otherwise
> > >identical (including row estimates):
> > >
> > >Limit
> > > -> Gather Merge
> > > -> Incremental Sort
> > > -> Parallel Index Scan
> > >
> > >(Note: I'm forcing parallel plans here with: set
> > >max_parallel_workers_per_gather=4; set min_parallel_table_scan_size=0;
> > >set parallel_tuple_cost=0; set parallel_setup_cost=0; set
> > >min_parallel_index_scan_size=0;)
> > >
> > >I can't seem to come up with a case where adding these gather-merge
> > >paths in create_ordered_paths() isn't entirely duplicative of paths
> > >already created by generate_useful_gather_paths() as called from
> > >apply_scanjoin_target_to_paths() -- which I _think_ makes sense given
> > >that both apply_scanjoin_target_to_paths() and create_ordered_paths()
> > >are called by grouping_planner().
> > >
> > >Can you think of a case I'm missing here that would make it valuable
> > >to generate new parallel plans in create_ordered_paths()?
> > >
> >
> > Good question. Not sure. I think such path would need to do something on
> > a relation that is neither a join nor a scan - in which case the path
> > should not be created by apply_scanjoin_target_to_paths().
> >
> > So for example a query like this:
> >
> > SELECT
> > a, b, sum(expensive_function(c))
> > FROM
> > t
> > GROUP BY a,b
> > ORDER BY a,sum(...) LIMIT 10;
> >
> > should be able to produce a plan like this:
> >
> > -> Limit
> > -> Gather Merge
> > -> Incremental Sort (pathkeys: a, sum)
> > -> Group Aggregate
> > a, b, sum(expensive_function(c))
> > -> Index Scan (pathkeys: a, b)
> >
> > or something like that, maybe. I haven't tried this, though. The other
> > question is whether such queries are useful in practice ...
>
> Hmm, when I step through on that example input_rel->partial_pathlist
> != NIL is false, so we don't even attempt to consider any extra
> parallel paths in create_ordered_paths(). Nonetheless we get a
> parallel plan, but with a different shape:
>
> Limit
> -> Incremental Sort
> Sort Key: a, (sum(expensive_function(c)))
> Presorted Key: a
> -> Finalize GroupAggregate
> Group Key: a, b
> -> Gather Merge
> Workers Planned: 4
> -> Partial GroupAggregate
> Group Key: a, b
> -> Sort
> Sort Key: a, b
> -> Parallel Seq Scan on t
>
> (or if I disable seqscan then the sort+seq scan above becomes inc sort
> + index scan)
>
> To be honest, I don't think I understand how you would get a plan like
> that anyway since the index here only has "a" and so can't provide
> (pathkeys: a, b).
>
> Perhaps there's something I'm still missing though.

Also just realized I don't think (?) we can order by the sum inside a
gather-merge -- at least not without having another sort above the
parallel portion? Or is the group aggregate able to also provide
ordering on the final sum after aggregating the partial sums?

James Coleman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2019-07-20 16:44:53 Re: [HACKERS] Block level parallel vacuum
Previous Message James Coleman 2019-07-20 16:12:34 Re: [PATCH] Incremental sort (was: PoC: Partial sort)