Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-28 14:19:04
Message-ID: CAAaqYe-Jhp-QEC2-pwg61aF0vbGR=gEW657cnXxcqUJvdvBaQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> ...
> >> As a side note here, I'm wondering if this (determining useful pathkeys)
> >> can be made a bit smarter by looking both at query_pathkeys and pathkeys
> >> useful for merging, similarly to what truncate_useless_pathkeys() does.
> >> But that can be seen as an improvement of what we do now.
> >
> >Unless your comment below about looking at truncate_useless_pathkeys
> >is implying you're considering aiming to get this in now, I wonder if
> >we should just expand the comment to reference pathkeys useful for
> >merging as a possible future extension.
> >
>
> Maybe. I've been thinking about how to generate those path keys, but
> it's a bit tricky, because pathkeys_useful_for_merging() - the thing
> that backs truncate_useless_pathkeys() actually gets pathkeys and merely
> verifies if those are useful for merging. But this code needs to do the
> opposite - generate those pathkeys.
>
> But let's say we know we have a join on columns (a,b,c). For each of
> those PathKey values we know it's useful for merging, but should we
> generate pathkeys (a,b,c), (b,c,a), ... or any other permutation? I
> suppose we can look at pathkeys for existing paths of the relation to
> prune the possibilities a bit, but what then?

I'm not convinced it's worth it this time around. Both because of the
late hour in the CF etc., but also because it seems likely to become
pretty complex quickly, and also far more likely to raise performance
questions in planning if there's not a lot of work done to keep it
limited.

> BTW I wonder if we actually need the ec_has_volatile check in
> get_useful_pathkeys_for_relation. The comment says we can't but is it
> really true? pathkeys_useful_for_ordering doesn't do any such checks, so
> I'd bet this is merely an unnecessary copy-paste from postgres_fdw.
> Opinions?

I hadn't really looked at that part in depth before, but after reading
it over, re-reading the definition of volatility, and running a few
basic queries, I agree.

For example: we already do allow volatile pathkeys in a simple query like:
-- t(a, b) with index on (a)
select * from t order by a, random();

It makes sense that you couldn't push down such a sort to a foreign
server, given there's no constraints said function isn't operating
directly on the primary database (in the fdw case). But that obviously
wouldn't apply here.

> >> >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation:
> >> >* Considering query_pathkeys is always worth it, because it might let us
> >> >* avoid a local sort.
> >> >
> >> >That originally was a copy from the fdw code, but since the two
> >> >functions have diverged (Is that concerning? I could be confusing, but
> >> >isn't a compilation problem) I didn't move the function.
> >> >
> >>
> >> I think it's OK the two functions diverged, it's simply because the FDW
> >> one needs to check other things too. But I might rework this once I look
> >> closer at truncate_useless_pathkeys.
> >
> >Agreed, for now at least. It's tempting to think they should always be
> >shared, but I'm not convinced (without a lot more digging) that this
> >represents structural rather than incidental duplication.
> >
>
> The more I look at pathkeys_useful_for_ordering() the more I think the
> get_useful_pathkeys_for_relation() function should look more like it
> than the postgres_fdw one ...

If we go down that path (and indeed this is actually implied by
removing the volatile check too), doesn't that really just mean (at
least for now) that get_useful_pathkeys_for_relation goes away
entirely and we effectively use root->query_pathkeys directly? The
only thing your lose them is the check that each eclass has a member
in the rel. But that check probably wasn't quite right anyway: at
least for incremental sort (maybe not regular sort), I think we
probably don't necessarily care that the entire list has members in
the rel, but rather that some prefix does, right? The idea there would
be that we could create a gather merge here that supplies a partial
ordering (that prefix of query_pathkeys) and then above it the planner
might place another incremental sort (say above a join), or perhaps
even a join above that would actually be sufficient (since many joins
are capable of providing an ordering anyway).

I've attached (added prefix .txt to avoid the cfbot assuming this is a
full series) an idea in that direction to see if we're thinking along
the same route. You'll want to apply and view with `git diff -w`
probably.

The attached also adds a few XXX comments. In particular, I wonder if
we should verify that some prefix of the root->query_pathkeys is
actually made up of eclass members for the current rel, because
otherwise I think we can skip the loop on the subpaths entirely.

> >> >I did notice though that find_em_expr_for_rel() is wholesale copied
> >> >(and unchanged) from the fdw code, so I moved it to equivclass.c so
> >> >both places can share it.
> >> >
> >>
> >> +1
> >>
>
> ... which would also get rid of find_em_expr_for_rel().

... which, I think, would retain the need for find_em_expr_for_rel().

Note: The attached applied to the previous series compiles and runs
make check...but I haven't really tested it; it's meant more for "is
this the direction we want to go".

James

Attachment Content-Type Size
use_truncate_useless_pathkeys.patch.txt text/plain 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-03-28 14:31:51 Re: improve transparency of bitmap-only heap scans
Previous Message Pavel Stehule 2020-03-28 14:09:34 Re: proposal \gcsv