Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-01 13:05:27
Message-ID: CAAaqYe953TA0Pj+08jEBgK_=5zMCJO-vgYnh4n_Me71ZdzDJLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 31, 2020 at 11:07 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Tue, Mar 31, 2020 at 10:44 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote:
> > >On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
> > ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > >>
> > >> On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
> > >> >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> > >> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > >> >>
> > >> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> > >> >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> > >> >> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
...
> > >> >> >One small idea, but I'm not yet sure it helps us a whole lot: if the
> > >> >> >query pathkeys is only length 1, then we could skip the additional
> > >> >> >path creation.
> > >> >> >
> > >> >>
> > >> >> I don't follow. Why would we create incremental sort in this case at
> > >> >> all? With single-element query_pathkeys the path is either unsorted or
> > >> >> fully sorted - there's no room for incremental sort. No?
> > >> >
> > >> >Well, we shouldn't, that's what I'm getting. But I didn't see anything
> > >> >in the code now that explicitly excludes that case when decided
> > >> >whether or not to create an incremental sort path, unless I'm missing
> > >> >something obvious.
> > >>
> > >> Well, my point is that create_ordered_paths() looks like this:
> > >>
> > >> is_sorted = pathkeys_common_contained_in(root->sort_patkeys, ...);
> > >>
> > >> if (is_sorted)
> > >> {
> > >> ... old code
> > >> }
> > >> else
> > >> {
> > >> if (input_path == cheapest_input_path)
> > >> {
> > >> ... old code
> > >> }
> > >>
> > >> /* With incremental sort disabled, don't build those paths. */
> > >> if (!enable_incrementalsort)
> > >> continue;
> > >>
> > >> /* Likewise, if the path can't be used for incremental sort. */
> > >> if (!presorted_keys)
> > >> continue;
> > >>
> > >> ... incremental sort path
> > >> }
> > >>
> > >> Now, with single-item sort_pathkeys, the input path can't be partially
> > >> sorted. It's either fully sorted - in which case it's handled by the
> > >> first branch. Or it's not sorted at all, so presorted_keys==0 and we
> > >> never get to the incremental path.
> > >>
> > >> Or did you mean to use the optimization somewhere else?
> > >
> > >Hmm, yes, I didn't think through that properly. I'll have to look at
> > >the other cases to confirm the same logic applies there.

I looked through this more carefully, and I did end up finding a few
places where we can skip iterating through a list of paths entirely
with this check, so I added it there. I also cleaned up some comments,
added comments and asserts to the other places where
list_length(pathkeys) should be guaranteed to be > 1, removed a few
asserts I found unnecessary, and merged duplicative
pathkeys_[count_]_contained_in calls.

> > >One other thing:in the code above we create the regular sort path
> > >inside of `if (input_path == cheapest_input_path)`, but incremental
> > >sort is outside of that condition. I'm not sure I'm remembering why
> > >that was, and it's not obvious to me reading it right now (though it's
> > >getting late here, so maybe I'm just not thinking clearly). Do you
> > >happen to remember why that is?
> > >
> >
> > It's because for the regular sort, the path is either already sorted or
> > it requires a full sort. But full sort only makes sense on the cheapest
> > path, because we assume the additional sort cost is independent of the
> > input cost, essentially
> >
> > cost(path + Sort) = cost(path) + cost(Sort)
> >
> > and it's always
> >
> > cost(path) + cost(Sort) >= cost(cheapest path) + cost(Sort)
> >
> > and by checking for cheapest path we simply skip building all the paths
> > that we'd end up discarding anyway.
> >
> > With incremental sort we can't do this, the cost of the incremental sort
> > depends on how well presorted is the input path.

Thanks for the explanation. I've added a comment to that effect.

James

Attachment Content-Type Size
v50-0004-add-fast-path-to-partial-path-consideration.patch text/x-patch 3.8 KB
v50-0003-comment.patch text/x-patch 1.4 KB
v50-0001-Consider-low-startup-cost-when-adding-partial-pa.patch text/x-patch 4.6 KB
v50-0005-Consider-incremental-sort-paths-in-additional-pl.patch text/x-patch 24.2 KB
v50-0002-Implement-incremental-sort.patch text/x-patch 167.9 KB
v50-0006-cleanup.patch text/x-patch 7.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-04-01 13:08:36 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message Prabhat Sahu 2020-04-01 12:56:13 Re: [Proposal] Global temporary tables