Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-02 12:41:11
Message-ID: CAAaqYe90h3F5NbgtNJAMrXAiGZwML8=_E_qs4MCyYtqea+5Hhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 1, 2020 at 10:47 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Wed, Apr 01, 2020 at 10:09:20PM -0400, James Coleman wrote:
> >On Wed, Apr 1, 2020 at 5:42 PM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> ...
> >> I've realized the way get_useful_pathkeys_for_relation() is coded kinda
> >> works against the fastpath we added when comparing pathkeys. That
> >> depends on comparing pointers to the list, but we've been building new
> >> lists (and then returned those) which defeats the optimization. Attached
> >> is a patch that returns the original list in most cases (and only
> >> creates a copy when really necessary). This might also save a few cycles
> >> on bulding the new list, of course.
> >>
> >> I've done a bunch of read-only pgbench tests with fairly small scales (1
> >> and 10). First with the built-in read-only transaction, and also with a
> >> simple custom query doing an order-by. And I did this both on the
> >> default schema and with a bunch of extra indexes. The script I used to
> >> run this is attached, along with a summary of results.
> >>
> >> There are results for master and v40 and v50 patches (the v50 also
> >> includes the extra patch fixing get_useful_pathkeys_for_relation).
> >>
> >> Overall, I'm happy with those results - the v50 seems to be within 1% of
> >> master, in both directions. This very much seems like a noise.
> >>
> >> I still want to do a bit more review of the costing / tuplesort changes,
> >> which I plan to do tomorrow. If that goes well, I plan to start
> >> committing this. So please if you think this is not ready or wants more
> >
> >I think we need to either implement this or remove the comment:
> >* XXX I wonder if we need to consider adding a projection here, as
> >* create_ordered_paths does.
> >in generate_useful_gather_paths().
> >
>
> Yeah. I think we don't need the projection here. My reasoning is that if
> we don't need it in generate_gather_paths(), we don't need it here.

All right, then I'm removing the comment in the attached series.

> >In the same function we have the following code:
> >/*
> > * When the partial path is already sorted, we can just add a gather
> > * merge on top, and we're done - no point in adding explicit sort.
> > *
> > * XXX Can't we skip this (maybe only for the cheapest partial path)
> > * when the path is already sorted? Then it's likely duplicate with
> > * the path created by generate_gather_paths.
> > */
> >if (is_sorted)
> >{
> > path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
> >
> > subpath->pathkeys, NULL, rowsp);
> >
> > add_path(rel, &path->path);
> > continue;
> >}
> >
> >looking at the relevant loop in generate_gather_paths:
> >/*
> > * For each useful ordering, we can consider an order-preserving Gather
> > * Merge.
> > */
> >foreach(lc, rel->partial_pathlist)
> >{
> > Path *subpath = (Path *) lfirst(lc);
> > GatherMergePath *path;
> >
> > if (subpath->pathkeys == NIL)
> > continue;
> >
> > rows = subpath->rows * subpath->parallel_workers;
> > path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
> >
> > subpath->pathkeys, NULL, rowsp);
> > add_path(rel, &path->path);
> >}
> >
> >I believe we can eliminate the block entirely in
> >generate_useful_gather_paths(). Here's my reasoning: all paths for
> >which is_sorted is true must necessarily have pathkeys, and since we
> >already add a gather merge for every subpath with pathkeys, we've
> >already added gather merge paths for all of these.
> >
> >I've included a patch to change this, but let me know if the reasoning
> >isn't sound.
> >
>
> Good catch! I think you're correct - we don't need to generate this
> path, and we can just skip that partial path entirely.

The attached patch series merges the above fixes into the main patches.

> >We can also remove the XXX on this comment (in the same function):
> >* XXX This is not redundant with the gather merge path created in
> >* generate_gather_paths, because that merely preserves ordering of
> >* the cheapest partial path, while here we add an explicit sort to
> >* get match the useful ordering.
> >
> >because of this code in generate_gather_paths():
> >cheapest_partial_path = linitial(rel->partial_pathlist);
> >rows =
> > cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
> >simple_gather_path = (Path *)
> > create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
> > NULL, rowsp);
> >add_path(rel, simple_gather_path);
> >
> >but we can cleanup the comment a bit: fix the grammar issue in the
> >last line and fix the reference to gather merge path (it's a gather
> >path).
> >
> >I've included that in the same patch.
> >
>
> OK, makes sense.
>
> >I also noticed that in create_incremental_sort_path we have this:
> >/* XXX comparison_cost shouldn't be 0? */
> >but I guess that's part of what you're reviewing tomorrow.
> >
>
> Right, one of the bits.
>
> >> time for a review, let me know. I'm not yet sure if I'll commit this as
> >> a single change, or in three separate commits.
> >
> >I don't love the idea of committing it as a single patch, but at least
> >the first two I think probably go together. Otherwise we're
> >introducing a "fix" with no proven impact that will slow down planning
> >(even if only in a small way) only to intend to condition that on a
> >GUC in the next commit.
> >
> >But I think you could potentially make an argument for keeping the
> >additional paths separate...but it's not absolutely necessary IMO.
> >
>
> OK. I've been actually wondering whether to move the add_partial_path
> after the main patch, for exactly this reason.
>
> >> James, can you review the proposed extra fix and merge the fixes into
> >> the main patches?
> >
> >I've reviewed it, and it looks correct, so merged into the main series.
> >
> >Summary:
> >The attached series includes a couple of XXX fixes or comment cleanup
> >as noted above. I believe there are two more XXXs that needs to be
> >answered before we merge ("do we need to consider adding a projection"
> >and "what is the comparison cost for incremental sort").
> >
>
> Thanks!

One other thing: the only "real" XXX I see left is in create_ordered_paths():
* XXX This is probably duplicate with the paths we already generate
* in generate_useful_gather_paths in apply_scanjoin_target_to_paths.

It's not as big of a deal as the others (e.g., projection one could
have been a bug), and it's a lot harder for me to determine if it's
actually duplicative. So we could just leave it in, we could just
remove it, or, perhaps you have some thoughts on determining if it's
true or not.

Thanks!
James

Attachment Content-Type Size
v52-0003-Consider-incremental-sort-paths-in-additional-pl.patch text/x-patch 25.9 KB
v52-0001-Consider-low-startup-cost-when-adding-partial-pa.patch text/x-patch 4.6 KB
v52-0002-Implement-incremental-sort.patch text/x-patch 171.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jürgen Purtz 2020-04-02 12:44:26 Re: Add A Glossary
Previous Message Tomas Vondra 2020-04-02 12:29:33 Re: SLRU statistics