Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-09-28 00:31:30
Message-ID: CAAaqYe9K1LRRMaVpMwPiv3dwZu9i_0JD7tpLfgNHqUJMkEnU7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 13, 2019 at 10:51 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Thu, Sep 12, 2019 at 12:49:29PM -0300, Alvaro Herrera wrote:
> >On 2019-Jul-30, Tomas Vondra wrote:
> >> I've decided to do a couple of experiments, trying to make my mind about
> >> which modified places matter to diffrent queries. But instead of trying
> >> to reverse engineer the queries, I've taken a different approach - I've
> >> compiled a list of queries that I think are sensible and relevant, and
> >> then planned them with incremental sort enabled in different places.
> >
> >[...]
> >
> >> The list of queries (synthetic, but hopefully sufficiently realistic)
> >> and a couple of scripts to collect the plans is in this repository:
> >>
> >> https://github.com/tvondra/incremental-sort-tests-2
> >>
> >> There's also a spreadsheet with a summary of results, with a visual
> >> representation of which GUCs affect which queries.
> >
> >OK, so we have that now. I suppose this spreadsheet now tells us which
> >places are useful and which aren't, at least for the queries that you've
> >tested. Dowe that mean that we want to get the patch to consider adding
> >paths only the places that your spreadsheet says are useful? I'm not
> >sure what the next steps are for this patch.
> >
>
> Yes. I think the spreadsheet call help us with answering two things:
>
> 1) places actually affecting the plan (all but three do)
>
> 2) redundant places (there are some cases where two GUCs produce the
> same plan in the end)

To expand on this further, (1) should probably help us to be able to
write test cases.

Additionally, one big thing we still need that's somewhat external to
the patch is a good way to benchmark/a set of queries that we believe
are representative enough to be good benchmarks.

I'd really appreciate some input from you all on that particular
question; I feel like it's in some sense the biggest barrier to
getting the patch merged, but also the part where long experience in
the community/exposure to other use cases will probably be quite
valuable.

> Of course, this does assume the query set makes sense and is somewhat
> realistic, but I've tried to construct queries where that is true. We
> may extend it over time, of course.
>
> I think we've agreed to add incremental sort paths different places in
> separate patches, to make review easier. So this may be a useful way to
> decide which places to address first. I'd probably do it in this order:
>
> - create_ordered_paths
> - create_ordered_paths (parallel part)
> - add_paths_to_grouping_rel
> - ... not sure ...
>
> but that's just a proposal. It'd give us most of the benefits, I think,
> and we could also focus on the rest of the patch.

Certainly the first two seem like pretty obvious most necessary base
cases. I think supporting group bys also seems like a pretty standard
case, so at first glance I'd say this seems like a reasonable course
to me.

I'm going to start breaking up the patches in this thread into a
series in support of that. Since I've started a new thread with the
add_partial_path change, I'll include that patch here as part of this
series also. Do you think it's worth moving the tuplesort changes into
a standalone patch in the series also?

Attached is a rebased v31 now broken into the following:

- 001-consider-startup-cost-in-add-partial-path_v1.patch: From the
other thread (Tomas's patch unmodified)
- 002-incremental-sort_v31.patch: Updated base incremental sort patch

Besides rebasing, I've changed the enable_incrementalsort GUC to
prevent generating paths entirely rather than being cost-based, since
incremental sort is never absolutely necessary in the way regular sort
is.

I'm hoping to add 003 soon with the initial parallel parts, but I'm
about out of time right now and wanted to get something out, so
sending this without that.

Side question: for the patch tester do I have to attach each part of
the series each time even if nothing's changed in several of them? And
does the vN number at the end need to stay the same for all of them?
My attachments to this email don't follow that... Also, since this
email changes patch naming, so I need to do anything to clear out the
old ones? (I suppose if not, then that would imply an answer to the
first question also.)

James

Attachment Content-Type Size
001-consider-startup-cost-in-add-partial-path_v1.patch application/octet-stream 3.1 KB
002-incremental-sort_v31.patch application/octet-stream 135.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2019-09-28 01:42:16 Re: SQL/JSON: functions
Previous Message Jeff Davis 2019-09-28 00:07:39 max_parallel_workers question