Re: [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort
Date: 2017-09-13 23:48:42
Message-ID: CAPpHfdtvx1oVJHCUe1x-R=t55Uz4uBi9i55BcoTJ7zZN_7Gxfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 8, 2017 at 6:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, May 5, 2017 at 11:13 AM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > Incremental sort is faster in vast majority of cases. It appears to be
> > slower only when whose dataset is one sort group. In this case
> incremental
> > sort is useless, and it should be considered as misuse of incremental
> sort.
> > Slowdown is related to the fact that we anyway have to do extra
> comparisons,
> > unless we somehow push our comparison result into qsort itself and save
> some
> > cpu cycles (but that would be unreasonable break of encapsulation).
> Thus,
> > in such cases regression seems to be inevitable anyway. I think we could
> > evade this regression during query planning. If we see that there would
> be
> > only few groups, we should choose plain sort instead of incremental sort.
>
> I'm sorry that I don't have time to review this in detail right now,
> but it sounds like you are doing good work to file down cases where
> this might cause regressions, which is great.

Thank you for paying attention to this patch!

> Regarding the point in
> the paragraph above, I'd say that it's OK for the planner to be
> responsible for picking between Sort and Incremental Sort in some way.
> It is, after all, the planner's job to decide between different
> strategies for executing the same query and, of course, sometimes it
> will be wrong, but that's OK as long as it's not wrong too often (or
> by too much, hopefully).

Right, I agree.

> It may be a little difficult to get this
> right, though, because I'm not sure that the information you need
> actually exists (or is reliable). For example, consider the case
> where we need to sort 100m rows and there are 2 groups. If 1 group
> contains 1 row and the other group contains all of the rest, there is
> really no point in an incremental sort. On the other hand, if each
> group contains 50m rows and we can get the data presorted by the
> grouping column, there might be a lot of point to an incremental sort,
> because two 50m-row sorts might be a lot cheaper than one 100m sort.
>
More generally, it's quite easy to imagine situations where the
> individual groups can be quicksorted but sorting all of the rows
> requires I/O, even when the number of groups isn't that big. On the
> other hand, the real sweet spot for this is probably the case where
> the number of groups is very large, with many single-row groups or
> many groups with just a few rows each, so if we can at least get this
> to work in those cases that may be good enough. On the third hand,
> when costing aggregation, I think we often underestimate the number of
> groups and there might well be similar problems here.

I agree with that. I need to test this patch more carefully in the case
when groups have different sizes. It's likely I need to add yet another
parameter to my testing script: groups count skew.

Patch rebased to current master is attached. I'm going to improve my
testing script and post new results.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
incremental-sort-8.patch application/octet-stream 119.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-09-13 23:57:32 Re: Parallel Hash take II
Previous Message Tom Lane 2017-09-13 23:48:29 Re: psql: new help related to variables are not too readable