|From:||Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>|
|To:||Robert Haas <robertmhaas(at)gmail(dot)com>|
|Cc:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: [PATCH] Incremental sort|
|Views:||Raw Message | Whole Thread | Download mbox|
On Mon, May 8, 2017 at 6:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 5, 2017 at 11:13 AM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > Incremental sort is faster in vast majority of cases. It appears to be
> > slower only when whose dataset is one sort group. In this case
> > sort is useless, and it should be considered as misuse of incremental
> > Slowdown is related to the fact that we anyway have to do extra
> > unless we somehow push our comparison result into qsort itself and save
> > cpu cycles (but that would be unreasonable break of encapsulation).
> > in such cases regression seems to be inevitable anyway. I think we could
> > evade this regression during query planning. If we see that there would
> > only few groups, we should choose plain sort instead of incremental sort.
> I'm sorry that I don't have time to review this in detail right now,
> but it sounds like you are doing good work to file down cases where
> this might cause regressions, which is great.
Thank you for paying attention to this patch!
> Regarding the point in
> the paragraph above, I'd say that it's OK for the planner to be
> responsible for picking between Sort and Incremental Sort in some way.
> It is, after all, the planner's job to decide between different
> strategies for executing the same query and, of course, sometimes it
> will be wrong, but that's OK as long as it's not wrong too often (or
> by too much, hopefully).
Right, I agree.
> It may be a little difficult to get this
> right, though, because I'm not sure that the information you need
> actually exists (or is reliable). For example, consider the case
> where we need to sort 100m rows and there are 2 groups. If 1 group
> contains 1 row and the other group contains all of the rest, there is
> really no point in an incremental sort. On the other hand, if each
> group contains 50m rows and we can get the data presorted by the
> grouping column, there might be a lot of point to an incremental sort,
> because two 50m-row sorts might be a lot cheaper than one 100m sort.
More generally, it's quite easy to imagine situations where the
> individual groups can be quicksorted but sorting all of the rows
> requires I/O, even when the number of groups isn't that big. On the
> other hand, the real sweet spot for this is probably the case where
> the number of groups is very large, with many single-row groups or
> many groups with just a few rows each, so if we can at least get this
> to work in those cases that may be good enough. On the third hand,
> when costing aggregation, I think we often underestimate the number of
> groups and there might well be similar problems here.
I agree with that. I need to test this patch more carefully in the case
when groups have different sizes. It's likely I need to add yet another
parameter to my testing script: groups count skew.
Patch rebased to current master is attached. I'm going to improve my
testing script and post new results.
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
|Next Message||Thomas Munro||2017-09-13 23:57:32||Re: Parallel Hash take II|
|Previous Message||Tom Lane||2017-09-13 23:48:29||Re: psql: new help related to variables are not too readable|