Re: [PATCH] Incremental sort

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort
Date: 2017-05-08 15:51:24
Message-ID: CA+TgmoacfEti95LqcF31n74Z8jaGP0+iPFwLT++t8B0+YTxkvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 5, 2017 at 11:13 AM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> Incremental sort is faster in vast majority of cases. It appears to be
> slower only when whose dataset is one sort group. In this case incremental
> sort is useless, and it should be considered as misuse of incremental sort.
> Slowdown is related to the fact that we anyway have to do extra comparisons,
> unless we somehow push our comparison result into qsort itself and save some
> cpu cycles (but that would be unreasonable break of encapsulation). Thus,
> in such cases regression seems to be inevitable anyway. I think we could
> evade this regression during query planning. If we see that there would be
> only few groups, we should choose plain sort instead of incremental sort.

I'm sorry that I don't have time to review this in detail right now,
but it sounds like you are doing good work to file down cases where
this might cause regressions, which is great. Regarding the point in
the paragraph above, I'd say that it's OK for the planner to be
responsible for picking between Sort and Incremental Sort in some way.
It is, after all, the planner's job to decide between different
strategies for executing the same query and, of course, sometimes it
will be wrong, but that's OK as long as it's not wrong too often (or
by too much, hopefully). It may be a little difficult to get this
right, though, because I'm not sure that the information you need
actually exists (or is reliable). For example, consider the case
where we need to sort 100m rows and there are 2 groups. If 1 group
contains 1 row and the other group contains all of the rest, there is
really no point in an incremental sort. On the other hand, if each
group contains 50m rows and we can get the data presorted by the
grouping column, there might be a lot of point to an incremental sort,
because two 50m-row sorts might be a lot cheaper than one 100m sort.
More generally, it's quite easy to imagine situations where the
individual groups can be quicksorted but sorting all of the rows
requires I/O, even when the number of groups isn't that big. On the
other hand, the real sweet spot for this is probably the case where
the number of groups is very large, with many single-row groups or
many groups with just a few rows each, so if we can at least get this
to work in those cases that may be good enough. On the third hand,
when costing aggregation, I think we often underestimate the number of
groups and there might well be similar problems here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-05-08 15:52:29 Re: Logical replication - TRAP: FailedAssertion in pgstat.c
Previous Message Peter Eisentraut 2017-05-08 15:27:08 Re: Logical replication ApplyContext bloat