Re: [HACKERS] [PATCH] Incremental sort

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Incremental sort
Date: 2018-03-31 20:43:52
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers


I've been doing a bit more review of the patch today, focusing on the
planner part, and I'm starting to have some doubts regarding the way
incremental sort paths are created. I do have some question about the
executor and other parts too.

I'll mark this as 'waiting on author' to make it clear the patch is
still being discussed, RFC is not appropriate status for that.

Attached is a patch that highlights some of the interesting places, and
also suggests some minor changes to comments and other tweaks.

1) planning/costing of incremental vs. non-incremental sorts

In sort, all the various places that create/cost sorts:

* createplan.c (make_sort)
* planner.c (create_sort_path)
* pathnode.c (cost_sort)

seem to prefer incremental sorts whenever available. Consider for
example this code from create_merge_append_plan():

if (!pathkeys_common_contained_in(pathkeys, subpath->pathkeys,
Sort *sort = make_sort(subplan, numsortkeys,
sortColIdx, sortOperators,
collations, nullsFirst);

label_sort_with_costsize(root, sort, best_path->limit_tuples);
subplan = (Plan *) sort;

This essentially says that when (n_common_pathkeys > 0), the sort is
going to be incremental.

That however seems to rely on an important assumption - when the input
is presorted, the incremental sort is expected to be cheaper than
regular sort.

This assumption however seems to be proven invalid by cost_sort, which
does the common part for both sort modes (incremental/non-incremental)
first, and then does this:

/* Extra costs of incremental sort */
if (presorted_keys > 0)
... add something to the costs ...

That is, the incremental cost seems to be pretty much guaranteed to be
more expensive than regular Sort (with the exception of LIMIT queries,
where it's guaranteed to win thanks to lower startup cost).

I don't know how significant the cost difference may be (perhaps not
much), or if it may lead to inefficient plans. For example what if the
cheapest total path is partially sorted by chance, and only has a single
prefix group? Then all the comparisons with pivotSlot are unnecessary.

But I'm pretty sure it may lead to surprising behavior - for example if
you disable incremental sorts (enable_incrementalsort=off), the plan
will switch to plain sort without the additional costs. So you'll get a
cheaper plan by disabling some operation. That's surprising.

So I think it would be more appropriate if those places actually did a
costing of incremental vs. non-incremental sorts, and then constructed
the cheaper option. Essentially we should consider both plain and
incremental sort for each partially sorted input path, and then pick the
right one.

Of course, this is going to be tricky in createplan.c which builds the
plans directly - in that case it might be integrated into make_sort() or
something like that.

Also, I wonder if we could run into problems, due to incremental sort
not supporting things the regular sort does (rewind, backwards scans and

2) nodeIncrementalsort.c

There's a couple of obsolete comments, that came from nodeSort.c and did
not get tweaked (and so talk about first-time through when incremental
sort needs to do that for each group, etc.). The attached diff tweaks
those, and clarifies a couple of others. I've also added some comments
explaining what the pivotSlot is about etc. There's also a couple of XXX
comments asking additional questions/clarifications.

I'm wondering if a static MIN_GROUP_SIZE is good idea. For example, what
if the subplan is expected to return only very few tuples (say, 33), but
the query includes LIMIT 1. Now, let's assume the startup/total cost of
the subplan is 1 and 1000000. With MIN_GROUP_SIZE 32 we're bound to
execute it pretty much till the end, while we could terminate after the
first tuple (if the prefix changes).

So I think we should use a Min(limit,MIN_GROUP_SIZE) here, and perhaps
this should depend on average group size too.

The other questionable thing seems to be this claim:

* We unlikely will have huge groups with incremental sort. Therefore
* usage of abbreviated keys would be likely a waste of time.

followed by disabling abbreviated keys in the tuplesort_begin_heap call.
I find this rather dubious and unsupported by any arguments (I certainly
don't see any in the comments).

If would be more acceptable if the estimated number of groups was used
when deciding whether to use incremental sort or not, but that's not the
case - as explained in the first part, we simply prefer incremental
sorts whenever there is a prefix. In those cases we have very little
idea (or even guarantees) regarding the average group size.

Furthermore, cost_sort is estimating the number of groups, so it does
know the average group size. I don't see why we couldn't consider it
here too, and disable/enable abbreviated keys depending on that.

3) pathkeys.c

The new function pathkeys_useful_for_ordering() does actually change
behavior depending on enable_incrementalsort. That seems like a rather
bad idea, for a couple or reasons.

AFAICS pathkeys.c is supposed to provide generic utils for work with
pathkeys, and no one would expect the functions to change behavior
depending on enable_* GUCs. I certainly would not.

In short, this does not seem like the right place to enable/disable
incremental sorts, that should be done when costing the plan (i.e. in
costsize.c) or creating the plan.


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-comments.patch text/x-patch 18.1 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-03-31 21:06:54 Re: [HACKERS] [PATCH] Incremental sort
Previous Message Tomas Vondra 2018-03-31 18:28:31 Re: some last patches breaks plan cache