Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-07-08 13:22:39
Message-ID: CAAaqYe9LWNYik9E0ndiNc_AQ9TzEFKPSSs-0HU8PjOJXWm9Ldw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 7, 2019 at 5:02 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> We're running query like this:
>
> SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3
>
> but we're trying to add the incremental sort *before* the aggregation,
> because the optimizer also considers group aggregate with a sorted
> input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
> actually can do this, but clearly that's nonsense, because we can't
> possibly know the aggregates yet. Hence the error.
>
> If this is the actual issue, we need to ensure we actually can evaluate
> all the pathkeys. I don't know how to do that yet. I thought that maybe
> we should modify pathkeys_common_contained_in() to set presorted_keys to
> 0 in this case.
>
> But then I started wondering why we don't see this issue even for
> regular (non-incremental-sort) paths built in create_ordered_paths().
> How come we don't see these failures there? I've modified costing to
> make all incremental sort paths very cheap, and still nothing.

I assume you mean you modified costing to make regular sort paths very cheap?

> So presumably there's a check elsewhere (either implicit or explicit),
> because create_ordered_paths() uses pathkeys_common_contained_in() and
> does not have the same issue.

Given this comment in create_ordered_paths():

generate_gather_paths() will have already generated a simple Gather
path for the best parallel path, if any, and the loop above will have
considered sorting it. Similarly, generate_gather_paths() will also
have generated order-preserving Gather Merge plans which can be used
without sorting if they happen to match the sort_pathkeys, and the loop
above will have handled those as well. However, there's one more
possibility: it may make sense to sort the cheapest partial path
according to the required output order and then use Gather Merge.

my understanding is that generate_gather_paths() only considers paths
that already happen to be sorted (not explicit sorts), so I'm
wondering if it would make more sense for the incremental sort path
creation for this case to live alongside the explicit ordered path
creation in create_ordered_paths() rather than in
generate_gather_paths().

If I'm understanding what you're saying properly, I think you'd
expected create_ordered_paths() to be roughly similar in what it
considers as partial paths and so have the same problem, and I haven't
yet read enough of the code to understand if my proposed change
actually has any impact on the issue we're discussing, but it seems to
me that it at least fits more with what the comments imply.

I'll try to look at it a bit more later also, but at the moment other
work calls.

James Coleman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-07-08 13:59:33 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Julien Rouhaud 2019-07-08 13:15:18 Re: Excessive memory usage in multi-statement queries w/ partitioning