Re: enable_incremental_sort changes query behavior

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: enable_incremental_sort changes query behavior
Date: 2020-10-07 19:52:04
Message-ID: CA+Tgmoa3oNvLBio62jSGXCRhOotsofiqn9guus3+NTQvz6qPWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
> The plan (obtained by replacing the volatile function with a stable one):
>
> Unique
> -> Nested Loop
> -> Gather Merge
> Workers Planned: 2
> -> Sort
> Sort Key: ref_0.i, (md5(ref_0.t))
> -> Parallel Index Scan using ref_0_i_idx on ref_0
> -> Function Scan on generate_series ref_1
>
> Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.

As far as I remember this stuff, target lists for any nodes below the
top of the join tree were previously always just Var nodes. Any
expressions that needed to be computed were computed at the level of
the topmost join, before adding upper planner steps like Agg,
WindowAgg, Limit, etc. But, here you've got a complex expression --
md5(ref_0.t) -- begin computed somewhere in the middle of the plan
tree so that the sort can be performed at that point. However, that
seems likely to break a bunch of assumptions all over the planner,
because, unless a lot of work and surgery has been done since I looked
at this last, that md5(ref_0.t) value is not going to "bubble up" to
higher levels of the plan through the tlists of the individual nodes.
That poses a risk that it will be recomputed multiple times, which is
particularly bad for volatile functions because you might not always
get the same answer, but it seems like it might be bad even for
non-volatile functions, because it could be slow if the value is used
at multiple places in the query. It feels like the sort of thing that
might have broken some other assumptions, too, although I can't say
exactly what.

I wonder if whatever part of the incremental sort patch allowed
computation of tlist expressions below the top level of the join tree
ought to just be reverted outright, but that might be overkill. I
can't say that I really understand exactly what's going on here, but
it seems like a pretty significant behavior change to make as part of
another project, and I wonder if we're going to keep finding other
problems with it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-10-07 20:01:27 Re: enable_incremental_sort changes query behavior
Previous Message Robert Haas 2020-10-07 19:32:55 Re: Partitionwise join fails under GEQO