Re: enable_incremental_sort changes query behavior

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: enable_incremental_sort changes query behavior
Date: 2020-10-08 00:38:34
Message-ID: CAAaqYe9bJ9QMtFK2r+rSaPROV7MGqf2Nn9w9WbF-zJhP05EsdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 7, 2020 at 6:22 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Wed, Oct 07, 2020 at 04:01:27PM -0400, James Coleman wrote:
> >On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >>
> >> On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
> >> > The plan (obtained by replacing the volatile function with a stable one):
> >> >
> >> > Unique
> >> > -> Nested Loop
> >> > -> Gather Merge
> >> > Workers Planned: 2
> >> > -> Sort
> >> > Sort Key: ref_0.i, (md5(ref_0.t))
> >> > -> Parallel Index Scan using ref_0_i_idx on ref_0
> >> > -> Function Scan on generate_series ref_1
> >> >
> >> > Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
> >>
> >> As far as I remember this stuff, target lists for any nodes below the
> >> top of the join tree were previously always just Var nodes. Any
> >> expressions that needed to be computed were computed at the level of
> >> the topmost join, before adding upper planner steps like Agg,
> >> WindowAgg, Limit, etc. But, here you've got a complex expression --
> >> md5(ref_0.t) -- begin computed somewhere in the middle of the plan
> >> tree so that the sort can be performed at that point. However, that
> >> seems likely to break a bunch of assumptions all over the planner,
> >> because, unless a lot of work and surgery has been done since I looked
> >> at this last, that md5(ref_0.t) value is not going to "bubble up" to
> >> higher levels of the plan through the tlists of the individual nodes.
> >> That poses a risk that it will be recomputed multiple times, which is
> >> particularly bad for volatile functions because you might not always
> >> get the same answer, but it seems like it might be bad even for
> >> non-volatile functions, because it could be slow if the value is used
> >> at multiple places in the query. It feels like the sort of thing that
> >> might have broken some other assumptions, too, although I can't say
> >> exactly what.
> >
> >The distinction between volatile and stable expressions here has been
> >discussed at length in this thread; I think it'd be worth reading
> >through the rest of the thread before offering sledgehammer ideas
> >below like reverting the entire piece.
> >
>
> Um, that's a tad too harsh, I guess ...

Yes, it was too harsh. So, my apologies, Robert.

Nonetheless I do think that reverting would be a surprisingly strong
reaction for this because I don't think the problem scope is
particularly dire.

Two reasons why:

1. As noted earlier it'd be pretty easy to limit this sort pushdown
only to sorts on Vars. But I don't think that;s necessary, because:
2. From my time in the debugger on this issue, I don't see how the
planner can be assuming these things won't bubble up.

For one thing, your Tomas's example below shows the stable expression
being computed at the scan node level and bubbling up through.

For another thing, earlier in this thread [1] I have a patch that
compares expressions in the path target to the expression in the
pathkey, and even under a gather merge the stable expression (md5) is
already in the path target at that level of the path.

> We've discussed the volatile/stable expressions before, but AFAIK we
> failed to notice the planner assumes such expressions are expected to be
> computed only in the upper parts of the plan. If this assumption really
> exists, it probably relies on simply calling create_sort_path just about
> right, and the new code looking at useful pathkeys below gather merge
> might be breaking that simply by doing it too early.

I don't think such an assumption exists, as explained above, or at
least I can't find an indication of it, and it's already violated in
pre-incremental sort code.

> In that case we can either rip this code out, effectively reverting most
> of ba3e76cc571, or consider only pathkeys that we can find in the tlist
> of the child path.

This is exactly what the patch in [1] does. The current patch
effectively does it also, but I believe it does so in a more refined
way.

> I'm still not entirely sure I understand what's happening, or what the
> exact rule is. Consider this query:
>
> explain (verbose) select distinct i, t, md5(t) from ref_0;
>
> which on PG12 (i.e. before incremental sort) is planned like this:
>
> QUERY PLAN
> ----------------------------------------------------------------------------------
> Unique (cost=78120.92..83120.92 rows=500000 width=65)
> Output: i, t, (md5(t))
> -> Sort (cost=78120.92..79370.92 rows=500000 width=65)
> Output: i, t, (md5(t))
> Sort Key: ref_0.i, ref_0.t, (md5(ref_0.t))
> -> Seq Scan on public.ref_0 (cost=0.00..10282.00 rows=500000 width=65)
> Output: i, t, md5(t)
> (7 rows)
>
> i.e. the (stable) function is pushed all the way to the scan node. And
> even if we replace it with a volatile expression it gets pushed down:
>
> explain (verbose) select distinct i, t, md5(random()::text || t) from ref_0;
>
> QUERY PLAN
> ----------------------------------------------------------------------------------
> Unique (cost=83120.92..88120.92 rows=500000 width=65)
> Output: i, t, (md5(((random())::text || t)))
> -> Sort (cost=83120.92..84370.92 rows=500000 width=65)
> Output: i, t, (md5(((random())::text || t)))
> Sort Key: ref_0.i, ref_0.t, (md5(((random())::text || ref_0.t)))
> -> Seq Scan on public.ref_0 (cost=0.00..15282.00 rows=500000 width=65)
> Output: i, t, md5(((random())::text || t))
> (7 rows)
>
>
> But perhaps I just don't understand the assumption correctly?
>
> >In particular, most of the discussion in the thread has been about
> >what is actually safe to reference at lower levels in the plan. It
> >would be extremely easy tomake this only allow sorting by Vars at
> >lower levels in the plan, though the current revision of the patch
> >follows what prepare_sort_from_pathkeys allows and disallows.
> >
> >Obviously it'd be nice to be able to sort by non-Var expressions at
> >lower levels too. But again, it'd be easy to change that if it's not
> >safe, so it'd be helpful if you could point to areas that don't bubble
> >non-Vars up so that we can actually comment about that in the source
> >and discuss specifics.
> >
>
> I think the proble here might be "what we think is theoretically safe"
> vs. "what the planner expects". I wouldn't be surprised if this was safe
> in theoretical sense, but planner not expecting that.

Agreed as a general rule, but as above I'm having a hard time matching
up pre-this-code behavior with the claim that the planner isn't
expecting it.

> >> I wonder if whatever part of the incremental sort patch allowed
> >> computation of tlist expressions below the top level of the join tree
> >> ought to just be reverted outright, but that might be overkill. I
> >> can't say that I really understand exactly what's going on here, but
> >> it seems like a pretty significant behavior change to make as part of
> >> another project, and I wonder if we're going to keep finding other
> >> problems with it.
> >
>
> It certainly was not my intention to make such change in this patch. I
> don't think the earlier parts of the incremental sort have this issue
> because all those places were already constructing regular sort before.
>
> But this code is different because it considers adding sort node (both
> regular and incremental) below a gather merge, which was not happening
> before. I suspect this is the problem, somehow.

If my above understanding holds, then it's not a significant behavior
change, but if it is, we have a simple workaround (looking only at
pathkeys referencing a Var in the path target list).

James

1: https://www.postgresql.org/message-id/CAAaqYe9Q5TL-d0Fcs1m9bes7YbfkrizmEVFF%2BLdTLE6ZEXCwhg%40mail.gmail.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2020-10-08 01:01:24 Re: new heapcheck contrib module
Previous Message Mark Dilger 2020-10-08 00:37:21 Re: new heapcheck contrib module