Re: enable_incremental_sort changes query behavior

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: enable_incremental_sort changes query behavior
Date: 2020-10-06 11:45:49
Message-ID: CAAaqYe80-V8q7PozVh22EbBhMDU40W286na3RahxJiZ-Pz8S2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 5, 2020 at 10:38 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Mon, Oct 5, 2020 at 11:33 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
> >
> > On Sun, Oct 4, 2020 at 9:40 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Oct 3, 2020 at 10:10 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > > >
> > > > On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
> > > > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > > >
> > > > > On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
> > > > > >On Fri, Oct 2, 2020 at 11:16 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > > > > >>
> > > > > >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > > > > >> >
> > > > > >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> > > > > >> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > > > >> > >
> > > > > >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > > > > >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > > > > >> > > ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > > > >> > > >>
> > > > > >> > > >> ...
> > > > > >> > > >>
> > > > > >> > > >> More importanly, it does not actually fix the issue - it does fix that
> > > > > >> > > >> particular query, but just replacing the DISTINCT with either ORDER BY
> > > > > >> > > >> or GROUP BY make it fail again :-(
> > > > > >> > > >>
> > > > > >> > > >> Attached is a simple script I used, demonstrating these issues for the
> > > > > >> > > >> three cases that expect to have ressortgroupref != 0 (per the comment
> > > > > >> > > >> before TargetEntry in plannodes.h).
> > > > > >> > > >
> > > > > >> > > >So does checking for volatile expressions (if you happened to test
> > > > > >> > > >that) solve all the cases? If you haven't tested that yet, I can try
> > > > > >> > > >to do that this evening.
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > > Yes, it does fix all the three queries in the SQL script.
> > > > > >> > >
> > > > > >> > > The question however is whether this is the root issue, and whether it's
> > > > > >> > > the right way to fix it. For example - volatility is not the only reason
> > > > > >> > > that may block qual pushdown. If you look at qual_is_pushdown_safe, it
> > > > > >> > > also blocks pushdown of leaky functions in security_barrier views. So I
> > > > > >> > > wonder if that could cause failures too, somehow. But I haven't managed
> > > > > >> > > to create such example.
> > > > > >> >
> > > > > >> > I was about to say that the issue here is slightly different from qual
> > > > > >> > etc. pushdown, since we're not concerned about quals here, and so I
> > > > > >> > wonder where we determine what target list entries to put in a given
> > > > > >> > scan path, but then I realized that implies (maybe!) a simpler
> > > > > >> > solution. Instead of duplicating checks on target list entries would
> > > > > >> > be safe, why not check directly in get_useful_pathkeys_for_relation()
> > > > > >> > whether or not the pathkey has a target list entry?
> > > > > >> >
> > > > > >> > I haven't been able to try that out yet, and so maybe I'm missing
> > > > > >> > something, but I need to step out for a bit, so I'll have to look at
> > > > > >> > it later.
> > > > > >>
> > > > > >> I've started poking at this, but haven't yet found a workable
> > > > > >> solution. See the attached patch which "solves" the problem by
> > > > > >> breaking putting the sort under the gather merge, but it at least
> > > > > >> demonstrates conceptually what I think we're interested in doing.
> > > > > >>
> > > > > >> The issue currently is that the comparison of expressions fails -- in
> > > > > >> our query, the first column selected shows up as a Var (with the same
> > > > > >> contents) but is a different pointer in the em_expr and the reltarget
> > > > > >> exprs list. I don't yet see a good way to get the equivalence class
> > > > > >> for a PathTarget entry.
> > > > > >
> > > > > >Hmm, I think I was looking to do is the attached patch. I didn't
> > > > > >realize until I did a lot of reading through source that we have an
> > > > > >equal() function that can compare exprs. That (plus the realization in
> > > > > >[1] the originally reported backtrace wasn't where the error actually
> > > > > >came from) convinced me that what we need is to confirm not only that
> > > > > >the all of the vars in the ec member come from the relids in the rel
> > > > > >but also that the expr is actually represented in the target of the
> > > > > >rel.
> > > > > >
> > > > > >With the gucs changed as I mentioned earlier both of the plans (with
> > > > > >and without a volatile call in the 2nd select entry) now look like:
> > > > > >
> > > > > > Unique
> > > > > > -> Gather Merge
> > > > > > Workers Planned: 2
> > > > > > -> Sort
> > > > > > Sort Key: ref_0.i, (md5(ref_0.t))
> > > > > > -> Nested Loop
> > > > > > -> Parallel Index Scan using ref_0_i_idx on ref_0
> > > > > > -> Function Scan on generate_series ref_1
> > > > > >
> > > > > >Without the gucs changed the minimal repro case now doesn't error, but
> > > > > >results in this plan:
> > > > > >
> > > > > > HashAggregate
> > > > > > Group Key: ref_0.i, CASE WHEN pg_rotate_logfile_old() THEN ref_0.t
> > > > > >ELSE ref_0.t END
> > > > > > -> Nested Loop
> > > > > > -> Seq Scan on ref_0
> > > > > > -> Function Scan on generate_series ref_1
> > > > > >
> > > > > >Similarly in your six queries I now only see parallel query showing up
> > > > > >in the last one.
> > > > > >
> > > > >
> > > > > OK, that seems reasonable I think.
> > > >
> > > > If we proceed with this patch I'd like to tweak it to check for the
> > > > relids subset first on the theory that it will be cheaper than the
> > > > expr equality check loop; that change is attached.
> > > >
> > > > > >I created an entirely new function because adding the target expr
> > > > > >lookup to the existing find_em_expr_for_rel() function broke a bunch
> > > > > >of postgres_fdw tests. That maybe raises questions about whether that
> > > > > >code also could have problems in theory/in the future, but I didn't
> > > > > >investigate further. In any case we already know it excludes
> > > > > >volatile...so maybe it's fine because in practice that's actually a
> > > > > >broader exclusion than what we're doing here.
> > > > > >
> > > > >
> > > > > I don't think postgres_fdw needs these new checks, because FDW scan's
> > > > > are not allowed in parallel part of the plan - if that changes in the
> > > > > future, I'm sure that'll require fixes in plenty other places.
> > > > >
> > > > > OTOH it's interesting that it triggers those failures - I wonder if we
> > > > > could learn something from them? AFAICS those paths can't be built by
> > > > > generate_useful_gather_paths (because of the postgres_fdw vs. parallel
> > > > > query restrictions), so how do these plans look?
> > > >
> > > > Well the fdw code doesn't trigger the errors, but both
> > > > generate_useful_gather_paths() and the fdw code originally relied on
> > > > find_em_expr_for_rel(), but now they're diverging. IIRC the fdw code
> > > > also uses it for determining what path keys are valid -- in that case
> > > > which ones are safe to be sent to the foreign server (so all of the
> > > > vars have to be from rels on the foreign server). The fdw code has
> > > > additional checks too though, for example no volatile expressions may
> > > > be pushed to the remote.
> > > >
> > > > > >This seems to fix the issue, but I'd like feedback on whether it's too
> > > > > >strict. We could of course just check em_has_volatile, but I'm
> > > > > >wondering if that's simultaneously too strict (by not allowing the
> > > > > >volatile expression to be computed in the gather merge supposing
> > > > > >there's no join) and too loose (maybe there are other cases we should
> > > > > >care about?). It also just strikes me as re-encoding rules that should
> > > > > >have already been applied (and thus we should be able to look up in
> > > > > >the data we have if it's safe to use the expr/pathkey). Those are
> > > > > >mostly intuitions though.
> > > > > >
> > > > >
> > > > > I don't know :-( As I mentioned before, I suspect checking just the
> > > > > volatility may not be enough in some cases (leakyness, etc.) and I think
> > > > > you're right it may be too strict in other cases.
> > > > >
> > > > > Not sure I understand which rules you think we're re-encoding, but I
> > > > > have a nagging feeling there's a piece of code somewhere earlier in
> > > > > the query planner meant to prevent such cases (sort on path without all
> > > > > the pathkeys), and that fixing it here is just a band aid :-(
> > > >
> > > > I'm getting at exactly what you're saying below: there's got to be
> > > > some existing rules (and presumably code already enforcing it in some
> > > > places) at what level in the path tree a given pathkey is
> > > > safe/correct, and we don't want to reinvent those rules (or ideally
> > > > that code). Verifying the expr is in the rel seems to me to be
> > > > intuitively correct, but I wish there was another place in the code
> > > > that could validate that.
> > > >
> > > > > I'm sure we're building plans with Sort on top of Index Scan paths, so
> > > > > how come those don't fail? How come the non-parallel version of these
> > > > > queries don't fail?
> > > >
> > > > Yeah, exactly. Something has to be doing something similar -- I don't
> > > > think everywhere else using sort_pathkeys instead of query_pathkeys
> > > > really changes the problem. I've read through all of the places we use
> > > > generate_useful_gather_paths() as well as root->sort_pathkeys to try
> > > > to get a better understanding of this. Most of the places I convinced
> > > > myself it's already safe -- for example places like
> > > > create_orderd_paths are working with the upper rel, and the full
> > > > target list is available. A few other places I added comments (see the
> > > > "questions" patch in the attached series) where I'm not sure why we
> > > > apply projections *after* we create a sort path; given the issue we're
> > > > discussing here I would have thought you'd want to do exactly the
> > > > opposite.
> > > >
> > > > I haven't yet tried to read through the code in other places where we
> > > > build an explicit sort to see if I can find any hints as to what
> > > > existing code does to ensure this situation doesn't occur, but so far
> > > > I haven't found anything else obvious.
> > >
> > > I did a lot more reading through code, and I concluded that generally
> > > (maybe exclusively?) create_sort_path() is only called on upper rels
> > > (e.g., from create_ordered_paths()), so we already have the full
> > > target list available. I also stepped through in the debugger and
> > > confirmed that non-var exprs generally don't seem to show up in the
> > > pathtarget of paths either on base rels or on join rels. Finally, the
> > > last piece of relevant data is that we don't actually push down sorts
> > > on non-parallel paths even when it would be beneficial. For example,
> > > this query:
> > >
> > > select i, md5(i::text) from t2, generate_series(1,2) order by i;
> > >
> > > generates this plan:
> > >
> > > Sort
> > > Sort Key: t2.i
> > > -> Nested Loop
> > > -> Function Scan on generate_series
> > > -> Materialize
> > > -> Seq Scan on t2
> > >
> > > but it seems obvious to me that it would be better to move the sort to
> > > immediately above the seq scan.
> > >
> > > So I conclude from all of this that the real reason we didn't see this
> > > problem before was that we didn't have anything that was trying to
> > > generate useful sort paths lower in the query, so it just worked. To
> > > state it from the other direction, I don't see any indication that
> > > there actually is any code trying to determine what's safe to put in
> > > the target at various join levels; from what I can tell only vars are
> > > distributed at those points. Thinking about it some more, this
> > > actually makes sense, since get_useful_pathkeys() is new code, and
> > > something like it would have been needed already if we wanted to push
> > > sorts down further into the plan.
> > >
> > > As a bit of a side note: combined with the above idea about pushing
> > > sort down, it could also be beneficial to try to distribute
> > > expressions lower too.
> > >
> > > So I think the previously attached patch is probably correct; the
> > > second patch with the comment questions shows I still don't fully
> > > understand how/when projections are applied (and when they're needed),
> > > though it might have something to do with what I said about only
> > > operating on upper rels.
> >
> > Just FYI in case the patch seems mergeable; I'm adding test cases now,
> > and I think I might make some minor tweaks to the patch.
> >
> > I hope to have a new version out ~this evening.
>
> All right, here's a modified patch. I did a bit of surgery: the
> concept is the same, but I decided to explicitly not the parallels to
> (and follow as possible) prepare_sort_from_pathkeys(). That means we
> only have to do the expression equality check if the expr is volatile,
> and the relids bms check doesn't have to bail if it's empty.
>
> This properly allows us to push the sort all the way down to the base
> rel when the only things in the base rel are referenced (including
> stable expressions) but pushes the sort up to the upper rel when a
> volatile expression is used (technically it would be safe for it to go
> lower, but in the current planner that's the only time it appears in
> the target list, so allowing that would be a larger functional
> change).
>
> I left the questions patch here, though obviously it doesn't need to
> be committed. If someone has answers to the questions, I'd be
> interested though, but I don't think it affects the correctness of
> this patch.

And with a typo fixed.

James

Attachment Content-Type Size
v4-0002-questions.patch text/x-patch 2.3 KB
v4-0001-Fix-get_useful_pathkeys_for_relation-for-volatile.patch text/x-patch 13.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Lawrence Barwick 2020-10-06 11:49:40 Re: Prevent printing "next step instructions" in initdb and pg_upgrade
Previous Message Ashutosh Bapat 2020-10-06 11:42:45 Re: Partitionwise join fails under GEQO