Fix generate_useful_gather_paths for parallel unsafe pathkeys

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: luis(dot)roberto(at)siscobra(dot)com(dot)br, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Fix generate_useful_gather_paths for parallel unsafe pathkeys
Date: 2020-11-21 01:55:39
Message-ID: CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over on the -bugs list we had a report [1] of a seg fault with
incremental sort. The short of the investigation there was that a
subplan wasn't being serialized since it wasn't parallel safe, and
that attempting to initialize that subplan resulted in the seg fault.
Tom pushed a change to raise an ERROR when this occurs so that at
least we don't crash the server.

There was some small amount of discussion about this on a thread about
a somewhat similar issue [2] (where volatile expressions were the
issue), but that thread resulted in a committed patch, and beyond that
it seems that this issue deserves its own discussion.

I've been digging further into the problem, and my first question was
"why doesn't this result in a similar error but with a full sort when
incremental sort is disabled?" After all, we have code to consider a
gather merge + full sort on the cheapest partial path in
generate_useful_gather_paths(), and the plan that I get on Luis's test
case with incremental sort disabled has an full sort above a gather,
which presumably it'd be cheaper to push down into the parallel plan
(using gather merge instead of gather).

It turns out that there's a separate bug here: I'm guessing that in
the original commit of generate_useful_gather_paths() we had a
copy/paste thinko in this code:

/*
* If the path has no ordering at all, then we can't use either
* incremental sort or rely on implicit sorting with a gather
* merge.
*/
if (subpath->pathkeys == NIL)
continue;

The comment claims that we should skip subpaths that aren't sorted at
all since we can't possibly use a gather merge directly or with an
incremental sort applied first. It's true that we can't do either of
those things unless the subpath is already at least partially sorted.
But subsequently we attempt to construct a gather merge path on top of
a full sort (for the cheapest path), and there's no reason to limit
that to partially sorted paths (indeed in the presence of incremental
sort it seems unlikely that that would be the cheapest path).

We still don't want to build incremental sort paths if the subpath has
no pathkeys, but that will imply presorted_keys = 0, which we already
check.

So 0001 removes that branch entirely. And, as expected, we now get a
gather merge + full sort the resulting error about subplan not being
initialized.

With that oddity out of the way, I started looking at the actually
reported problem, and nominally issue is that we have a correlated
subquery (in the final target list and the sort clause), and
correlated subqueries (not sure why exactly in this case; see [3])
aren't considered parallel-safe.

As to why that's happening:
1. find_em_expr_usable_for_sorting_rel doesn't exclude that
expression, and arguably correctly so in the general case since one
might (though we don't now) use this same kind of logic for
non-parallel plans.
2. generate_useful_pathkeys_for_relation is noting that it'd be useful
to sort on that expression and sees it as safe in part due to the (1).
3. We create a sort node that includes that expression in its output.
That seems pretty much the same (in terms of safety given generalized
input paths/plans, not in terms of what Robert brought up in [4]) as
what would happen in the prepare_sort_from_pathkeys call in
create_gather_merge_plan.

So to fix this problem I've added the option to find_em_expr_for_rel
to restrict to parallel-safe expressions in 0002.

Given point 3 above (and the discussion with Robert earlier) above it
seems to me that there's a bigger problem here that just hasn't
happened to have been exposed yet: in particular the
prepare_sort_from_pathkeys call in create_gather_merge_plan seems
suspect to me. But it's also possible other usages of
prepare_sort_from_pathkeys could be suspect given other seemingly
unrelated changed to path generation, since nothing other than
implicit knowledge about the conventions here prevents us from
introducing paths generating sorts with expressions not in the target
list (in fact a part of the issue here IMO is that non-var expression
pathkeys are added to the target list after path generation and
selection is completed).

Alternatively we could "fix" this by being even more conservative in
find_em_expr_usable_for_sorting_rel and disregard any expressions not
currently found in the target list. But that seems unsatisfactory to
me since, given we know that there are cases where
prepare_sort_from_paths is explicitly intended to add pathkey
expressions to the target list, we'd be excluding possible useful
cases (and indeed we already have, albeit contrived, test cases that
fail if that change is made).

There exists a very similar path in the postgres fdw code (in fact the
function name is the same: get_useful_pathkeys_for_relation) since
find_em_expr_for_rel is much simpler (it doesn't do many safety checks
at all like we do in find_em_expr_usable_for_sorting_rel), but I think
that's safe (though I haven't thought about it a ton) because I
believe those are sort keys we're going to send to the foreign server
anyway, as opposed to sorting ourselves (but I haven't verified that
that's the case and that we never create sort nodes there).

James

1: https://www.postgresql.org/message-id/622580997.37108180.1604080457319.JavaMail.zimbra%40siscobra.com.br
2: https://www.postgresql.org/message-id/CAJGNTeNaxpXgBVcRhJX%2B2vSbq%2BF2kJqGBcvompmpvXb7pq%2BoFA%40mail.gmail.com
3: https://www.postgresql.org/message-id/CAAaqYe_vihKjc%2B8LuQa49EHW4%2BKfefb3wHqPYFnCuUqozo%2BLFg%40mail.gmail.com
4: https://www.postgresql.org/message-id/CA%2BTgmobX2079GNJWJVFjo5CmwTg%3DoQQOybFQL2CyZxMY59P6BA%40mail.gmail.com

Attachment Content-Type Size
v1-0002-Enforce-parallel-safety-of-pathkeys-in-generate_u.patch text/x-patch 8.1 KB
v1-0001-generate_useful_gather_paths-shouldn-t-skip-unsor.patch text/x-patch 3.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-21 02:50:08 Re: Removal of currtid()/currtid2() and some table AM cleanup
Previous Message Michael Paquier 2020-11-21 01:30:03 Re: Online verification of checksums