Re: Allowing parallel-safe initplans

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Allowing parallel-safe initplans
Date: 2023-07-10 21:36:43
Message-ID: 2391880.1689025003@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
>> On Mon, Apr 17, 2023 at 11:04 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I wondered about that too, but how come neither of us saw non-cosmetic
>>> failures (ie, actual query output changes not just EXPLAIN changes)
>>> when we tried this?

>> Sorry I forgot to mention that I did see query output changes after
>> moving the initPlans to the Gather node.

> Hmm, my memory was just of seeing the EXPLAIN output changes, but
> maybe those got my attention to the extent of missing the others.

I got around to trying this, and you are right, there are some wrong
query answers as well as EXPLAIN output changes. This mystified me
for awhile, because it sure looks like e89a71fb4 should have made it
work.

>> It seems that in this case the top_plan does not have any extParam, so
>> the Gather node that is added atop the top_plan does not have a chance
>> to get its initParam filled in set_param_references().

Eventually I noticed that all the failing cases were instances of
optimizing MIN()/MAX() aggregates into indexscans, and then I figured
out what the problem is: we substitute Params for the optimized-away
Aggref nodes in setrefs.c, *after* SS_finalize_plan has been run.
That means we fail to account for those Params in extParam/allParam
sets. We've gotten away with that up to now because such Params
could only appear where Aggrefs can appear, which is only in top-level
(above scans and joins) nodes, which generally don't have any of the
sorts of rescan optimizations that extParam/allParam bits control.
But this patch results in needing to have a correct extParam set for
the node just below Gather, and we don't. I am not sure whether there
are any reachable bugs without this patch; but there might be, or some
future optimization might introduce one.

It seems like the cleanest fix for this is to replace such optimized
Aggrefs in a separate tree scan before running SS_finalize_plan.
That's fairly annoying from a planner-runtime standpoint, although
we could skip the extra pass in the typical case where no minmax aggs
have been optimized.

I also thought about swapping the order of operations so that we
run SS_finalize_plan after setrefs.c. That falls down because of
set_param_references itself, which requires those bits to be
calculated already. But maybe we could integrate that computation
into SS_finalize_plan instead? There's certainly nothing very
pretty about the way it's done now.

A band-aid fix that seemed to work is to have set_param_references
consult the Gather's own allParam set instead of the extParam set
of its child. That feels like a kluge though, and it would not
help matters for any future bug involving another usage of those
bitmapsets.

BTW, there is another way in which setrefs.c can inject PARAM_EXEC
Params: it can translate PARAM_MULTIEXPR Params into those. So
those won't be accounted for either. I think this is probably
not a problem, especially not after 87f3667ec got us out of the
business of treating those like initPlan outputs. But it does
seem like "you can't inject PARAM_EXEC Params during setrefs.c"
would not be a workable coding rule; it's too tempting to do
exactly that.

So at this point my inclination is to try to move SS_finalize_plan
to run after setrefs.c, but I've not written any code yet. I'm
not sure if we'd need to back-patch that, but it at least seems
like important future-proofing.

None of this would lead me to want to move initPlans to
Gather nodes injected by debug_parallel_query, though.
We'd have to kluge something to keep the EXPLAIN output
looking the same, and that seems like a kluge too many.
What I am wondering is if the issue is reachable for
Gather nodes that are built organically by the regular
planner paths. It seems like that might be the case,
either now or after applying this patch.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Samokhvalov 2023-07-10 21:37:24 Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?
Previous Message Tristan Partin 2023-07-10 21:07:24 Re: Refactoring backend fork+exec code