Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Date: 2021-06-04 10:17:10
Message-ID: CAA4eK1JRvk8DGHK_brhyzX1uDBXmF6z2f9fc96v=YuY491+K_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 11, 2021 at 12:28 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Temporarily, Just in case someone want to take a look at the patch for the safety check.
>

I am not sure still there is a consensus on which cases exactly need
to be dealt with. Let me try to summarize the discussion and see if
that helps. As per my understanding, the main reasons for this work
are:
a. Ensure parallel-unsafe functions don't get executed in parallel
mode. We do have checks to ensure that we don't select parallel-mode
for most cases where the parallel-unsafe function is used but we don't
have checks for input/output funcs, aggregate funcs, etc. This
proposal is to detect such cases during function invocation and return
an error. I think if, for some cases like aggregate or another type of
functions we allow selecting parallelism relying on the user, it is
not a bad idea to detect and return an error if some parallel-unsafe
function is executed in parallel mode.
b. Detect wrong parallel-safety markings. Say the user has declared
some function as parallel-safe but it invokes another parallel-unsafe
function.
c. The other motive is that this work can help us to enable
parallelism for inserts (and maybe update/delete in the future). As
being discussed in another thread [1], we are considering allowing
parallel inserts on a table based on user input and then at runtime
detect if the insert is invoking any parallel-unsafe expression. The
idea is that the user will be able to specify whether a write
operation is allowed in parallel on a specified relation and we allow
to select parallelism for such writes based on that and do the checks
for Select as we are doing now. There are other options like determine
the parallel-safety of writes in a planner and then only allow
parallelism but those all seem costly. Now, I think it is not
compulsory to have such checks for this particular reason as we are
relying on user input but it will be good if we have it.

I think the last purpose (c) is still debatable even though we
couldn't come up with anything better till now but even if leave that
aside for now, I think the other reasons are good enough to have some
form of checks.

Now, the proposal being discussed is to add a parallel-safety check in
fmgr_info which seems to be invoked during all function executions. We
need to have access to proparallel attribute of the function to check
the parallel-safety and that is readily available in fmgr_info for
non-built-in functions because we already get the pg_proc information
from sys cache. So, I guess there is no harm in checking it when the
information is readily available. However, for built-in functions that
information is not readily available as we get required function
information from FmgrBuiltin (which doesn't have parallel-safety
information). For built-in functions, the following options have been
discussed:
a. Extend FmgrBuiltin without increasing its size to include parallel
information.
b. Enquire pg_proc cache to get the information. Accessing this for
each invocation of builtin could be costly. We can probably incur this
cost only when built-in is invoked in parallel-mode.
c. Don't add check for builtins.

I think if we can't think of any other better way to have checks for
builtins and don't like any of (a) or (b) then there is no harm in
(c). This will at least allow us to have parallel-safety check for
user-defined functions.

Thoughts?

[1] - https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2021-06-04 10:26:05 Re: Asynchronous Append on postgres_fdw nodes.
Previous Message wenjing 2021-06-04 10:01:27 Re: [Proposal] Global temporary tables