Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Date: 2021-06-15 14:41:19
Message-ID: CAA4eK1KH1ikw9rh3m19k-E3TYFn6DBiEQ6DNp4LXPfNz5CXNmw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 14, 2021 at 9:08 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > Yeah, this could be one idea but I think even if we use pg_proc OID,
> > we still need to check all the rel cache entries to find which one
> > contains the invalidated OID and that could be expensive. I wonder
> > can't we directly find the relation involved and register invalidation
> > for the same? We are able to find the relation to which trigger
> > function is associated during drop function via findDependentObjects
> > by scanning pg_depend. Assuming, we are able to find the relation for
> > trigger function by scanning pg_depend, what kinds of problems do we
> > envision in registering the invalidation for the same?
>
> I don't think that finding the relation involved and registering an
> invalidation for the same will work properly. Suppose there is a
> concurrently-running transaction which has created a new table and
> attached a trigger function to it. You can't see any of the catalog
> entries for that relation yet, so you can't invalidate it, but
> invalidation needs to happen. Even if you used some snapshot that can
> see those catalog entries before they are committed, I doubt it fixes
> the race condition. You can't hold any lock on that relation, because
> the creating transaction holds AccessExclusiveLock, but the whole
> invalidation mechanism is built around the assumption that the sender
> puts messages into the shared queue first and then releases locks,
> while the receiver first acquires a conflicting lock and then
> processes messages from the queue.
>

Won't such messages be processed at start transaction time
(AtStart_Cache->AcceptInvalidationMessages)?

> Without locks, that synchronization
> algorithm can't work reliably. As a consequence of all that, I believe
> that, not just in this particular case but in general, the
> invalidation message needs to describe the thing that has actually
> changed, rather than any derived property. We can make invalidations
> that say "some function's parallel-safety flag has changed" or "this
> particular function's parallel-safety flag has changed" or "this
> particular function has changed in some way" (this one, we have
> already), but anything that involves trying to figure out what the
> consequences of such a change might be and saying "hey, you, please
> update XYZ because I changed something somewhere that could affect
> that" is not going to be correct.
>
> > I think we probably need to worry about the additional cost to find
> > dependent objects and if there are any race conditions in doing so as
> > pointed out by Tom in his email [1]. The concern related to cost could
> > be addressed by your idea of registering such an invalidation only
> > when the user changes the parallel safety of the function which we
> > don't expect to be a frequent operation. Now, here the race condition,
> > I could think of could be that by the time we change parallel-safety
> > (say making it unsafe) of a function, some of the other sessions might
> > have already started processing insert on a relation where that
> > function is associated via trigger or check constraint in which case
> > there could be a problem. I think to avoid that we need to acquire an
> > Exclusive lock on the relation as we are doing in Rename Policy kind
> > of operations.
>
> Well, the big issue here is that we don't actually lock functions
> while they are in use. So there's absolutely nothing that prevents a
> function from being altered in any arbitrary way, or even dropped,
> while code that uses it is running. I don't really know what happens
> in practice if you do that sort of thing: can you get the same query
> to run with one function definition for the first part of execution
> and some other definition for the rest of execution? I tend to doubt
> it, because I suspect we cache the function definition at some point.
>

It is possible for different definitions of the same function to be
executed within a single statement. Say, in session-1 we are inserting
three rows in one statement; for the first row, definition-1 of a
function used in an index expression gets executed. Now, from
session-2, we change the definition of that function to definition-2.
In session-1, while executing definition-1 for the second row, we
insert into another table, which accepts the invalidation message
registered by session-2. So, for the third row, the new definition
(definition-2) of the function gets executed, as sketched below.
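
Roughly, in SQL (object names are hypothetical, and for simplicity the
sketch has the function read the other table rather than insert into
it; the point is only that evaluating it touches another relation):

-- Hypothetical setup:
CREATE TABLE other_tbl (b int);
CREATE FUNCTION f(x int) RETURNS int LANGUAGE plpgsql IMMUTABLE AS
$$
DECLARE n bigint;
BEGIN
    SELECT count(*) INTO n FROM other_tbl;  -- touches another relation
    RETURN x + 1;                           -- "definition-1"
END;
$$;
CREATE TABLE t (a int);
CREATE INDEX t_f_idx ON t (f(a));

-- Session-1: a single statement inserting three rows; f() is evaluated
-- once per row for the index expression.
INSERT INTO t VALUES (1), (2), (3);

-- Session-2, after session-1 has processed row 1: replace the function,
-- which queues an invalidation for its pg_proc entry.
CREATE OR REPLACE FUNCTION f(x int) RETURNS int LANGUAGE plpgsql IMMUTABLE AS
$$
DECLARE n bigint;
BEGIN
    SELECT count(*) INTO n FROM other_tbl;
    RETURN x + 2;                           -- "definition-2"
END;
$$;

-- Session-1: once the queued invalidation is accepted while processing
-- row 2 (e.g. when evaluating f() touches the other relation, as
-- described above), row 3 is evaluated with definition-2.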

> If that's the case, caching the parallel-safety marking at the same
> point seems OK too, or at least no worse than what we're doing
> already. But on the other hand if it is possible for a query's notion
> of the function definition to shift while the query is in flight, then
> this is just another example of that and no worse than any other.
> Instead of changing the parallel-safety flag, somebody could redefine
> the function so that it divides by zero or produces a syntax error and
> kaboom, running queries break. Either way, I don't see what the big
> deal is. As long as we make the handling of parallel-safety consistent
> with other ways the function could be concurrently redefined, it won't
> suck any more than the current system already does, or in any
> fundamentally new ways.
>

Okay, so, in this scheme, we allow the function definition to change
during statement execution, but even though the rel's parallel-safety
property gets modified (say to parallel-unsafe), we will still proceed
in parallel mode as if nothing had changed. I guess this may not be a
big deal, as we already allow a running statement to be broken by
changing a function it depends on, and users may be okay if the
parallel statement errors out or behaves in an unpredictable way in
such corner cases.
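
For concreteness, a minimal sketch with made-up object names, assuming
the insert below would run in parallel mode under the proposed scheme:

-- Hypothetical objects for illustration:
CREATE FUNCTION check_fn(x int) RETURNS boolean
    LANGUAGE sql PARALLEL SAFE AS 'SELECT $1 > 0';
CREATE TABLE target (a int CHECK (check_fn(a)));

-- Session-1: under the proposed scheme, this could run in parallel mode
-- because every function it reaches is marked parallel safe.
INSERT INTO target SELECT generate_series(1, 1000000);

-- Session-2, while the statement above is still executing:
ALTER FUNCTION check_fn(int) PARALLEL UNSAFE;

-- Session-1 keeps executing with the parallel-safety marking it started
-- with; at worst the statement errors out or misbehaves, much as it
-- would if check_fn() were concurrently redefined in some other way.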

--
With Regards,
Amit Kapila.
