Re: [BUG] orphaned function

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gilles Darold <gilles(at)darold(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Subject: Re: [BUG] orphaned function
Date: 2021-02-02 08:15:05
Message-ID: 03a3a7f2-4c67-7bb2-cad2-8ec7b9800942@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

+ Jim and Nathan.

On 12/18/20 12:26 AM, Tom Lane wrote:
> Gilles Darold <gilles(at)darold(dot)net> writes:
>> The same problem applies if the returned type or the procedural language
>> is dropped. I have tried to fix that in ProcedureCreate() by using an
>> AccessSharedLock for the data types and languages involved in the
>> function declaration. With this patch the race condition with parameters
>> types, return types and PL languages are fixed. Only data types and
>> procedural languages with Oid greater than FirstBootstrapObjectId will
>> be locked locked. But this is probably more complex than this fix so it
>> is proposed as a separate patch
>> (v1-003-orphaned_function_types_language.patch) to not interfere with
>> the applying of Bertran's patch.
> Indeed. This points up one of the things that is standing in the way
> of any serious consideration of applying this patch. To my mind there
> are two fundamental, somewhat interrelated considerations that haven't
> been meaningfully addressed:
>
> 1. What set of objects is it worth trying to remove this race condition
> for, and with respect to what dependencies? Bertrand gave no
> justification for worrying about function-to-namespace dependencies
> specifically,

I focused on this one because this is the most (if not the unique) one
we have observed so far.

Those orphaned functions are breaking stuff like pg_dump, pg_upgrade and
that's how we discovered them.

That being said, I agree that there is no reason to focus only on this
ones and we should better try to "fix" them all.
> and you've likewise given none for expanding the patch
> to consider function-to-datatype dependencies. There are dozens more
> cases that could be considered; but I sure don't want to be processing
> another ad-hoc fix every time someone thinks they're worried about
> another specific case.
>
> Taking a practical viewpoint, I'm far more worried about dependencies
> of tables than those of any other kind of object. A messed-up function
> definition doesn't cost you much: you can just drop and recreate the
> function. A table full of data is way more trouble to recreate, and
> indeed the data might be irreplaceable. So it seems pretty silly to
> propose that we try to remove race conditions for functions' dependencies
> on datatypes without removing the same race condition for tables'
> dependencies on their column datatypes.
>
> But any of these options lead to the same question: why stop there?
> An approach that would actually be defensible, perhaps, is to incorporate
> this functionality into the dependency mechanism: any time we're about to
> create a pg_depend or pg_shdepend entry, take out an AccessShareLock on
> the referenced object, and then check to make sure it still exists.
> This would improve the dependency mechanism so it prevents creation-time
> race conditions, not just attempts to drop dependencies of
> already-committed objects. It would mean that the one patch would be
> the end of the problem, rather than looking forward to a steady drip of
> new fixes.

Agree that working with pg_depend and pg_shdepend is the way to go.

Instead of using the locking machinery (and then make the one that would
currently produce the orphaned object waiting), Jim proposed another
approach: make use of special snapshot (like a dirty one depending of
the case).

That way instead of locking we could instead report an error, something
like this:

postgres=# drop schema tobeorph;
ERROR:  cannot drop schema tobeorph because other objects that are
currently under creation depend on it
DETAIL:  function under creation tobeorph.bdttime() depends on schema
tobeorph

>
> 2. How much are we willing to pay for this added protection? The fact
> that we've gotten along fine without it for years suggests that the
> answer needs to be "not much". But I'm not sure that that actually
> is the answer, especially if we don't have a policy that says "we'll
> protect against these race conditions but no other ones". I think
> there are possibly-serious costs in three different areas:
>
> * Speed. How much do all these additional lock acquisitions slow
> down a typical DDL command?
>
> * Number of locks taken per transaction. This'd be particularly an
> issue for pg_restore runs using single-transaction mode: they'd take
> not only locks on the objects they create, but also a bunch of
> reference-protection locks. It's not very hard to believe that this'd
> make a difference in whether restoring a large database is possible
> without increasing max_locks_per_transaction.
>
> * Risk of deadlock. The reference locks themselves should be sharable,
> so maybe there isn't much of a problem, but I want to see this question
> seriously analyzed not just ignored.
All of this should be mitigated with the "special snapshot" approach.
>
> Obviously, the magnitude of these costs depends a lot on how many
> dependencies we want to remove the race condition for. But that's
> exactly the reason why I don't want a piecemeal approach of fixing
> some problems now and some other problems later. That's way too
> much like the old recipe for boiling a frog: we could gradually get
> into serious performance problems without anyone ever having stopped
> to consider the issue.
>
> In short, I think we should either go for a 100% solution if careful
> analysis shows we can afford it, or else have a reasoned policy
> why we are going to close these specific race conditions and no others
> (implying that we'll reject future patches in the same area).

Agree.

What about taking the 100% solution way with the "special snapshot"
approach?

If you think that could make sense then we (Jim, Nathan and I) can work
on a patch with this approach and come back with a proposal.

Bertrand

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-02-02 08:18:53 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Joel Jacobson 2021-02-02 08:13:43 Re: [PATCH] Doc: improve documentation of oid columns that can be zero. (correct version)