Re: Re-read subscription state after lock in AlterSubscription

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re-read subscription state after lock in AlterSubscription
Date: 2026-07-04 08:00:08
Message-ID: CAFiTN-th6yN9W6QUu4oT05O2YU1bL7LiGbuBFQbrQmXG7LzS1A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 3, 2026 at 9:09 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Fri, Jul 03, 2026 at 03:45:34PM +0530, Amit Kapila wrote:
> > On Fri, Jul 3, 2026 at 1:38 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Friday, July 3, 2026 1:53 PM Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > > >
> > > > > but given the patch's simplicity, I recommend backpatching.
> > > >
> > > > That's right but that would only improve error messages. That said, looking
> > > > closer, they are elog() ones, so "not expected" to occur so yeah backpatch
> > > > does make sense.
> > >
> > > +1 for backpatching, even if it's rare, the "ERROR: tuple concurrently updated"
> > > message seems confusing to me.
> > >
> >
> > I also think backpatching makes sense. BTW, I have a comment:
>
> Thanks for looking at it!
>
> > + heap_freetuple(tup);
> > + tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, ObjectIdGetDatum(MyDatabaseId),
> > + CStringGetDatum(stmt->subname));
> >
> > heap_freetuple() could be done before acquiring the lock, is there a
> > reason to keep it after lock?
>
> No particular reason, could be done before. Done in 0001 attached.
>
> >
> > > >
> > > > That said, what about also fixing DropSubscription() like in the 0002 attached?
> > > > (that would also produce those elog() messages in case of concurrent DROP or
> > > > ALTER).
> > >
> > > For the patch, I'm not sure if we must repeat the checks twice. Could we
> > > simply move the original checks to after we take the lock? At least, the
> > > GetSubscription() call and the password check can be moved there and old codes
> > > can be deleted.
> > >
> >
> > Isn't the same true for the AlterSubscription() case as well?
>
> I think there is no need to lock if we are later going to disallow changing the
> subscription data due to the password_required/superuser check.
>
> That said moving it as suggested by Hou-san, does simplify the code and the lock
> is not held for long, so done that way in 0001.
>
> > Also, I
> > noticed that AlterPublication() does the same trick but it uses
> > PUBLICATIONOID cacheid, so shouldn't we use SUBSCRIPTIONOID cacheid
> > here as well? I think this is to prevent the case where the same name
> > pub/sub is recreated after lock.
>
> Oh right and I did it that way in 0001 and 0002.
>
> But while doing this and looking closely, I'm not sure AlterPublication() does
> it right. Indeed, in theory, the OID could have been re-used too (between the
> time we did the name resolution and the time we lock the publication). I think
> what is needed is something similar to RangeVarGetRelidExtended(), means do the
> name resolution, acl check (ownership) and lock acquisition, all in unison.
>
> That's what 0003 is trying to achieve for the subscription and 0004 for the
> publication.
>
> What do you think?
>
0003:

It looks like the implementation of DROP SUBSCRIPTION IF EXISTS has a
concurrent drop race condition in DropSubscription(). Currently, if
stmt->missing_ok is true, the initial lookup safely handles a missing
subscription. However, once a subscription is found and the code
enters the drop loop, a second internal lookup/refetch happens. If a
concurrent transaction drops the subscription after our initial check
but before this internal refetch, the code throws an error.
Essentially, the loop completely ignores the missing_ok flag during
the refetch phase. Am I missing something?

--
Regards,
Dilip Kumar
Google

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2026-07-04 08:19:46 Re: Re-read subscription state after lock in AlterSubscription
Previous Message Bertrand Drouvot 2026-07-04 07:47:08 Fix races conditions in DropRole() and GrantRole()