Re: BUG #18988: DROP SUBSCRIPTION locks not-yet-accessed database

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18988: DROP SUBSCRIPTION locks not-yet-accessed database
Date: 2025-07-31 05:20:43
Message-ID: CAFiTN-s=iHQ+4u=6qOeNvGrrJ4J7ysfJbejsDx7D4Kpdo1xttA@mail.gmail.com
Lists: pgsql-bugs

On Wed, Jul 30, 2025 at 4:24 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 17 Jul 2025 at 00:36, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference: 18988
> > Logged by: Alexander Lakhin
> > Email address: exclusion(at)gmail(dot)com
> > PostgreSQL version: 18beta1
> > Operating system: Ubuntu 24.04
> > Description:
> >
> > The following script:
> > createdb ndb
> > echo "
> > CREATE SUBSCRIPTION testsub CONNECTION 'dbname=ndb' PUBLICATION testpub WITH
> > (connect = false);
> > " | psql
> >
> > echo "
> > DROP SUBSCRIPTION testsub
> > " | psql &
> > sleep 1
> > timeout 30 psql ndb -c "SELECT 1" || echo "TIMEOUT"
> >
> > makes DROP SUBSCRIPTION stuck on waiting for a connection to drop a slot,
> > while this connection is waiting for a lock for relation 6100
> > (pg_subscription), locked by DROP SUBSCRIPTION:
> > law 1545967 1545946 0 21:10 ? 00:00:00 postgres: law regression
> > [local] DROP SUBSCRIPTION
> > law 1545968 1545946 0 21:10 ? 00:00:00 postgres: walsender law
> > ndb [local] startup waiting
> >
> > With debug_discard_caches = 1 or under some lucky circumstances (I
> > encountered these), this leads to inability to connect to any database.
> >
> > Reproduced on REL_13_STABLE .. master.
>
> Thanks, I was able to reproduce the issue using the steps provided.
> The problem occurs because DROP SUBSCRIPTION takes an
> AccessExclusiveLock on the pg_subscription system catalog to prevent
> the launcher from restarting the worker. During this process, it also
> attempts to connect to the publisher in order to drop the replication
> slot. As we are connecting to a newly created database, it may not yet
> have initialized its catalog caches. As part of the backend startup,
> it attempts to build the cache hierarchy via:
> RelationCacheInitializePhase3 → InitCatalogCachePhase2 → InitCatCachePhase2
> This cache initialization requires acquiring a
> shared lock on pg_subscription, since it is one of the syscache-backed
> catalog tables. But that shared lock is blocked by the
> AccessExclusiveLock already held by the dropping process. As a result,
> the new backend hangs waiting for the lock, and the original DROP
> SUBSCRIPTION process cannot proceed, leading to a self-blocking
> scenario.
>
> In this specific case, no replication slot was created during
> subscription creation as the connect option was specified as false.
> Therefore, I believe the system should skip connecting to the
> publisher when dropping the subscription. I've attached a patch that
> addresses this behavior. Thoughts?

I think this fix looks correct. While testing, I realized that we now
need an additional step before we can enable the subscription, so I
think we should mention that step in the error hint. The attached
top-up patch can be applied on top of your patch.
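
Independently of the patches discussed here, the lock wait described in
the quoted analysis and the documented manual way to drop a subscription
that has no remote slot can be sketched as below. This is only a sketch:
the helper names subscription_locks_sql and drop_without_slot_sql are
hypothetical, the subscription name is assumed to be a simple unquoted
identifier, and the sequence relies on the documented behavior that a
subscription with slot_name = NONE is dropped without contacting the
publisher.

```shell
# Hypothetical helper: print a query that lists sessions holding or
# awaiting a lock on pg_subscription (relation 6100 in the report).
subscription_locks_sql() {
  printf '%s\n' \
    "SELECT pid, mode, granted FROM pg_locks" \
    "WHERE locktype = 'relation' AND relation = 'pg_subscription'::regclass;"
}

# Hypothetical helper: print the manual sequence for dropping a
# subscription that has no replication slot on the publisher.
# Assumption: $1 is a plain identifier that needs no quoting.
drop_without_slot_sql() {
  sub="$1"
  printf '%s\n' \
    "ALTER SUBSCRIPTION ${sub} DISABLE;" \
    "ALTER SUBSCRIPTION ${sub} SET (slot_name = NONE);" \
    "DROP SUBSCRIPTION ${sub};"
}
```

For the subscription from the report, this could be used as
"drop_without_slot_sql testsub | psql", while "subscription_locks_sql | psql"
run from an already-connected database shows which backend holds the
AccessExclusiveLock and which one is waiting.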

--
Regards,
Dilip Kumar
Google

Attachment Content-Type Size
additinal_errhint.patch application/octet-stream 11.7 KB
