Re: 024_add_drop_pub.pl might fail due to deadlock

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 024_add_drop_pub.pl might fail due to deadlock
Date: 2025-07-14 10:45:53
Message-ID: CALDaNm14FkrASB8jj27k6MSgrDpOJSZpVv=y=BHvhAoz5B7rNw@mail.gmail.com
Lists: pgsql-hackers

On Mon, 14 Jul 2025 at 15:46, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Tue, Jul 8, 2025 at 8:41 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > Patch with fix attached.
> > I'll continue investigating whether this issue also affects HEAD.
> >
>
> While debugging whether this problem can occur on HEAD, I found that
> there it is mostly the tablesync worker that drops the origin, and
> since the tablesync worker does not attempt to update the
> SubscriptionRel state in that process, there doesn't seem to be a
> possibility of a deadlock. But there is a rare situation where the
> tablesync worker could crash or get an error just prior to dropping
> the origin, in which case the origin is dropped in the apply worker
> (this is explained in the comments in
> process_syncing_tables_for_sync()). If the origin has to be dropped in
> the apply worker, then the same deadlock can happen in HEAD code as
> well. I was able to simulate this by using an injection point to
> create an error in the tablesync worker, after which a similar
> deadlock happens on HEAD too. Attaching a patch for fixing this on
> HEAD as well.

I was able to reproduce the deadlock on HEAD as well using the
attached patch, which adds a sleep of a few seconds in the tablesync
worker just before it drops the replication origin. During this delay,
the apply worker also attempts to drop the replication origin. If an
ALTER SUBSCRIPTION command is executed concurrently, a deadlock
frequently occurs:
2025-07-14 15:59:53.572 IST [141100] DETAIL: Process 141100 waits for
AccessExclusiveLock on object 2 of class 6000 of database 0; blocked
by process 140974.
Process 140974 waits for AccessShareLock on object 16396 of class 6100
of database 0; blocked by process 141100.
Process 141100: alter subscription sub1 drop publication pub1
Process 140974: <command string not enabled>
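
To make the cycle in the report above explicit, here is a small Python sketch (not part of the attached patch) that parses the two "waits for ... blocked by" lines, unwrapped from the log, and follows the blocked-by edges. The function names are made up for illustration. As far as I can tell, class 6000 is pg_replication_origin and class 6100 is pg_subscription, so the ALTER SUBSCRIPTION backend is waiting on the origin while the other process waits on the subscription.

```python
import re

# The DETAIL lines from the log above, with the mail line-wrapping undone.
REPORT = """\
Process 141100 waits for AccessExclusiveLock on object 2 of class 6000 of database 0; blocked by process 140974.
Process 140974 waits for AccessShareLock on object 16396 of class 6100 of database 0; blocked by process 141100.
"""

WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on (.+?); blocked by process (\d+)\."
)


def parse_waits(report):
    """Return {waiter_pid: (lock_mode, lock_target, blocker_pid)}."""
    waits = {}
    for m in WAIT_RE.finditer(report):
        waiter, mode, target, blocker = m.groups()
        waits[int(waiter)] = (mode, target, int(blocker))
    return waits


def find_cycle(waits, start):
    """Follow blocked-by edges from `start`; return the cycle as a pid list, or None."""
    path = []
    pid = start
    while pid in waits and pid not in path:
        path.append(pid)
        pid = waits[pid][2]
    if pid in path:
        return path[path.index(pid):]
    return None


waits = parse_waits(REPORT)
cycle = find_cycle(waits, 141100)
for pid in cycle:
    mode, target, blocker = waits[pid]
    print(f"{pid} wants {mode} on {target}, held up by {blocker}")
```

Each process in the returned list is blocked by the next one, and the last is blocked by the first, which is the condition the deadlock detector reports.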

After applying the attached patch, create a logical replication setup
with a publication pub1 containing table t1, and then run the
following commands in a loop on the subscriber:
alter subscription sub1 drop publication pub1;
alter subscription sub1 add publication pub1;
sleep 4
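
The loop above could be driven from a script roughly like the following Python sketch. It is only an illustration: the database name and port are assumptions and must point at the subscriber node, and `psql_runner` and `run_repro` are hypothetical helper names. It runs the two statements through psql, sleeps between rounds as in the commands above, and collects any statement that fails (for example with a deadlock error).

```python
import subprocess
import time

# The two statements from the reproduction steps above.
STATEMENTS = [
    "alter subscription sub1 drop publication pub1;",
    "alter subscription sub1 add publication pub1;",
]


def psql_runner(dbname="postgres", port=5432):
    """Build a callable that runs one statement via psql.

    dbname and port are assumptions; they must match the subscriber.
    """
    def run(sql):
        return subprocess.run(
            ["psql", "-X", "-d", dbname, "-p", str(port), "-c", sql],
            capture_output=True,
            text=True,
        )
    return run


def run_repro(run_sql, iterations, pause_secs=4):
    """Issue the drop/add pair `iterations` times; return the failed statements."""
    failures = []
    for i in range(iterations):
        for sql in STATEMENTS:
            result = run_sql(sql)
            if result.returncode != 0:
                failures.append((i, sql, result.stderr.strip()))
        time.sleep(pause_secs)
    return failures


# Example use against a running subscriber (values are assumptions):
#   failures = run_repro(psql_runner(port=5433), iterations=20)
#   for round_no, sql, err in failures:
#       print(round_no, sql, err)
```

Passing the runner in as an argument keeps the loop itself independent of how the statements reach the server, so the same function also works with a different client.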

Regards,
Vignesh

Attachment: deadlock_simulate_add_drop_pub.patch (application/octet-stream, 1.5 KB)
