From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: 024_add_drop_pub.pl might fail due to deadlock |
Date: | 2025-07-14 10:45:53 |
Message-ID: | CALDaNm14FkrASB8jj27k6MSgrDpOJSZpVv=y=BHvhAoz5B7rNw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, 14 Jul 2025 at 15:46, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Tue, Jul 8, 2025 at 8:41 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > Patch with fix attached.
> > I'll continue investigating whether this issue also affects HEAD.
> >
>
> While debugging whether this problem can occur on HEAD, I found that
> on HEAD it is mostly the tablesync worker that drops the origin,
> and since the tablesync worker does not attempt to update the
> SubscriptionRel state in that process, there doesn't seem to be the
> possibility of a deadlock. But there is a rare situation where the
> tablesync worker could crash or get an error just prior to dropping
> the origin, then the origin is dropped in the apply worker (this is
> explained in the comments in process_syncing_tables_for_sync()). If
> the origin has to be dropped in the apply worker, then the same
> deadlock can happen in HEAD code as well. I was able to simulate this
> by using an injection point to raise an error in the tablesync worker,
> after which a similar deadlock happens on HEAD as well. Attaching a
> patch for fixing this on HEAD as well.
I was able to reproduce the deadlock on HEAD as well using the
attached patch, which introduces a delay in the tablesync worker
before dropping the replication origin by adding a sleep of a few
seconds. During this delay, the apply worker also attempts to drop the
replication origin. If an ALTER SUBSCRIPTION command is executed
concurrently, a deadlock frequently occurs:
2025-07-14 15:59:53.572 IST [141100] DETAIL: Process 141100 waits for AccessExclusiveLock on object 2 of class 6000 of database 0; blocked by process 140974.
Process 140974 waits for AccessShareLock on object 16396 of class 6100 of database 0; blocked by process 141100.
Process 141100: alter subscription sub1 drop publication pub1
Process 140974: <command string not enabled>
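To make the cycle in that report easier to see, here is a small Python sketch that parses the two DETAIL lines and follows the "blocked by" edges. It is only a reading aid, not how PostgreSQL's deadlock detector works. The mapping of class 6000 to pg_replication_origin and class 6100 to pg_subscription is my understanding of the catalog OIDs and should be checked against pg_class; the function names are made up for this example.

```python
import re

# DETAIL text copied from the log above, with wrapped lines joined.
DETAIL = (
    "Process 141100 waits for AccessExclusiveLock on object 2 of class 6000 "
    "of database 0; blocked by process 140974.\n"
    "Process 140974 waits for AccessShareLock on object 16396 of class 6100 "
    "of database 0; blocked by process 141100.\n"
)

# Assumed catalog OIDs for the two lock classes. Verify with:
#   SELECT oid, relname FROM pg_class WHERE oid IN (6000, 6100);
CLASS_NAMES = {6000: "pg_replication_origin", 6100: "pg_subscription"}

WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on object (\d+) of class (\d+) "
    r"of database \d+; blocked by process (\d+)\."
)


def parse_waits(detail):
    """Return {waiter_pid: (blocker_pid, lock_mode, catalog_name, object_id)}."""
    waits = {}
    for m in WAIT_RE.finditer(detail):
        waiter, mode, obj, cls, blocker = m.groups()
        catalog = CLASS_NAMES.get(int(cls), "class " + cls)
        waits[int(waiter)] = (int(blocker), mode, catalog, int(obj))
    return waits


def find_cycle(waits):
    """Follow blocked-by edges from each pid; return the first cycle found."""
    for start in waits:
        path = [start]
        pid = waits[start][0]
        while pid in waits and pid not in path:
            path.append(pid)
            pid = waits[pid][0]
        if pid == start:
            return path
    return None


if __name__ == "__main__":
    waits = parse_waits(DETAIL)
    for pid, (blocker, mode, catalog, obj) in waits.items():
        print(f"{pid} wants {mode} on {catalog} object {obj}, held by {blocker}")
    print("cycle:", find_cycle(waits))
```

Read this way, the ALTER SUBSCRIPTION backend (141100) is waiting on a replication origin lock while the other process (140974) is waiting on the subscription lock, that is, the two take the same pair of locks in opposite order.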
After applying the attached patch, create a logical replication setup
with a publication pub1 containing table t1, and then run the following
commands in a loop:
alter subscription sub1 drop publication pub1;
alter subscription sub1 add publication pub1;
sleep 4
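For anyone who wants to script that loop, a minimal Python sketch follows. It assumes psql is on PATH and that the subscriber is reachable with the placeholder database name and port shown; those values, and the helper names, are illustrative and not from my setup. It stops at the first statement that fails and returns psql's stderr.

```python
import subprocess
import time

# The two statements from the loop above, run in this order each iteration.
STATEMENTS = [
    "alter subscription sub1 drop publication pub1;",
    "alter subscription sub1 add publication pub1;",
]


def psql(sql, dbname="postgres", port=5432):
    """Run one statement on the subscriber via psql.

    dbname and port are placeholder assumptions; adjust for your setup.
    """
    return subprocess.run(
        ["psql", "-X", "-v", "ON_ERROR_STOP=1",
         "-d", dbname, "-p", str(port), "-c", sql],
        capture_output=True,
        text=True,
    )


def reproduce(iterations, run_sql=psql, pause=time.sleep, delay=4):
    """Repeat drop/add publication; return stderr of the first failure, else None."""
    for _ in range(iterations):
        for sql in STATEMENTS:
            result = run_sql(sql)
            if result.returncode != 0:
                return result.stderr
        pause(delay)  # mirrors the "sleep 4" between iterations
    return None


if __name__ == "__main__":
    error = reproduce(iterations=100)
    print(error if error else "no failure seen")
```

Since the deadlock victim is not necessarily the ALTER SUBSCRIPTION session, it is worth checking the subscriber's server log as well, even if the script reports no failure.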
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
deadlock_simulate_add_drop_pub.patch | application/octet-stream | 1.5 KB |