RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-08-09 02:58:03
Message-ID: OS0PR01MB5716C00FD3B8A0EFA51800909412A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:

> Right. I attached the v26 as you asked. 

Thanks for posting the patches.
 
While reviewing the patch, I noticed one rare case that it's possible that there
are two table sync worker for the same table in the same time.

The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
apply worker will start a new worker after releasing the lock. So, at the point[1]
where the lock is released and the new table sync worker has not been started,
it seems possible that another old table sync worker will be reused for the
same table.

/* Now safe to release the LWLock */
LWLockRelease(LogicalRepWorkerLock);
*[1]
/*
* If there are free sync worker slot(s), start a new sync
* worker for the table.
*/
if (nsyncworkers < max_sync_workers_per_subscription)
...
logicalrep_worker_launch(MyLogicalRepWorker->dbid,

I can reproduce it by using gdb.

Steps:
1. set max_sync_workers_per_subscription to 1 and setup pub/sub which publishes
two tables(table A and B).
2. when the table sync worker for the table A started, use gdb to block it
before being reused for another table.
3. set max_sync_workers_per_subscription to 2 and use gdb to block the apply
worker at the point after releasing the LogicalRepWorkerLock and before
starting another table sync worker for table B.
4. release the blocked table sync worker, then we can see the table sync worker
is also reused for table B.
5. release the apply worker, then we can see the apply worker will start
another table sync worker for the same table(B).

I think it would be better to prevent this case from happening as this case
will give some unexpected ERROR or LOG. Note that I haven't checked if it would
cause worse problems like duplicate copy or others.

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-08-09 02:59:55 Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?
Previous Message Peter Smith 2023-08-09 02:50:10 Re: Handle infinite recursion in logical replication setup