Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-08-10 04:45:50
Message-ID: CAA4eK1+BmkT3jjUnejOqNFW_78D5PiRHNNAwLmfBSb7cj8QSrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 9, 2023 at 8:28 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
> > Right. I attached the v26 as you asked.
>
> Thanks for posting the patches.
>
> While reviewing the patch, I noticed one rare case that it's possible that there
> are two table sync worker for the same table in the same time.
>
> The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
> apply worker will start a new worker after releasing the lock. So, at the point[1]
> where the lock is released and the new table sync worker has not been started,
> it seems possible that another old table sync worker will be reused for the
> same table.
>
> /* Now safe to release the LWLock */
> LWLockRelease(LogicalRepWorkerLock);
> *[1]
> /*
> * If there are free sync worker slot(s), start a new sync
> * worker for the table.
> */
> if (nsyncworkers < max_sync_workers_per_subscription)
> ...
> logicalrep_worker_launch(MyLogicalRepWorker->dbid,
>

Yeah, this is a problem. I think one idea to solve this is by
extending the lock duration till we launch the tablesync worker but we
should also consider changing this locking scheme such that there is a
better way to indicate that for a particular rel, tablesync is in
progress. Currently, the code in TablesyncWorkerMain() also acquires
the lock in exclusive mode even though the tablesync for a rel is in
progress which I guess could easily heart us for larger values of
max_logical_replication_workers. So, that could be another motivation
to think for a different locking scheme.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Luzanov 2023-08-10 04:56:12 Re: PG 16 draft release notes ready
Previous Message Michael Paquier 2023-08-10 04:33:48 Re: Incorrect handling of OOM in WAL replay leading to data loss