RE: Deadlock between logrep apply worker and tablesync worker

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Deadlock between logrep apply worker and tablesync worker
Date: 2023-01-28 04:06:26
Message-ID: OS3PR01MB5718F0BC9F89DEA388FB258494CD9@OS3PR01MB5718.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, January 27, 2023 8:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>
> On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > IIRC, this is done to prevent concurrent drops of origin drop say by
> > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > related to it. If we want we can optimize it so that we can acquire
> > > the lock on the specific origin as mentioned in comments
> > > replorigin_drop_by_name() but it was not clear that this operation
> > > would be frequent enough.
> >
> > Here is an attached patch to lock the replication origin record using
> > LockSharedObject instead of locking pg_replication_origin relation in
> > ExclusiveLock mode. Now tablesync worker will wait only if the
> > tablesync worker is trying to drop the same replication origin which
> > has already been dropped by the apply worker, the other tablesync
> > workers will be able to successfully drop the replication origin
> > without any wait.
> >
>
> There is a code in the function replorigin_drop_guts() that uses the
> functionality introduced by replorigin_exists(). Can we reuse this function for
> the same?

Maybe we can use SearchSysCacheExists1 to check the existence instead of
adding a new function.

One comment about the patch.

@@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
...
+ /* Drop the replication origin if it has not been dropped already */
+ if (replorigin_exists(roident))
replorigin_drop_guts(rel, roident, nowait);

If developer pass missing_ok as false, should we report an ERROR here
instead of silently return ?

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-28 04:11:38 Re: Something is wrong with wal_compression
Previous Message Andrey Borodin 2023-01-28 03:57:35 Re: Something is wrong with wal_compression