Re: Deadlock between logrep apply worker and tablesync worker

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deadlock between logrep apply worker and tablesync worker
Date: 2023-01-28 05:56:08
Message-ID: CAA4eK1Lf3QAq1NmmhkX1OYjT8HLtpaz0XmX=+CLdpipyEYgwYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 28, 2023 at 9:36 AM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, January 27, 2023 8:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >
> > On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > > >
> > > > IIRC, this is done to prevent concurrent drops of origin drop say by
> > > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > > related to it. If we want we can optimize it so that we can acquire
> > > > the lock on the specific origin as mentioned in comments
> > > > replorigin_drop_by_name() but it was not clear that this operation
> > > > would be frequent enough.
> > >
> > > Here is an attached patch to lock the replication origin record using
> > > LockSharedObject instead of locking pg_replication_origin relation in
> > > ExclusiveLock mode. Now tablesync worker will wait only if the
> > > tablesync worker is trying to drop the same replication origin which
> > > has already been dropped by the apply worker, the other tablesync
> > > workers will be able to successfully drop the replication origin
> > > without any wait.
> > >
> >
> > There is a code in the function replorigin_drop_guts() that uses the
> > functionality introduced by replorigin_exists(). Can we reuse this function for
> > the same?
>
> Maybe we can use SearchSysCacheExists1 to check the existence instead of
> adding a new function.
>

Yeah, I think that would be better.

> One comment about the patch.
>
> @@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> ...
> + /* Drop the replication origin if it has not been dropped already */
> + if (replorigin_exists(roident))
> replorigin_drop_guts(rel, roident, nowait);
>
> If developer pass missing_ok as false, should we report an ERROR here
> instead of silently return ?
>

One thing that looks a bit odd is that we will anyway have a similar
check in replorigin_drop_guts() which is a static function and called
from only one place, so, will it be required to check at both places?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-01-28 06:27:29 Re: recovery modules
Previous Message Thomas Munro 2023-01-28 05:34:20 Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)