Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-04-05 04:13:35
Message-ID: CAJpy0uDNdnZHh0WXQC4xH=3zaV3ePxAu3-1OqW14jgqeuE=Okg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> > On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > >
> > > Prior to commit 2ec005b, this check was okay, as we did not expect
> > > restart_lsn of the synced slot to be ahead of remote since we were
> > > directly copying the lsns. But now when we use 'advance' to do logical
> > > decoding on standby, there is a possibility that restart lsn of the
> > > synced slot is ahead of remote slot, if there are running txns records
> > > found after reaching consistent-point while consuming WALs from
> > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance
> > > may end up serializing snapshots and setting restart_lsn to the
> > > serialized snapshot point, ahead of remote one.
> > >
> > > Fix:
> > > The sanity check needs to be corrected. Attached a patch to address the issue.
> >
>
> Thanks for reporting, explaining the issue and providing a patch.
>
> Regarding the patch:
>
> 1 ===
>
> + * Attempt to sync lsns and xmins only if remote slot is ahead of local
>
> s/lsns/LSNs/?
>
> 2 ===
>
> + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn)
> + elog(LOG,
> + "could not synchronize local slot \"%s\" LSN(%X/%X)"
> + " to remote slot's LSN(%X/%X) ",
> + remote_slot->name,
> + LSN_FORMAT_ARGS(slot->data.confirmed_flush),
> + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn));
>
> I don't think that the message is correct here. Unless I am missing something
> there is nothing in the following code path that would prevent the slot to be
> sync during this cycle.

This is a sanity check, I will put a comment to indicate the same. We
want to ensure if anything changes in future, we get correct logs to
indicate that.
If required, the LOG msg can be changed. Kindly suggest if you have
anything better in mind.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-04-05 04:14:53 Re: Streaming read-ready sequential scan code
Previous Message Erik Wienhold 2024-04-05 04:10:14 Re: IPC::Run::time[r|out] vs our TAP tests