Re: Synchronizing slots from primary to standby

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-12 06:37:27
Message-ID: ZaDeJ65cAOkuxQsC@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Fri, Jan 12, 2024 at 08:42:39AM +0530, Amit Kapila wrote:
> On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote:
> > > >
> > > > To close the above race, I could think of the following ways:
> > > > 1. Drop and re-create the slot.
> > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> > > > ahead of local_slot's LSN then we can update it; but as mentioned in
> > > > your previous comment, we need to update all other fields as well. If
> > > > we follow this then we probably need to have a check for catalog_xmin
> > > > as well.
> >
> > IIUC, this would be a sync slot (so not usable until promotion) that could
> > not be used anyway (invalidated), so I'll vote for drop / re-create then.
> >
>
> No, it can happen for non-sync slots as well.

Yeah, I meant that we could decide to drop/re-create only for sync slots.

>
> > > > Now, related to this the other case which needs some handling is what
> > > > if the remote_slot's restart_lsn is greater than local_slot's
> > > > restart_lsn but it is a re-created slot with the same name. In that
> > > > case, I think the other properties like 'two_phase', 'plugin' could be
> > > > different. So, is simply copying those sufficient or do we need to do
> > > > something else as well?
> > > >
> > >
> >
> > I'm not sure to follow here. If the remote slot is re-created then it would
> > be also dropped / re-created locally, or am I missing something?
> >
>
> As our slot-syncing mechanism is asynchronous (from time to time we
> check the slot information on primary), isn't it possible that the
> same name slot is dropped and recreated between slot-sync worker's
> checks?
>

Yeah, I should have thought harder ;-) So for this case, let's imagine that we
had an easy way to detect that a remote slot has been dropped/re-created: then I
think we would also drop and re-create it on the standby.

If so, I think we should then update all the fields (that we're currently updating
in the "create locally" case) when we detect that (at least) one of the following differs:

- dboid
- plugin
- two_phase
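
To make the check above concrete, here is a minimal sketch in C. It is only an
illustration: SlotInfo and slot_identity_differs() are hypothetical names made up
for this example, not the actual PostgreSQL structs or functions, and the plugin
buffer size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified view of the slot properties discussed above.
 * This is NOT PostgreSQL's real slot struct; names and sizes are assumptions.
 */
typedef struct SlotInfo
{
    uint32_t    dboid;          /* database the slot belongs to */
    char        plugin[64];     /* output plugin name */
    bool        two_phase;      /* two-phase decoding enabled? */
} SlotInfo;

/*
 * Return true if at least one of dboid, plugin or two_phase differs
 * between the remote (primary) slot and the local (standby) slot.
 */
static bool
slot_identity_differs(const SlotInfo *remote, const SlotInfo *local)
{
    assert(remote != NULL && local != NULL);

    return remote->dboid != local->dboid ||
           strcmp(remote->plugin, local->plugin) != 0 ||
           remote->two_phase != local->two_phase;
}
```

When such a function returns true, the sync code would refresh all the fields
that are set in the "create locally" case instead of only advancing the LSNs.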

Maybe the "best" approach would be to have a way to detect that a slot has been
re-created on the primary (but that would mean relying on more than the slot name
to "identify" a slot, and probably adding a new member to the struct to do so).
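
As a rough sketch of that idea (purely hypothetical: neither SlotIdentity nor a
creation_id member exists in PostgreSQL today, and how such a value would be
generated is left open), detecting a re-created slot could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NAMEDATALEN 64   /* assumed name buffer size for this sketch */

/*
 * Hypothetical slot identity: the name plus an extra member that changes
 * every time a slot is created. Not an existing PostgreSQL struct.
 */
typedef struct SlotIdentity
{
    char        name[SKETCH_NAMEDATALEN];
    uint64_t    creation_id;    /* hypothetical new member */
} SlotIdentity;

/*
 * Return true if remote and local carry the same name but a different
 * creation_id, i.e. the slot was dropped and re-created on the primary
 * between two checks of the slot-sync worker.
 */
static bool
slot_was_recreated(const SlotIdentity *remote, const SlotIdentity *local)
{
    assert(remote != NULL && local != NULL);

    return strcmp(remote->name, local->name) == 0 &&
           remote->creation_id != local->creation_id;
}
```

With something like this, the standby would not need to infer a re-creation
from differences in dboid, plugin or two_phase.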

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
