RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-01-12 03:46:00
Message-ID: OS0PR01MB5716BAADB57E4543E802BC60946F2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, January 11, 2024 11:42 PM Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:

Hi,

> On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote:
> > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > +static bool
> > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot
> > > +*remote_slot)
> > > {
> > > ...
> > > + /* Slot ready for sync, so sync it. */ else {
> > > + /*
> > > + * Sanity check: With hot_standby_feedback enabled and
> > > + * invalidations handled appropriately as above, this should never
> > > + * happen.
> > > + */
> > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR,
> > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> > > + " to remote slot's LSN(%X/%X) as synchronization"
> > > + " would move it backwards", remote_slot->name,
> > > + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> > > ...
> > > }
> > >
> > > I was thinking about the above code in the patch and as far as I can
> > > think this can only occur if the same name slot is re-created with
> > > prior restart_lsn after the existing slot is dropped. Normally, the
> > > newly created slot (with the same name) will have higher restart_lsn
> > > but one can mimic it by copying some older slot by using
> > > pg_copy_logical_replication_slot().
> > >
> > > I don't think as mentioned in comments even if hot_standby_feedback
> > > is temporarily set to off, the above shouldn't happen. It can only
> > > lead to invalidated slots on standby.
>
> I also think so.
>
> > >
> > > To close the above race, I could think of the following ways:
> > > 1. Drop and re-create the slot.
> > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> > > ahead of local_slot's LSN then we can update it; but as mentioned in
> > > your previous comment, we need to update all other fields as well.
> > > If we follow this then we probably need to have a check for
> > > catalog_xmin as well.
>
> IIUC, this would be a sync slot (so not usable until promotion) that could not be
> used anyway (invalidated), so I'll vote for drop / re-create then.

Such race can happen when user drop and re-create the same failover slot on
primary as well. For example, user dropped one failover slot and them
immediately created a new one by copying from an old slot(using
pg_copy_logical_replication_slot). Then the slotsync worker will find the
restart_lsn of this slot go backwards.

The steps:
----
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput', false, false, true);
SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput', false, false, true);

- Advance the restart_lsn of 'test' slot
CREATE TABLE test2(a int);
INSERT INTO test2 SELECT generate_series(1,10000,1);
SELECT slot_name FROM pg_replication_slot_advance('test', pg_current_wal_lsn());

- re-create the test slot but based on the old isolation_slot.
SELECT pg_drop_replication_slot('test');
SELECT 'copy' FROM pg_copy_logical_replication_slot('isolation_slot', 'test');

Then the restart_lsn of 'test' slot will go backwards.

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xiaoran Wang 2024-01-12 03:56:32 Re: Recovering from detoast-related catcache invalidations
Previous Message Amit Kapila 2024-01-12 03:35:54 Re: speed up a logical replica setup