RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-03-29 07:23:11
Message-ID: OS0PR01MB5716DDE379D7D21CD7D6B6A0943A2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, March 29, 2024 2:48 PM Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote:
> > Attach a new version patch which fixed an un-initialized variable
> > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > 040 tap-test so that we can analyze the possible CFbot failures easily.
> >
>
> Thanks!
>
> + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> + {
> + /*
> + * By advancing the restart_lsn, confirmed_lsn, and xmin using
> + * fast-forward logical decoding, we ensure that the required
> snapshots
> + * are saved to disk. This enables logical decoding to quickly
> reach a
> + * consistent point at the restart_lsn, eliminating the risk of
> missing
> + * data during snapshot creation.
> + */
> +
> pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> +
> found_consistent_point);
> + ReplicationSlotsComputeRequiredLSN();
> + updated_lsn = true;
> + }
>
> Instead of using pg_logical_replication_slot_advance() for each synced slot and
> during sync cycles what about?:
>
> - keep sync slot synchronization as it is currently (not using
> pg_logical_replication_slot_advance())
> - create "an hidden" logical slot if sync slot feature is on
> - at the time of promotion use pg_logical_replication_slot_advance() on this
> hidden slot only to advance to the max lsn of the synced slots
>
> I'm not sure that would be enough, just asking your thoughts on this (benefits
> would be to avoid calling pg_logical_replication_slot_advance() on each sync
> slots and during the sync cycles).

Thanks for the idea !

I considered about this. I think advancing the "hidden" slot on promotion may be a
bit late, because if we cannot reach the consistent point after advancing the
"hidden" slot, then it means we may need to remove all the synced slots as we
are not sure if they are usable(will not loss data) after promotion. And it may
confuse user a bit as they have seen these slots to be sync-ready.

The current approach is to mark such un-consistent slot as temp and persist
them once it reaches consistent point, so that user can ensure the slot can be
used after promotion once persisted.

Another optimization idea is to check the snapshot file existence before calling the
slot_advance(). If the file already exists, we skip the decoding and directly
update the restart_lsn. This way, we could also avoid some duplicate decoding
work.

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-03-29 07:38:14 Re: Synchronizing slots from primary to standby
Previous Message John Naylor 2024-03-29 07:21:32 Re: [PoC] Improve dead tuple storage for lazy vacuum