Re: Synchronizing slots from primary to standby

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-06 06:55:29
Message-ID: ZcHX4SXkqtGe27a6@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Tue, Feb 06, 2024 at 03:19:11AM +0000, Zhijie Hou (Fujitsu) wrote:
> On Friday, February 2, 2024 2:03 PM Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote:
> > > Attached v75 patch-set. Changes are:
> > >
> > > 1) Re-arranged the patches:
> > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are
> > > separated out in v75-001 as those are independent changes.
> > > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special
> > > process' and 'App-name changes' are now merged to single patch which
> > > makes v75-002.
> > > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation
> > > Document' patches are maintained as is (v75-003 and v75-004 now).
> >
> > Thanks!
> >
> > I only looked at the commit message for v75-0002 and see that it has changed
> > since the comment done in [1], but it still does not look correct to me.
> >
> > "
> > If a logical slot on the primary is valid but is invalidated on the standby, then
> > that slot is dropped and recreated on the standby in next sync-cycle provided
> > the slot still exists on the primary server. It is okay to recreate such slots as long
> > as these are not consumable on the standby (which is the case currently). This
> > situation may occur due to the following reasons:
> > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL
> > records from the restart_lsn of the slot.
> > - primary_slot_name is temporarily reset to null and the physical slot is
> > removed.
> > - The primary changes wal_level to a level lower than logical.
> > "
> >
> > If a logical decoding slot "still exists on the primary server" then the primary
> > can not change the wal_level to lower than logical, one would get something
> > like:
> >
> > "FATAL: logical replication slot "logical_slot" exists, but wal_level < logical"
> >
> > and then slots won't get invalidated on the standby. I've the feeling that the
> > wal_level conflict part may need to be explained separately? (I think it's not
> > possible that they end up being re-created on the standby for this conflict,
> > they will be simply removed as it would mean the counterpart one on the
> > primary does not exist anymore).
>
> This is possible in some extreme cases, because the slot is synced
> asynchronously.
>
> For example: If on the primary the wal_level is changed to 'replica'

It means that all the logical slots have been dropped on the primary (if not,
it's not possible to change it to a level < logical).

> and then
> changed back to 'logical', the standby would receive two XLOG_PARAMETER_CHANGE
> wals. And before the standby replay these wals, user can create a failover slot

And now it is re-created.

So the slot has been dropped and re-created on the primary, so it's kind of
expected that it is also dropped and re-created on the standby (whether or not
it was invalidated there).

> Although I think it doesn't seem a real world case, so I am not sure is it worth
> separate explanation.

Yeah, I don't think your example is worth a separate explanation also because
it's expected to see the slot being dropped / re-created anyway (see above).

That said, I still think the commit message needs some rewording. What about:

=====
If a logical slot on the primary is valid but is invalidated on the standby,
then that slot is dropped and can be recreated on the standby in the next
pg_sync_replication_slots() call provided the slot still exists on the primary
server. It is okay to recreate such slots as long as these are not consumable
on the standby (which is the case currently). This situation may occur due to
the following reasons:

- The max_slot_wal_keep_size on the standby is insufficient to retain WAL
records from the restart_lsn of the slot.
- primary_slot_name is temporarily reset to null and the physical slot is
removed.

Changing the primary wal_level to a level lower than logical is only possible
if the logical slots are removed on the primary, so it's expected to see
the slots being removed on the standby too (and re-created if they are
re-created on the primary).
=====
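
To make the reasoning above concrete, here is a toy Python model of it. This is
only a sketch, not PostgreSQL code: the names Slot, Server,
restart_with_wal_level() and sync_slots() are made up for illustration, and the
model drops and re-creates an invalidated slot within a single sync call, which
may not match how the patch splits the work across cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    invalidated: bool = False


@dataclass
class Server:
    wal_level: str = "logical"
    slots: dict = field(default_factory=dict)

    def restart_with_wal_level(self, level: str) -> None:
        # Toy version of the startup check: lowering wal_level below logical
        # is refused while any logical slot still exists on this server.
        if level != "logical" and self.slots:
            name = next(iter(self.slots))
            raise RuntimeError(
                f'logical replication slot "{name}" exists, but wal_level < logical'
            )
        self.wal_level = level


def sync_slots(primary: Server, standby: Server) -> list:
    """Run one hypothetical sync call; return the actions taken on the standby."""
    actions = []
    for name in list(standby.slots):
        gone_on_primary = name not in primary.slots
        invalid_locally = standby.slots[name].invalidated
        # Drop slots that vanished on the primary, and slots that are
        # invalidated locally while still valid on the primary.
        if gone_on_primary or (invalid_locally and not primary.slots[name].invalidated):
            del standby.slots[name]
            actions.append(("drop", name))
    for name, pslot in primary.slots.items():
        # (Re-)create any valid primary slot that is missing on the standby.
        if name not in standby.slots and not pslot.invalidated:
            standby.slots[name] = Slot(name)
            actions.append(("create", name))
    return actions
```

In this model a slot that is invalidated only on the standby gets dropped and
re-created, while the wal_level case can only be reached after the slot is gone
from the primary, so the standby simply drops it (and re-creates it only if the
primary does).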

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
