RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-02-13 01:15:24
Message-ID: OS0PR01MB57162551460DBAE9269FEF86944F2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, February 12, 2024 6:03 PM Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Sun, Feb 11, 2024 at 01:23:19PM +0000, Zhijie Hou (Fujitsu) wrote:
> > On Saturday, February 10, 2024 9:10 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada
> > > <sawada(dot)mshk(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu)
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > > Another alternative is to register the callback when calling
> > > > > slotsync functions and unregister it after the function call.
> > > > > And register the callback in
> > > > > slotsyncworkmain() for the slotsync worker patch, although this
> > > > > may adds a few more codes.
> > > >
> > > > Another idea is that SyncReplicationSlots() calls
> > > > synchronize_slots() in PG_ENSURE_ERROR_CLEANUP() block instead of
> > > > PG_TRY(), to make sure to clear the flag in case of ERROR or
> > > > FATAL. And the slotsync worker uses the before_shmem_callback to clear
> the flag.
> > > >
> > >
> > > +1. This sounds like a better way to clear the flag.
> >
> > Agreed. Here is the V84 patch which addressed this.
> >
> > Apart from above, I removed the txn start/end codes from 0001 as they
> > are used in the slotsync worker patch. And I also ran pgindent and
> > pgperltidy for the patch.
> >
>
> Thanks!
>
> A few random comments:

Thanks for the comments.

>
> 001 ===
>
> "
> For
> the synchronization to work, it is mandatory to have a physical replication slot
> between the primary and the standby, "
>
> Maybe mention "primary_slot_name" here?

Added.

>
> 002 ===
>
> + <para>
> + Synchronize the logical failover slots from the primary server to the
> standby server.
>
> should we say "logical failover replication slots" instead?

Changed.

>
> 003 ===
>
> + If, after executing the function,
> + <link linkend="guc-hot-standby-feedback">
> + <varname>hot_standby_feedback</varname></link> is disabled
> on
> + the standby or the physical slot configured in
> + <link linkend="guc-primary-slot-name">
> + <varname>primary_slot_name</varname></link> is
> + removed,
>
> I think another option that could lead to slot invalidation is if primary_slot_name
> is NULL or miss-configured. Indeed hot_standby_feedback would be working
> (for the catalog_xmin) but only as long as the standby is up and running.

I didn't change this based on the discussion.

>
> 004 ===
>
> + on the standby. For the synchronization to work, it is mandatory to
> + have a physical replication slot between the primary and the
> + standby,
>
> should we mention primary_slot_name here?

Added.

>
> 005 ===
>
> + To resume logical replication after failover from the synced logical
> + slots, the subscription's 'conninfo' must be altered
>
> Only in a pub/sub context but not for other ways of using the logical replication
> slot(s).

I am not very sure about this, because the 3-rd part logicalrep can also
have their own replication origin, so I didn't change for now, but will think over
this.

>
> 006 ===
>
> + neither be used for logical decoding nor dropped by the user
>
> what about "nor dropped manually"?

Changed.

>
> 007 ===
>
> +typedef struct SlotSyncCtxStruct
> +{
>
> Should we remove "Struct" from the struct name?

The name was named based on some other comment to be consistent
with LogicalReplCtxStruct, so I didn't change this.
If other also prefer without struct, we can change it later.

> 008 ===
>
> + ereport(LOG,
> + errmsg("dropped replication slot
> \"%s\" of dbid %d",
> +
> NameStr(local_slot->data.name),
> +
> + local_slot->data.database));
>
> We emit a message when an "invalidated" slot is dropped but not when we
> create a slot. Shouldn't we emit a message when we create a synced slot on the
> standby?
>
> I think that could be confusing to see "a drop" message not followed by "a
> create"
> one when it's expected (slot valid on the primary for example).

I think we will report "sync-ready" for newly synced slot, for newly
created temporary slots, I am not sure do we need to report log to them,
because they will be dropped on promotion anyway. But if others also prefer to log,
I am fine with that.

>
> 009 ===
>
> Regarding 040_standby_failover_slots_sync.pl what about adding tests for?
>
> - synced slot invalidation (and ensure it's recreated once
> pg_sync_replication_slots() is called and when the slot in primary is valid)

Will try this in next version.

> - cannot enable failover for a temporary replication slot

Added.

> - replication slots can only be synchronized from a standby server

Added.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-02-13 01:15:48 RE: Synchronizing slots from primary to standby
Previous Message torikoshia 2024-02-13 00:19:03 Re: RFC: Logging plan of the running query