Re: Synchronizing slots from primary to standby

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-12 14:06:03
Message-ID: Zcoly3C/pkUyC7up@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Mon, Feb 12, 2024 at 04:19:33PM +0530, Amit Kapila wrote:
> On Mon, Feb 12, 2024 at 3:33 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > A few random comments:
> >
> >
> > 003 ===
> >
> > + If, after executing the function,
> > + <link linkend="guc-hot-standby-feedback">
> > + <varname>hot_standby_feedback</varname></link> is disabled on
> > + the standby or the physical slot configured in
> > + <link linkend="guc-primary-slot-name">
> > + <varname>primary_slot_name</varname></link> is
> > + removed,
> >
> > I think another case that could lead to slot invalidation is if primary_slot_name
> > is NULL or misconfigured.
> >
>
> If the primary_slot_name is NULL then the function will error out.

Yeah right, it had to be non-NULL initially, so we know there is a physical slot (if
not dropped) that should prevent conflicts in the first place (as long as hot_standby_feedback is on).
Please forget about comment 003 then.

> >
> > 005 ===
> >
> > + To resume logical replication after failover from the synced logical
> > + slots, the subscription's 'conninfo' must be altered
> >
> > Only in a pub/sub context but not for other ways of using the logical replication
> > slot(s).
> >
>
> Right, but what additional information do you want here? I thought we
> were speaking about the built-in logical replication here, so this is
> okay.

The "Logical Decoding Concepts" sub-chapter also mentions "Logical decoding clients",
so I was not sure the part added in the patch was only for built-in logical
replication.

Or maybe just reword it this way: "In the case of built-in logical replication, to
resume after failover from the synced......"?

>
> >
> > 008 ===
> >
> > + ereport(LOG,
> > + errmsg("dropped replication slot \"%s\" of dbid %u",
> > + NameStr(local_slot->data.name),
> > + local_slot->data.database));
> >
> > We emit a message when an "invalidated" slot is dropped but not when we create
> > a slot. Shouldn't we emit a message when we create a synced slot on the standby?
> >
> > I think it could be confusing to see a "drop" message not followed by a "create"
> > one when one is expected (for example, when the slot is valid on the primary).
> >
>
> Isn't the message below for a sync-ready slot sufficient? Otherwise, in
> most cases, we will LOG multiple similar messages.
>
> + ereport(LOG,
> + errmsg("newly created slot \"%s\" is sync-ready now",
> + remote_slot->name));

Yes, it is sufficient if we reach it. However, during some tests, I was able to
go through this code path:

Breakpoint 2, update_and_persist_local_synced_slot (remote_slot=0x56450e7c49c0, remote_dbid=5) at slotsync.c:340
340 ReplicationSlot *slot = MyReplicationSlot;
(gdb) n
346 if (remote_slot->restart_lsn < slot->data.restart_lsn ||
(gdb)
347 TransactionIdPrecedes(remote_slot->catalog_xmin,
(gdb)
346 if (remote_slot->restart_lsn < slot->data.restart_lsn ||
(gdb)
358 return;

which means exiting update_and_persist_local_synced_slot() without reaching the
"newly created slot" message (the slot on the primary was "inactive").

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
