Re: Synchronizing slots from primary to standby

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-09 15:45:57
Message-ID: 8a06a7d0-b555-43b0-b407-99a618b30ece@gmail.com
Lists: pgsql-hackers

Hi,

On 11/9/23 11:54 AM, shveta malik wrote:
>
> PFA v32 patches which has below changes:

Thanks!

> 7) Added warning for cases where a user-slot with the same name is
> already present which slot-sync worker is trying to create. Sync for
> such slots is skipped.

I'm seeing an assertion failure and a segfault in this case, due to the
ReplicationSlotRelease() call in synchronize_one_slot().

Adding this extra check prior to it:

-   ReplicationSlotRelease();
+   if (!(found && s->data.sync_state == SYNCSLOT_STATE_NONE))
+       ReplicationSlotRelease();

makes them disappear.
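
To illustrate why the guard helps, here is a minimal standalone sketch. It is a toy model, not PostgreSQL code: ToySlot, my_slot, toy_acquire(), toy_release() and toy_synchronize_one_slot() are hypothetical stand-ins, and it assumes that the skip path (a user-created slot with the same name already exists) never acquires the slot, so an unconditional release would trip the release-side assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the patch's sync states (names are assumptions). */
typedef enum
{
    TOY_SYNCSLOT_STATE_NONE,        /* user-created slot, not managed by sync */
    TOY_SYNCSLOT_STATE_INITIATED    /* slot created by the sync worker */
} ToySyncState;

typedef struct ToySlot
{
    ToySyncState sync_state;
    bool         acquired;
} ToySlot;

static ToySlot *my_slot = NULL;     /* slot currently held by this process */
static int      release_calls = 0;  /* counts successful releases */

static void
toy_acquire(ToySlot *s)
{
    s->acquired = true;
    my_slot = s;
}

static void
toy_release(void)
{
    /* Releasing a slot that was never acquired is a programming error. */
    assert(my_slot != NULL && my_slot->acquired);
    my_slot->acquired = false;
    my_slot = NULL;
    release_calls++;
}

/* Returns true if the slot was synced, false if the sync was skipped. */
static bool
toy_synchronize_one_slot(ToySlot *s, bool found)
{
    /* Same condition as the proposed guard: an existing user slot is skipped. */
    bool skip = found && s->sync_state == TOY_SYNCSLOT_STATE_NONE;

    if (!skip)
        toy_acquire(s);

    /* ... the actual synchronization work would happen here ... */

    /* Only release what was acquired above. */
    if (!skip)
        toy_release();

    return !skip;
}
```

Calling toy_synchronize_one_slot() with found = true on a slot in TOY_SYNCSLOT_STATE_NONE returns false without touching toy_release(); removing the second `if (!skip)` makes the assert in toy_release() fire, which mirrors the failure described above.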

>
> Open Question:
> 1) Currently I have put drop slot logic for slots with 'sync_state=i'
> in slot-sync worker. Do we need to put it somewhere in promotion-logic
> as well?

Yeah, I think so, because there is a time window during which one could "use"
the slot after the promotion and before it is removed, producing things like:

"
2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot "logical_slot2" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot "logical_slot3" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot "logical_slot4" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5" is active for PID 2594628
"

After the promotion, one was able to use logical_slot5, and now we cannot drop it.

> Perhaps in WaitForWALToBecomeAvailable() where we call
> XLogShutdownWalRcv after checking 'CheckForStandbyTrigger'. Thoughts?
>

You mean here?

    /*
     * Check to see if promotion is requested. Note that we do
     * this only after failure, so when you promote, we still
     * finish replaying as much as we can from archive and
     * pg_wal before failover.
     */
    if (StandbyMode && CheckForStandbyTrigger())
    {
        XLogShutdownWalRcv();
        return XLREAD_FAIL;
    }

If so, that sounds like a good place to me.
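
As a rough illustration of the ordering being discussed, here is a standalone toy sketch of dropping the not-sync-ready slots at the moment promotion is detected, so that nobody can start using them afterwards. None of this is the patch's code: ToySlot, toy_drop_unready_slots() and toy_handle_promotion() are hypothetical names, and only the 'i' state comes from this thread (the 'n' and 'r' values are assumptions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy sync states; only 'i' is mentioned in the thread, the rest are assumed. */
typedef enum
{
    TOY_SYNC_NONE = 'n',
    TOY_SYNC_INITIATED = 'i',
    TOY_SYNC_READY = 'r'
} ToySyncState;

typedef struct ToySlot
{
    const char  *name;
    ToySyncState sync_state;
    bool         in_use;      /* false once the slot has been dropped */
    int          active_pid;  /* 0 when no backend is using the slot */
} ToySlot;

/* Drop every idle slot still in the 'i' state; returns how many were dropped. */
static int
toy_drop_unready_slots(ToySlot *slots, size_t n)
{
    int dropped = 0;

    assert(slots != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (slots[i].in_use &&
            slots[i].sync_state == TOY_SYNC_INITIATED &&
            slots[i].active_pid == 0)
        {
            slots[i].in_use = false;
            dropped++;
        }
    }
    return dropped;
}

/*
 * Hypothetical promotion hook: mirrors the shape of the quoted check, with
 * the slot cleanup done before recovery gives up reading WAL.
 */
static bool
toy_handle_promotion(bool standby_mode, bool promote_requested,
                     ToySlot *slots, size_t n, int *dropped)
{
    if (standby_mode && promote_requested)
    {
        *dropped = toy_drop_unready_slots(slots, n);
        /* The real code would call XLogShutdownWalRcv() and return XLREAD_FAIL here. */
        return true;
    }
    return false;
}
```

Because the cleanup runs while the slots are still idle, the "is active for PID" error from the log above cannot occur in this model; whether the real promotion path offers the same guarantee is exactly what would need checking.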

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
