Re: Synchronizing slots from primary to standby

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-10 11:00:36
Message-ID: 64056e35-1916-461c-a816-26e40ffde3a0@gmail.com
Lists: pgsql-hackers

Hi,

On 11/10/23 4:31 AM, shveta malik wrote:
> On Thu, Nov 9, 2023 at 9:15 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>> Yeah I think so, because there is a time window when one could "use" the slot
>> after the promotion and before it is removed. Producing things like:
>>
>> "
>> 2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot "logical_slot2" of dbid 5 as it was not sync-ready
>> 2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot "logical_slot3" of dbid 5 as it was not sync-ready
>> 2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot "logical_slot4" of dbid 5 as it was not sync-ready
>> 2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5" is active for PID 2594628
>> "
>>
>> After the promotion one was able to use logical_slot5 and now we can now drop it.
>
> Yes, I was suspicious about this small window which may allow others
> to use this slot, that is why I was thinking of putting it in the
> promotion flow and thus asked that question earlier. But the slot-sync
> worker may end up creating it again in case it has not exited.

Sorry, there is a typo up-thread: I meant "After the promotion one was able to
use logical_slot5 and now we can NOT drop it.". We cannot drop it because it
is in use.

> So we
> need to carefully decide at what all places we need to put 'not-in
> recovery' checks in slot-sync workers. In the previous version,
> synchronize_one_slot() had that check and it was skipping sync if
> '!RecoveryInProgress'. But I have removed that check in v32 thinking
> that the slots which the worker has already fetched from the primary,
> let them all get synced and exit after that instead of syncing half
> and leaving rest. But now on rethinking, was the previous behaviour
> correct i.e. skip sync at that point onward where we see it is no
> longer in standby-mode while few of the slots have already been synced
> in that sync-cycle. Thoughts?
>

I think we still need to think about/discuss the promotion flow. I think we would need
to have the slot sync worker shut down during the promotion (as suggested by Amit in [1]),
but before that let the slot sync worker know that it is now acting during promotion.

Something like:

- let the sync worker know it is now acting under promotion
- do what needs to be done while acting under promotion
- shut down the sync worker

That way we would avoid any "risk" of having the sync worker doing something
we don't expect while not in recovery anymore.

Regarding "do what needs to be done while acting under promotion":

- ensure all slots in 'r' state are synced
- drop slots that are in 'i' state
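
To make the ordering concrete, here is a minimal sketch of the flow above as a
toy model. It is not actual PostgreSQL code: DemoSlot, DemoSyncWorker and
finish_slot_sync_on_promotion() are hypothetical names used only for
illustration, and the "sync" and "drop" steps are reduced to setting flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a synced slot on the standby. */
typedef struct DemoSlot
{
    const char *name;
    char        sync_state;     /* 'r' = sync-ready, 'i' = initiated */
    bool        synced;         /* set once a final sync has been done */
    bool        dropped;        /* set once the slot has been dropped */
} DemoSlot;

/* Hypothetical stand-in for the slot sync worker's state. */
typedef struct DemoSyncWorker
{
    bool        under_promotion;    /* worker knows promotion is ongoing */
    bool        running;            /* worker has not been shut down yet */
} DemoSyncWorker;

/*
 * Assumed promotion-time sequence:
 *   1. let the worker know it is acting under promotion
 *   2. sync the slots in 'r' state, drop the slots in 'i' state
 *   3. shut down the worker
 */
void
finish_slot_sync_on_promotion(DemoSyncWorker *worker,
                              DemoSlot *slots, size_t nslots)
{
    /* Step 1: from here on the worker must not start a regular sync cycle. */
    worker->under_promotion = true;

    /* Step 2: do what needs to be done while acting under promotion. */
    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].sync_state == 'r')
            slots[i].synced = true;     /* placeholder for a final sync */
        else if (slots[i].sync_state == 'i')
            slots[i].dropped = true;    /* placeholder for dropping the slot */
    }

    /* Step 3: the worker exits, so it can no longer re-create any slot. */
    worker->running = false;
}
```

The point of the sketch is only the ordering: the slots in 'i' state are
dropped while the worker is still the one in charge, and the worker is shut
down afterwards, so nothing it does happens once we are out of recovery.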

Thoughts?

[1]: https://www.postgresql.org/message-id/CAA4eK1J2Pc%3D5TOgty5u4bp--y7ZHaQx3_2eWPL%3DVPJ7A_0JF2g%40mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
