Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-26 06:06:12
Message-ID: ZgJl1MtzlxxXeBi5@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Mar 26, 2024 at 05:55:11AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote:
> > On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > I have one concern, for synced slots on standby, how do we disallow
> > > invalidation due to inactive-timeout immediately after promotion?
> > >
> > > For synced slots, last_inactive_time and inactive_timeout are both
> > > set. Let's say I bring down primary for promotion of standby and then
> > > promote standby, there are chances that it may end up invalidating
> > > synced slots (considering standby is not brought down during promotion
> > > and thus inactive_timeout may already be past 'last_inactive_time').
> > >
> >
> > On standby, if we decide to maintain valid last_inactive_time for
> > synced slots, then invalidation is correctly restricted in
> > InvalidateSlotForInactiveTimeout() for synced slots using the check:
> >
> > if (RecoveryInProgress() && slot->data.synced)
> > return false;
>
> Right.
>
> > But immediately after promotion, we can not rely on the above check
> > and thus possibility of synced slots invalidation is there. To
> > maintain consistent behavior regarding the setting of
> > last_inactive_time for synced slots, similar to user slots, one
> > potential solution to prevent this invalidation issue is to update the
> > last_inactive_time of all synced slots within the ShutDownSlotSync()
> > function during FinishWalRecovery(). This approach ensures that
> > promotion doesn't immediately invalidate slots, and henceforth, we
> > possess a correct last_inactive_time as a basis for invalidation going
> > forward. This will be equivalent to updating last_inactive_time during
> > restart (but without actual restart during promotion).
> > The plus point of maintaining last_inactive_time for synced slots
> > could be, this can provide data to the user on when last time the sync
> > was attempted on that particular slot by background slot sync worker
> > or SQl function. Thoughts?
>
> Yeah, another plus point is that if the primary is down then one could look
> at the synced "active_since" on the standby to get an idea of it (depends of the
> last sync though).
>
> The issue that I can see with your proposal is: what if one synced the slots
> manually (with pg_sync_replication_slots()) but does not use the sync worker?
> Then I think ShutDownSlotSync() is not going to help in that case.

It looks like ShutDownSlotSync() is always called (even if sync_replication_slots = off),
so that sounds ok to me (I should have checked the code, I was under the impression
ShutDownSlotSync() was not called if sync_replication_slots = off).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-03-26 06:20:45 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Bharath Rupireddy 2024-03-26 06:04:11 Re: Add new error_action COPY ON_ERROR "log"