Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-04-05 05:51:43
Message-ID: CALj2ACXbUzp-9tixeUK_dws5k+ARJaXxj8cj9W3adA0XNUS4Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> > > shouldn't the slot be dropped/recreated instead of updating inactive_since?
> >
> > The sync slots that are invalidated on the primary aren't dropped and
> > recreated on the standby.
>
> Yeah, right (I was confused with synced slots that are invalidated locally).
>
> > However, I
> > found that the synced slot is acquired and released unnecessarily
> > after the invalidation_reason is synced from the primary. I added a
> > skip check in synchronize_one_slot to skip acquiring and releasing the
> > slot if it's locally found inactive. With this, inactive_since won't
> > get updated for invalidated sync slots on the standby as we don't
> > acquire and release the slot.
>
> CR1 ===
>
> Yeah, I can see:
>
> @@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> " name slot \"%s\" already exists on the standby",
> remote_slot->name));
>
> + /*
> + * Skip the sync if the local slot is already invalidated. We do this
> + * beforehand to save on slot acquire and release.
> + */
> + if (slot->data.invalidated != RS_INVAL_NONE)
> + return false;
>
> Thanks to the drop_local_obsolete_slots() call I think we are not missing the case
> where the slot has been invalidated on the primary, invalidation reason has been
> synced on the standby and later the slot is dropped/ recreated manually on the
> primary (then it should be dropped/recreated on the standby too).
>
> Also it seems we are not missing the case where a sync slot is invalidated
> locally due to wal removal (it should be dropped/recreated).

Right.

> > > CR5 ===
> > >
> > > + /*
> > > + * This function isn't expected to be called for inactive timeout based
> > > + * invalidation. A separate function InvalidateInactiveReplicationSlot is
> > > + * to be used for that.
> > >
> > > Do you think it's worth to explain why?
> >
> > Hm, I just wanted to point out the actual function here. I modified it
> > to something like the following, if others feel we don't need that, I
> > can remove it.
>
> Sorry If I was not clear but I meant to say "Do you think it's worth to explain
> why we decided to create a dedicated function"? (currently we "just" explain that
> we created one).

We added a new function (InvalidateInactiveReplicationSlot) to
invalidate slot based on inactive timeout because 1) we do the
inactive timeout invalidation at slot level as opposed to
InvalidateObsoleteReplicationSlots which does loop over all the slots,
2)
InvalidatePossiblyObsoleteSlot does release the lock in some cases,
has a lot of unneeded code for inactive timeout invalidation check, 3)
we want some control over saving the slot to disk because we hook the
inactive timeout invalidation into the loop that checkpoints the slot
info to the disk in CheckPointReplicationSlots.

I've added a comment atop InvalidateInactiveReplicationSlot.

Please find the attached v36 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v36-0001-Add-inactive_timeout-based-replication-slot-inva.patch application/octet-stream 32.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-04-05 06:00:00 Re: remaining sql/json patches
Previous Message Amit Kapila 2024-04-05 05:47:03 Re: Is this a problem in GenericXLogFinish()?