Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-21 06:45:28
Message-ID: ZfvXiPlz/lackQlp@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Mar 21, 2024 at 11:43:54AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> > > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> > > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > > > >
> > > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > > > replication slot data structure.
> > > >
> > > > Should last_inactive_at be tracked on disk? Say the engine is down for a period
> > > > of time > inactive_timeout then the slot will be invalidated after the engine
> > > > re-start (if no activity before we invalidate the slot). Should the time the
> > > > engine is down be counted as "inactive" time? I've the feeling it should not, and
> > > > that we should only take into account inactive time while the engine is up.
> > > >
> > >
> > > Good point. The question is how do we achieve this without persisting
> > > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> > > had some valid value before we shut down but it still didn't cross the
> > > configured 'inactive_timeout' value, so, we won't be able to
> > > invalidate it. Now, after the restart, as we don't know the
> > > last_inactive_at's value before the shutdown, we will initialize it
> > > with 0 (this is what Bharath seems to have done in the latest
> > > v13-0002* patch). After this, even if walsender or backend never
> > > acquires the slot, we won't invalidate it. OTOH, if we track
> > > 'last_inactive_at' on the disk, after, restart, we could initialize it
> > > to the current time if the value is non-zero. Do you have any better
> > > ideas?
> > >
> >
> > I think that setting last_inactive_at when we restart makes sense if the slot
> > has been active previously. I think the idea is because it's holding xmin/catalog_xmin
> > and that we don't want to prevent rows removal longer that the timeout.
> >
> > So what about relying on xmin/catalog_xmin instead that way?
> >
>
> That doesn't sound like a great idea because xmin/catalog_xmin values
> won't tell us before restart whether it was active or not. It could
> have been inactive for long time before restart but the xmin values
> could still be valid.

Right, the idea here was more like "don't hold xmin/catalog_xmin" for longer
than timeout.

My concern was that we set catalog_xmin at logical slot creation time. So if we
set last_inactive_at to zero at creation time and the slot is not used for a long
period of time > timeout, then I think it's not helping there.

> What about we always set 'last_inactive_at' at
> restart (if the slot's inactive_timeout has non-zero value) and reset
> it as soon as someone acquires that slot? Now, if the slot doesn't get
> acquired till 'inactive_timeout', checkpointer will invalidate the
> slot.

Yeah that sounds good to me, but I think we should set last_inactive_at at creation
time too, if not:

- physical slot could remain valid for long time after creation (which is fine)
but the behavior would change at restart.
- logical slot would have the "issue" reported above (holding catalog_xmin).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-03-21 06:50:12 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Andres Freund 2024-03-21 06:39:53 Re: Recent 027_streaming_regress.pl hangs