Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-21 06:13:54
Message-ID: CAA4eK1KG-xAsw4CV+MiGH3yvSLG2vLU6V7Ug4w-m4DpkPwgAuQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > > >
> > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > > replication slot data structure.
> > >
> > > Should last_inactive_at be tracked on disk? Say the engine is down for a period
> > > of time > inactive_timeout; then the slot will be invalidated after the engine
> > > restarts (if there is no activity before we invalidate the slot). Should the time the
> > > engine is down be counted as "inactive" time? I have the feeling it should not, and
> > > that we should only take into account inactive time while the engine is up.
> > >
> >
> > Good point. The question is how do we achieve this without persisting
> > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> > had some valid value before we shut down but it still didn't cross the
> > configured 'inactive_timeout' value, so, we won't be able to
> > invalidate it. Now, after the restart, as we don't know the
> > last_inactive_at's value before the shutdown, we will initialize it
> > with 0 (this is what Bharath seems to have done in the latest
> > v13-0002* patch). After this, even if walsender or backend never
> > acquires the slot, we won't invalidate it. OTOH, if we track
> > 'last_inactive_at' on disk, then after restart we could initialize it
> > to the current time if the value is non-zero. Do you have any better
> > ideas?
> >
>
> I think that setting last_inactive_at when we restart makes sense if the slot
> has been active previously. I think the idea is that, because the slot is holding
> xmin/catalog_xmin, we don't want to prevent row removal for longer than the timeout.
>
> So what about relying on xmin/catalog_xmin instead?
>

That doesn't sound like a great idea because the xmin/catalog_xmin values
won't tell us whether the slot was active before the restart. It could
have been inactive for a long time before the restart while the xmin
values remained valid. What if we always set 'last_inactive_at' at
restart (provided the slot's inactive_timeout has a non-zero value) and reset
it as soon as someone acquires the slot? Then, if the slot doesn't get
acquired within 'inactive_timeout', the checkpointer will invalidate the
slot.
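To make the proposed lifecycle concrete, here is a minimal sketch (not the
actual patch; the struct and function names below are invented for
illustration): at startup we pretend the slot just became inactive so that
downtime itself is not counted against the timeout, acquiring the slot
clears the timestamp, and the checkpointer invalidates slots that stay
unacquired past the timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-in for the on-disk slot state. */
typedef struct ReplSlot
{
    time_t last_inactive_at;   /* 0 means "active / not tracked" */
    int    inactive_timeout;   /* seconds; 0 disables invalidation */
    bool   invalidated;
} ReplSlot;

/*
 * At server restart: if timeout-based invalidation is enabled, treat the
 * slot as having just become inactive, so time the server was down does
 * not count toward the timeout.
 */
static void
slot_restore_at_startup(ReplSlot *slot, time_t now)
{
    if (slot->inactive_timeout > 0)
        slot->last_inactive_at = now;
}

/* When a walsender or backend acquires the slot, it is no longer inactive. */
static void
slot_acquire(ReplSlot *slot)
{
    slot->last_inactive_at = 0;
}

/* Checkpointer: invalidate the slot if it stayed unacquired too long. */
static void
slot_check_invalidation(ReplSlot *slot, time_t now)
{
    if (slot->inactive_timeout > 0 &&
        slot->last_inactive_at != 0 &&
        now - slot->last_inactive_at > slot->inactive_timeout)
        slot->invalidated = true;
}
```

Note that with this scheme nothing about the previous inactivity period
needs to survive a restart; only inactive_timeout itself has to be on disk.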

--
With Regards,
Amit Kapila.
