Re: pgsql: Track last_inactive_time in pg_replication_slots.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <akapila(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Track last_inactive_time in pg_replication_slots.
Date: 2024-03-26 10:14:29
Message-ID: CAA4eK1+SLfHrS_haOvD5oXk6cwD9Szh8BYR5MpPRJaBDMaELXA@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Tue, Mar 26, 2024 at 2:11 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2024-Mar-26, Amit Kapila wrote:
>
> > On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > > On 2024-Mar-26, Amit Kapila wrote:
> > > > I would also like to solicit your opinion on the other slot-level
> > > > parameter we are planning to introduce. This new slot-level parameter
> > > > will be named as inactive_timeout.
> > >
> > > Maybe inactivity_timeout?
> > >
> > > > This will indicate that once the slot is inactive for the
> > > > inactive_timeout period, we will invalidate the slot. We are also
> > > > discussing having this parameter (inactive_timeout) as a GUC [1]. We
> > > > can have this new parameter both at the slot level and as a
> > > > GUC, or just one of those.
> > >
> > > replication_slot_inactivity_timeout?
> >
> > So, it seems you are okay to have this parameter both at slot level
> > and as a GUC.
>
> Well, I think a GUC is good to have regardless of the slot parameter,
> because the GUC can be used as an instance-wide protection against going
> out of disk space because of broken replication. However, now that I
> think about it, I'm not really sure about invalidating a slot based on
> time rather than on disk space, for which we already have a parameter; what's
> your rationale for that? The passage of time is not a very good
> measure, really, because the amount of WAL being protected has wildly
> varying production rate across time.
>

An inactive slot not only blocks WAL from being removed but also
prevents vacuum from making progress. There is also a risk of
transaction ID wraparound. See email [1] for more context.

> I can only see a timeout being useful as a parameter if its default
> value is not the special disable value; say, the default timeout is 3
> days (to be more precise -- the period from Friday to Monday, that is,
> between DBA leaving the office one week until discovering a problem when
> he returns early next week). This way we have a built-in mechanism that
> invalidates slots regardless of how big the WAL partition is.
>

We can have a default value for this parameter, but it has the
potential to break replication, so I am not sure what a good default
value would be.

>
> I'm less sure about the slot parameter; in what situation do you need to
> extend the life of one individual slot further than the life of all the
> other slots?

I was thinking of an idle-slot scenario where a slot from one
particular subscriber (or output plugin) is inactive due to some
maintenance activity. But it should be okay to have just a GUC for
this for now.
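
To make the two options concrete, here is a rough sketch in Python of how a timeout check might combine an instance-wide setting with an optional per-slot override. This is only an illustration of the idea under discussion, not PostgreSQL code: the Slot class, the function names, the rule that a per-slot value takes precedence over the GUC, and the treatment of a zero timeout as "disabled" are all assumptions made for the example. Only the names last_inactive_time and inactive_timeout come from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Slot:
    """Hypothetical model of a replication slot, for illustration only."""
    name: str
    active: bool
    # When the slot last became inactive; None if that was never recorded.
    last_inactive_time: Optional[datetime]
    # Hypothetical per-slot override; None means "use the instance-wide value".
    inactive_timeout: Optional[timedelta] = None


def effective_timeout(slot: Slot, guc_timeout: timedelta) -> timedelta:
    """Assumed precedence: a per-slot value, if set, wins over the GUC."""
    if slot.inactive_timeout is not None:
        return slot.inactive_timeout
    return guc_timeout


def should_invalidate(slot: Slot, guc_timeout: timedelta, now: datetime) -> bool:
    """Return True if the slot has been inactive for at least its timeout."""
    timeout = effective_timeout(slot, guc_timeout)
    # Assumption: a zero (or negative) timeout means the check is disabled.
    if timeout <= timedelta(0):
        return False
    if slot.active or slot.last_inactive_time is None:
        return False
    return now - slot.last_inactive_time >= timeout


# Example: instance-wide timeout of 3 days, one slot extended to 7 days.
now = datetime(2024, 3, 26, 10, 0)
guc_timeout = timedelta(days=3)
slots = [
    Slot("sub_a", active=False, last_inactive_time=datetime(2024, 3, 22, 10, 0)),
    Slot("sub_b", active=False, last_inactive_time=datetime(2024, 3, 22, 10, 0),
         inactive_timeout=timedelta(days=7)),
    Slot("sub_c", active=True, last_inactive_time=None),
]
to_invalidate = [s.name for s in slots if should_invalidate(s, guc_timeout, now)]
```

With only a GUC, sub_b would be treated the same as sub_a; the per-slot value is what would let one slot under maintenance outlive the others.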

[1] - https://www.postgresql.org/message-id/20240325195443.GA2923888%40nathanxps13

--
With Regards,
Amit Kapila.
