Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-11 10:14:46
Message-ID: CAA4eK1+C3QxAZ6UtpSn9umGB33YjtYJgxA_xVXCacATEfZG4YQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 8, 2024 at 10:42 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > You might want to consider its interaction with sync slots on standby.
> > Say, there is no activity on slots in terms of processing the changes
> > for slots. Now, we won't perform sync of such slots on standby showing
> > them inactive as per your new criteria where as same slots could still
> > be valid on primary as the walsender is still active. This may be more
> > of a theoretical point as in running system there will probably be
> > some activity but I think this needs some thougths.
>
> I believe the xmin and catalog_xmin of the sync slots on the standby
> keep advancing depending on the slots on the primary, no? If yes, the
> XID age based invalidation shouldn't be a problem.
>
> I believe there are no walsenders started for the sync slots on the
> standbys, right? If yes, the inactive timeout based invalidation also
> shouldn't be a problem. Because, the inactive timeouts for a slot are
> tracked only for walsenders because they are the ones that typically
> hold replication slots for longer durations and for real replication
> use. We did a similar thing in a recent commit [1].
>
> Is my understanding right?
>

Yes, your understanding is correct. I wanted us to consider having new
parameters like 'inactive_replication_slot_timeout' to be at
slot-level instead of GUC. I think this new parameter doesn't seem to
be the similar as 'max_slot_wal_keep_size' which leads to truncation
of WAL at global and then invalidates the appropriate slots. OTOH, the
'inactive_replication_slot_timeout' doesn't appear to have a similar
global effect. The other thing we should consider is what if the
checkpoint happens at a timeout greater than
'inactive_replication_slot_timeout'? Shall, we consider doing it via
some other background process or do we think checkpointer is the best
we can have?

>
Do you still see any problems with it?
>

Sorry, I haven't done any detailed review yet so can't say with
confidence whether there is any problem or not w.r.t sync slots.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-03-11 10:39:27 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Li, Yong 2024-03-11 10:01:42 Re: Proposal to add page headers to SLRU pages