Re: Introduce XID age based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, John H <johnhyvr(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Introduce XID age based replication slot invalidation
Date: 2026-03-23 16:00:00
Message-ID: CALj2ACX_o+dKeAaK76mpAtG646UnDHpGUWziUkCvicVz8mz6=A@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Fri, Mar 20, 2026 at 11:29 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram(at)gmail(dot)com> wrote:
>
> Do you think we need different GUCs for catalog_xmin and xmin? If table bloat is a concern (not catalog bloat), then logical slots are not required to invalidate unless the cluster is close to wraparound.

IMO the main purpose of max_slot_xid_age is to prevent XID wraparound.
For bloat, I still think max_slot_wal_keep_size is the better choice.

Where max_slot_xid_age is really useful is when vacuum can't
freeze because a replication slot (physical or logical) is holding
back the XID horizon and the system is getting close to wraparound.
Invalidating such a slot clears the way for vacuum. Setting
max_slot_xid_age above vacuum_failsafe_age allows vacuum to waste
cycles scanning tables it cannot freeze. Keeping max_slot_xid_age <=
vacuum_failsafe_age (default 1.6B) prevents this by invalidating the
slot before vacuum effort is wasted.

As far as XID wraparound is concerned, both xmin and catalog_xmin need
to be treated similarly. Either one can hold back freezing and push
the system toward wraparound. So I don't think we need separate GUCs
for xmin and catalog_xmin unless I'm missing something. One GUC
covering both keeps things simple.
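
To illustrate the single-GUC idea, here is a minimal standalone sketch. It is
not code from the patch: xid_t, slot_horizons, xid_older_than and
slot_holds_back_freezing are hypothetical names, the comparison is a
simplified model of PostgreSQL's modulo-2^32 XID ordering, and special
(non-normal) XIDs other than "invalid" are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;              /* stand-in for TransactionId */
#define INVALID_XID ((xid_t) 0)      /* stand-in for InvalidTransactionId */

/* The two horizons a slot can hold back (hypothetical struct). */
typedef struct
{
    xid_t xmin;          /* typically set for physical slots */
    xid_t catalog_xmin;  /* typically set for logical slots */
} slot_horizons;

/* True if xid is valid and strictly older than cutoff (modulo 2^32). */
bool
xid_older_than(xid_t xid, xid_t cutoff)
{
    return xid != INVALID_XID && (int32_t) (xid - cutoff) < 0;
}

/* One limit applied to both horizons: either one can block freezing. */
bool
slot_holds_back_freezing(const slot_horizons *slot, xid_t next_xid, int max_age)
{
    assert(max_age >= 0);
    xid_t cutoff = next_xid - (xid_t) max_age;

    return xid_older_than(slot->xmin, cutoff) ||
           xid_older_than(slot->catalog_xmin, cutoff);
}
```

With one limit, a slot is flagged as soon as either horizon falls behind the
cutoff, and an unset horizon never triggers it.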

>> I made the following design choice: try invalidating only once per
>> vacuum cycle, not per table. While this keeps the cost of checking
>> (incl. the XidGenLock contention) for invalidation to a minimum when
>> there are a large number of tables and replication slots, it can be
>> less effective when individual tables/indexes are large. Invalidating
>> during checkpoints can help to some extent with the large table/index
>> cases. But I'm open to thoughts on this.
>
> It may not solve the intent when the vacuum cycle is longer, which one can expect on a large database particularly when there is heavy bloat.

This design choice boils down to whether a database instance has
1/ a large number of small tables or 2/ large tables. In my
experience, I have seen both cases but mostly case 2 (others can
correct me). In this context, having an XID age based slot
invalidation check once per relation makes sense. However, I'm open to
more thoughts here.

>> Please find the attached patch for further review. I fixed the XID age
>> calculation in ReplicationSlotIsXIDAged and adjusted the code
>> comments.
>
> I applied the patch and all the tests passed. A few comments:

Thank you for reviewing the patch.

> @@ -495,7 +525,7 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
> MemoryContext vac_context, bool isTopLevel)
> {
> static bool in_vacuum = false;
> -
> + static bool first_time = true;
>
> first_time variable is not self explanatory, maybe something like try_replication_slot_invalidation and add comments that it will be set to false after the first check?

+1. Changed the variable name and simplified the surrounding comments.

> + if (TransactionIdIsValid(xmin))
> + appendStringInfo(&err_detail, _("The slot's xmin %u exceeds the maximum xid age %d specified by \"max_slot_xid_age\"."),
> + xmin,
> + max_slot_xid_age);
>
> Slot invalidates even when the age is max_slot_xid_age, isn't it?

Nice catch! I changed it to use TransactionIdPrecedes so that the
behavior matches the error message above, as two of the existing XID
age GUCs (autovacuum_freeze_max_age, vacuum_failsafe_age) do.
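
The boundary case can be sketched as follows. This is a simplified standalone
model, not the patch's ReplicationSlotIsXIDAged: xid_precedes mirrors the
modulo-2^32 comparison PostgreSQL uses for normal XIDs, while
slot_aged_strict and slot_aged_inclusive are hypothetical helpers, and the
cutoff computation (next XID minus the limit) is an assumption about how the
age is derived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;   /* stand-in for TransactionId */

/* "a is older than b" for normal XIDs less than 2^31 apart. */
bool
xid_precedes(xid_t a, xid_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Strict check: aged only when the XID age exceeds max_age. */
bool
slot_aged_strict(xid_t xmin, xid_t next_xid, int max_age)
{
    assert(max_age >= 0);
    return xid_precedes(xmin, next_xid - (xid_t) max_age);
}

/* Inclusive check: also aged when the XID age equals max_age. */
bool
slot_aged_inclusive(xid_t xmin, xid_t next_xid, int max_age)
{
    assert(max_age >= 0);
    xid_t cutoff = next_xid - (xid_t) max_age;

    return xmin == cutoff || xid_precedes(xmin, cutoff);
}
```

With next_xid = 1000 and max_age = 100, an xmin of 900 has an age of exactly
100: the inclusive check reports it as aged, while the strict check does not,
which is what the word "exceeds" in the error message implies.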

Please find the attached v2 patch for further review. Thank you!

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

Attachment: v2-0001-Add-XID-age-based-replication-slot-invalidation.patch (application/x-patch, 23.9 KB)
