Re: Introduce XID age based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, John H <johnhyvr(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Introduce XID age based replication slot invalidation
Date: 2026-03-25 19:17:17
Message-ID: CALj2ACWcaSkfMAQu3s6ZkTZuoFvVRD=DkxXbBwC33PL9+kzsqw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, Mar 24, 2026 at 11:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > Please find the v3 patch for further review.
>
> Thank you for updating the patch. I think the patch is reasonably
> simple and can avoid unnecessary overheads well due to XID-based
> checks. Here are some comments:

Thank you for reviewing the patch.

> vacuum_get_cutoff() is also called by VACUUM FULL, CLUSTER, and
> REPACK. I'm not sure that users would expect the slot invalidation
> also in these commands. I think it's better to leave
> vacuum_get_cutoff() a pure cutoff computation function and we can try
> to invalidate slots in heap_vacuum_rel(). It requires additional
> ReadNextTransactionId() but we can live with it, or we can make
> vacuum_get_cutoffs() return the nextXID as well (stored in *cutoffs).

+1. I chose to perform the slot invalidation in heap_vacuum_rel() by
getting the next XID and calling vacuum_get_cutoffs() again when a
slot gets invalidated. IMHO, this is simpler than adding a flag and
doing the invalidation selectively in vacuum_get_cutoffs().

> if (TransactionIdPrecedes(oldestXmin, cutoffXID))
> + {
> + invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
> + 0,
> + InvalidOid,
> + InvalidTransactionId,
> + nextXID);
> + }
>
> I think it's better to check the procArray->replication_slot_xmin and
> procArray->replication_slot_catalog_xmin before iterating over each
> slot. Otherwise, we would end up checking every slot even when a long
> running transaction holds the oldestxmin back.

+1. Changed.
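
For readers following along, the pre-check suggested above can be
sketched roughly as below. This is a standalone illustration, not the
patch code: slots_may_need_invalidation() and xid_precedes() are
hypothetical names, the two xmin arguments stand in for
procArray->replication_slot_xmin and
procArray->replication_slot_catalog_xmin, and locking is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId       ((TransactionId) 0)
#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsValid(xid)  ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Modulo-2^32 comparison in the style of TransactionIdPrecedes():
 * special XIDs compare as plain integers, normal XIDs compare circularly.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: returns true only when the aggregate slot xmin or
 * catalog xmin is older than the cutoff. Only in that case is it worth
 * iterating over the individual slots.
 */
static bool
slots_may_need_invalidation(TransactionId slot_xmin,
                            TransactionId slot_catalog_xmin,
                            TransactionId cutoff_xid)
{
    assert(TransactionIdIsNormal(cutoff_xid));

    if (TransactionIdIsValid(slot_xmin) &&
        xid_precedes(slot_xmin, cutoff_xid))
        return true;

    if (TransactionIdIsValid(slot_catalog_xmin) &&
        xid_precedes(slot_catalog_xmin, cutoff_xid))
        return true;

    return false;
}
```

With a check of this shape, a long-running transaction that holds
oldestXmin back no longer forces a per-slot scan, as long as the
aggregate slot xmins are newer than the cutoff.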

> + if (!TransactionIdIsNormal(cutoffXID))
> + cutoffXID = FirstNormalTransactionId;
>
> These codes have the same comment but are doing a slightly different
> thing. I guess the latter is missing '-'?

Fixed the typo.
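
For reference, the subtract-then-clamp pattern that the quoted hunk
belongs to can be sketched as below. This is a minimal standalone
sketch, not the patch code: compute_cutoff_xid() and the max_xid_age
parameter are hypothetical names, and the subtraction is assumed to be
the part the review comment refers to.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Hypothetical helper: compute the XID below which a slot's xmin counts
 * as too old. The subtraction wraps modulo 2^32; if the result lands on
 * one of the special XIDs (0, 1, 2), clamp it to the first normal XID.
 */
static TransactionId
compute_cutoff_xid(TransactionId next_xid, uint32_t max_xid_age)
{
    TransactionId cutoff_xid;

    assert(TransactionIdIsNormal(next_xid));

    cutoff_xid = next_xid - max_xid_age;
    if (!TransactionIdIsNormal(cutoff_xid))
        cutoff_xid = FirstNormalTransactionId;

    return cutoff_xid;
}
```

The result is meant to be compared with a circular comparison such as
TransactionIdPrecedes(), so a cutoff that wraps past zero is still
meaningful.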

I also fixed a test failure reported in CI.

Please find the attached v4 patch for further review.

I've also attached a 0002 patch that adds a test case demonstrating a
production-like scenario: it pushes the database to the XID wraparound
limits and checks whether XID-age based invalidation works correctly
with the GUC set to the default vacuum_failsafe_age of 1.6B, and
whether autovacuum can successfully remove the replication slot
blocker, proceed with freezing, and bring the database back to
normal. I don't intend to get this committed unless others think
otherwise, but I wanted to have it as a reference.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

Attachment                                                     Content-Type         Size
v4-0002-Add-more-tests-for-XID-age-slot-invalidation.patch     application/x-patch  7.4 KB
v4-0001-Add-XID-age-based-replication-slot-invalidation.patch  application/x-patch  25.9 KB
