Re: Introduce XID age based replication slot invalidation

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, John H <johnhyvr(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Introduce XID age based replication slot invalidation
Date: 2026-03-26 21:49:55
Message-ID: CAD21AoBEBqQONiZxvnUYOu814yB2tRPrmX=7KqvL+f3ae7250w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 25, 2026 at 12:17 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Mar 24, 2026 at 11:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > Please find the v3 patch for further review.
> >
> > Thank you for updating the patch. I think the patch is reasonably
> > simple and can avoid unnecessary overheads well due to XID-based
> > checks. Here are some comments:
>
> Thank you for reviewing the patch.
>
> > vacuum_get_cutoff() is also called by VACUUM FULL, CLUSTER, and
> > REPACK. I'm not sure that users would expect the slot invalidation
> > also in these commands. I think it's better to leave
> > vacuum_get_cutoff() a pure cutoff computation function and we can try
> > to invalidate slots in heap_vacuum_rel(). It requires additional
> > ReadNextTransactionId() but we can live with it, or we can make
> > vacuum_get_cutoffs() return the nextXID as well (stored in *cutoffs).
>
> +1. I chose to perform the slot invalidation in heap_vacuum_rel by
> getting the next txn ID and calling vacuum_get_cutoffs again when a
> slot gets invalidated. IMHO, this is simple than adding a flag and do
> the invalidation selectively in vacuum_get_cutoffs.
>
> > if (TransactionIdPrecedes(oldestXmin, cutoffXID))
> > + {
> > + invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
> > + 0,
> > + InvalidOid,
> > + InvalidTransactionId,
> > + nextXID);
> > + }
> >
> > I think it's better to check the procArray->replication_slot_xmin and
> > procArray->replication_slot_catalog_xmin before iterating over each
> > slot. Otherwise, we would end up checking every slot even when a long
> > running transaction holds the oldestxmin back.
>
> +1. Changed.
>
> > + if (!TransactionIdIsNormal(cutoffXID))
> > + cutoffXID = FirstNormalTransactionId;
> >
> > These codes have the same comment but are doing a slightly different
> > thing. I guess the latter is missing '-'?
>
> Fixed the typo.
>
> I fixed a test error being reported in CI.
>
> Please find the attached v4 patch for further review.

Thank you for updating the patch. I've reviewed the patch and have
some review comments:

+ /* translator: %s is a GUC variable name */
+ appendStringInfo(&err_detail, _("The slot's xmin
%u at next transaction ID %u exceeds the age %d specified by
\"%s\"."),
+ xmin,
+ nextXID,
+ max_slot_xid_age,
+ "max_slot_xid_age");

I think it's better to show the age of the slot's xmin instead of the
recent XID.

---
+
+ if (!TransactionIdIsNormal(oldestXmin) || !TransactionIdIsNormal(nextXID))
+ return false;
+

Do we expect that the passed oldestXmin or nextXID could be non-normal
XIDs? I think the function assumes these are valid XIDs.

Also, since this function is called only by heap_vacuum_rel(), we can
call ReadNextTransactionId() within this function.

---
+ if (IsReplicationSlotXIDAged(slot_xmin, slot_catalog_xmin, nextXID))

We compute the cutoff XID in IsReplicationSlotXIDAged() again, which
seems redundant.

I've attached the fixup patch addressing these comments and having
some code cleanups. Please review it.

I'm reviewing the regression test part, and will share review comments soon.

>
> I've also attached the 0002 patch that adds a test case to demo a
> production-like scenario by pushing the database to XID wraparound
> limits and checking if the XID-age based invalidation with the GUC
> setting at the default vacuum_failsafe_age of 1.6B works correctly,
> and whether autovacuum can successfully remove this replication slot
> blocker to proceed with freezing and bring the database back to
> normal. I don't intend to get this committed unless others think
> otherwise, but I wanted to have this as a reference.

Thank you for sharing the test script! I'll check it as well.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v4_cleanup_masahiko.patch text/x-patch 13.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2026-03-26 21:50:45 Re: doc: create table improvements
Previous Message Andres Freund 2026-03-26 21:47:54 Re: index prefetching