Re: Introduce XID age based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, John H <johnhyvr(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Introduce XID age based replication slot invalidation
Date: 2026-03-31 17:20:56
Message-ID: CALj2ACVGpVHuqchPPFWdiLDN-PDPCEe=sU43YB7nqafE+VMXaQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Mar 30, 2026 at 5:13 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've reviewed the v6 patch. Here are some comments.

Thank you for reviewing the patch.

> bool
> vacuum_get_cutoffs(Relation rel, const VacuumParams params,
> - struct VacuumCutoffs *cutoffs)
> + struct VacuumCutoffs *cutoffs,
> + TransactionId *slot_xmin,
> + TransactionId *slot_catalog_xmin)
>
> How about storing both slot_xmin and catalog_xmin into VacuumCutoffs?

Done.

> ---
> - if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED |
> RS_INVAL_IDLE_TIMEOUT,
> + possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT |
> + RS_INVAL_XID_AGE;
> +
> + if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
> _logSegNo, InvalidOid,
> + InvalidTransactionId,
> + max_slot_xid_age > 0 ?
> + ReadNextTransactionId() :
> InvalidTransactionId))
>
> It's odd to me that we specify RS_INVAL_XID_AGE while passing
> InvalidTransactionId. I think we can specify RS_INVAL_XID_AGE along
> with a valid recentXId only when we'd like to check the slots based on
> their XIDs.

Done.

> ---
> + /* Check if the slot needs to be invalidated due to max_slot_xid_age GUC */
> + if ((possible_causes & RS_INVAL_XID_AGE) && CanInvalidateXidAgedSlot(s))
> + {
> + TransactionId xidLimit;
> +
> + Assert(TransactionIdIsValid(recentXid));
> +
> + xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
> +
>
> I think we can avoid calculating xidLimit for every slot by
> calculating it in InvalidatePossiblyObsoleteSlot() and passing it to
> DetermineSlotInvalidationCause().

Done.

> ---
> */
> TransactionId
> GetOldestNonRemovableTransactionId(Relation rel)
> +{
> + return GetOldestNonRemovableTransactionIdExt(rel, NULL, NULL);
> +}
> +
> +/*
> + * Same as GetOldestNonRemovableTransactionId(), but also returns the
> + * replication slot xmin and catalog_xmin from the same ComputeXidHorizons()
> + * call. This avoids a separate ProcArrayLock acquisition when the caller
> + * needs both values.
> + */
> +TransactionId
> +GetOldestNonRemovableTransactionIdExt(Relation rel,
> + TransactionId *slot_xmin,
> + TransactionId *slot_catalog_xmin)
> {
>
> I understand that the primary reason why the patch introduces another
> variant of GetOldestNonRemovableTransactionId() is to avoid extra
> ProcArrayLock acquision to get replication slot xmin and catalog_xmin.
> While it's not very elegant, I find that it would not be bad because
> otherwise autovacuum takes extra ProcArrayLock (in shared mode) for
> every table to vacuum. The ProcArrayLock is already known
> high-contented lock it would be better to avoid taking it once more.
> If others think differently, we can just call
> ProcArrayGetReplicationSlotXmin() separately and compare them to the
> limit of XID-age based slot invalidation.

I understand the concerns around the ProcArrayLock and I think a new
function to return the computed slot's xmin and catalog_xmin is good.

> Having said that, I personally don't want to add new instructions to
> the existing GetOldestNonRemovableTransactionId(). I guess we might
> want to make both the existing function and new function call a common
> (inline) function that takes ComputeXidHorizonsResult and returns
> appropriate transaction id based on the given relation .

Done.

> ---
> + # Do some work to advance xids
> + $node->safe_psql(
> + 'postgres', qq[
> + do \$\$
> + begin
> + for i in 1..$nxids loop
> + -- use an exception block so that each iteration eats an XID
> + begin
> + insert into $table_name values (i);
> + exception
> + when division_by_zero then null;
> + end;
> + end loop;
> + end\$\$;
> + ]);
>
> I think it's fater to use pg_current_xact_id() instead.

Done. I pulled this from an existing test case in 001_stream_rep.pl.
Used the pg_current_xact_id approach. Testing times stay the same i.e.
9 wallclock secs.

> ---
> + else
> + {
> + $node->safe_psql('postgres', "VACUUM");
> + }
>
> We don't need to vacuum all tables here.

Fixed.

> ---
> +# Configure primary with XID age settings. Set autovacuum_naptime high so
> +# that the checkpointer (not vacuum) triggers the invalidation.
> +my $max_slot_xid_age = 500;
> +$primary5->append_conf(
> + 'postgresql.conf', qq{
> +max_slot_xid_age = $max_slot_xid_age
> +autovacuum_naptime = '1h'
> +});
>
> I think that it's better to disable autovacuum than setting a large number.

Done.

> ---
> +# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_age
> +# GUC (via checkpoint).
>
> I think that we can say "physical slot" instead of standby's slot to
> avoid confusion as I thought standby's slot is a slot created on the
> standby at the first glance.

Fixed.

> ---
> Do we have tests for invalidating slots on the standbys?

Added a test case for this.

Please find the attached v7 patches for further review. Thank you!

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v7-0001-Add-XID-age-based-replication-slot-invalidation.patch application/x-patch 31.7 KB
v7-0002-Add-more-tests-for-XID-age-slot-invalidation.patch application/x-patch 6.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2026-03-31 17:22:54 Re: Improve pgindent's formatting named fields in struct literals and varidic functions
Previous Message Alexander Lakhin 2026-03-31 17:00:00 Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE