| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, John H <johnhyvr(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Introduce XID age based replication slot invalidation |
| Date: | 2026-03-25 19:17:17 |
| Message-ID: | CALj2ACWcaSkfMAQu3s6ZkTZuoFvVRD=DkxXbBwC33PL9+kzsqw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Tue, Mar 24, 2026 at 11:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > Please find the v3 patch for further review.
>
> Thank you for updating the patch. I think the patch is reasonably
> simple and can avoid unnecessary overheads well due to XID-based
> checks. Here are some comments:
Thank you for reviewing the patch.
> vacuum_get_cutoff() is also called by VACUUM FULL, CLUSTER, and
> REPACK. I'm not sure that users would expect the slot invalidation
> also in these commands. I think it's better to leave
> vacuum_get_cutoff() a pure cutoff computation function and we can try
> to invalidate slots in heap_vacuum_rel(). It requires additional
> ReadNextTransactionId() but we can live with it, or we can make
> vacuum_get_cutoffs() return the nextXID as well (stored in *cutoffs).
+1. I chose to perform the slot invalidation in heap_vacuum_rel by
getting the next txn ID and calling vacuum_get_cutoffs again when a
slot gets invalidated. IMHO, this is simple than adding a flag and do
the invalidation selectively in vacuum_get_cutoffs.
> if (TransactionIdPrecedes(oldestXmin, cutoffXID))
> + {
> + invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
> + 0,
> + InvalidOid,
> + InvalidTransactionId,
> + nextXID);
> + }
>
> I think it's better to check the procArray->replication_slot_xmin and
> procArray->replication_slot_catalog_xmin before iterating over each
> slot. Otherwise, we would end up checking every slot even when a long
> running transaction holds the oldestxmin back.
+1. Changed.
> + if (!TransactionIdIsNormal(cutoffXID))
> + cutoffXID = FirstNormalTransactionId;
>
> These codes have the same comment but are doing a slightly different
> thing. I guess the latter is missing '-'?
Fixed the typo.
I fixed a test error being reported in CI.
Please find the attached v4 patch for further review.
I've also attached the 0002 patch that adds a test case to demo a
production-like scenario by pushing the database to XID wraparound
limits and checking if the XID-age based invalidation with the GUC
setting at the default vacuum_failsafe_age of 1.6B works correctly,
and whether autovacuum can successfully remove this replication slot
blocker to proceed with freezing and bring the database back to
normal. I don't intend to get this committed unless others think
otherwise, but I wanted to have this as a reference.
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0002-Add-more-tests-for-XID-age-slot-invalidation.patch | application/x-patch | 7.4 KB |
| v4-0001-Add-XID-age-based-replication-slot-invalidation.patch | application/x-patch | 25.9 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nathan Bossart | 2026-03-25 19:18:00 | Re: [PATCH] Fix wrong argument to SOFT_ERROR_OCCURRED in timestamptz_date |
| Previous Message | Melanie Plageman | 2026-03-25 18:54:16 | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |