From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | John H <johnhyvr(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Introduce XID age based replication slot invalidation |
Date: | 2025-09-25 00:18:42 |
Message-ID: | CALj2ACVeZb7AhzjTf+Mzu3OyA5hVyNbHzGUPTvFukMh8-Zmi5Q@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
On Thu, Sep 18, 2025 at 10:20 AM John H <johnhyvr(at)gmail(dot)com> wrote:
>
> I'd like to restart the discussion about providing an xid-based slot
> invalidation mechanism. The previous effort [1] presented an XID and
> time-based invalidation and the inactive time-based approach was
> implemented first. The latest XID based patch from Bharath Rupireddy
> can be found here [2].
>
> When thinking about availability of the database, inactive replication
> slots cause two main pain points:
> 1) WAL accumulation
> 2) Replication slots with xmin/catalog_xmin can hold back vacuuming
> leading to wrap-around
>
> It's easy to imagine a high-XID churning workload in one cluster while
> another has large batch jobs where changes get synced out
> periodically. There isn't a "one-size" fits all setting for
> 'idle_replication_slot_timeout' in these two cases.
+1.
> The attached patch addresses this by introducing 'max_slot_xid_age' in
> a similar fashion. Replication slots with transaction ID greater than
> the set age will get invalidated allowing vacuum to proceed, biasing
> towards database availability.
>
> Invalidation happens in CHECKPOINT, similar to
> 'idle_replication_slot_timeout', and when VACUUM occurs.
>
> The patch currently attempts to invalidate once-per-autovacuum worker.
> We're wondering if it should attempt invalidation on a per-relation
> basis within the vacuum call itself. That would account for scenarios
> where the cost_delay or naptime is high between autovac executions.
IMO, computing XID horizons per-relation during vacuum is good. The
main reason we try to invalidate replication slots based on the XID
age in the vacuum path is to help the database when it needs it most -
when vacuum is computing the XID horizons. That said, it would be good
to have performance analysis with a large number of replication slots,
comparing once-per-relation vs. once-per-autovacuum worker vs.
once-per-autovacuum launcher wake-up cycle.
I haven't looked at the patch in depth, but it would be good to have a
TAP test that mimics a more realistic production workload. We could set
max_slot_xid_age to a value below 1.5 billion and use the xid_wraparound
test module to quickly approach the wraparound limits, then verify
whether the setting prevents the database from hitting wraparound
errors. This approach would also validate the age calculations in
try_replication_slot_invalidation with higher limits.
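To illustrate the kind of age calculation such a test would exercise,
here is a minimal standalone sketch in C. It is not taken from the
patch: the helper names xid_age and slot_xid_too_old are hypothetical,
treating a setting of 0 as "disabled" is an assumption, and the sketch
ignores the special XIDs that are skipped at wraparound. It only shows
the modulo-2^32 arithmetic that makes the comparison behave when the
XID counter has wrapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are reserved (invalid, bootstrap, frozen). */
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Hypothetical helper: age of xid relative to next_xid.  The unsigned
 * subtraction wraps modulo 2^32, so the result stays correct even when
 * next_xid is numerically smaller than xid after a wraparound.
 */
static int32_t
xid_age(TransactionId next_xid, TransactionId xid)
{
    return (int32_t) (next_xid - xid);
}

/*
 * Hypothetical check: does the slot's xmin/catalog_xmin exceed the
 * configured age?  Assumes max_slot_xid_age <= 0 disables the check.
 */
static bool
slot_xid_too_old(TransactionId slot_xid, TransactionId next_xid,
                 int32_t max_slot_xid_age)
{
    if (max_slot_xid_age <= 0)
        return false;           /* feature disabled (assumption) */
    if (slot_xid < FirstNormalTransactionId)
        return false;           /* slot holds no normal XID */
    return xid_age(next_xid, slot_xid) > max_slot_xid_age;
}

/* Small self-check covering the plain and the wrapped case. */
static void
xid_age_selfcheck(void)
{
    /* No wraparound: 2.0 billion - 0.4 billion = 1.6 billion. */
    assert(slot_xid_too_old(400000000u, 2000000000u, 1500000000));
    /* Counter wrapped: next_xid is numerically smaller than slot_xid. */
    assert(xid_age(100u, 4000000000u) == 294967396);
    assert(!slot_xid_too_old(4000000000u, 100u, 1500000000));
}
```

A TAP test along the lines described above would drive the real code
through both cases: a slot whose XID is simply old, and one whose XID
is numerically larger than the next XID because the counter wrapped.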
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com