From: | John H <johnhyvr(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Introduce XID age based replication slot invalidation |
Date: | 2025-09-18 17:20:22 |
Message-ID: | CA+-JvFsMHckBMzsu5Ov9HCG3AFbMh056hHy1FiXazBRtZ9pFBg@mail.gmail.com |
Lists: | pgsql-hackers |
Hi folks,
I'd like to restart the discussion about providing an XID-based slot
invalidation mechanism. The previous effort [1] proposed both XID-based
and time-based invalidation, and the inactive-time-based approach was
implemented first. The latest XID-based patch from Bharath Rupireddy
can be found here [2].
When thinking about availability of the database, inactive replication
slots cause two main pain points:
1) WAL accumulation
2) Replication slots with xmin/catalog_xmin can hold back vacuuming,
leading to wraparound
The first issue can be mitigated by 'max_slot_wal_keep_size'. However,
in the second case there is no good mechanism to prioritize write
availability of the database and avoid wraparound. The new GUC
'idle_replication_slot_timeout' partially addresses the concern if your
workloads are similar. However, it's hard to pick one value that works
across a fleet of different applications.
It's easy to imagine a high-XID churning workload in one cluster while
another has large batch jobs where changes get synced out
periodically. There isn't a "one-size-fits-all" setting for
'idle_replication_slot_timeout' in these two cases.
The attached patch addresses this by introducing 'max_slot_xid_age' in
a similar fashion. Replication slots whose xmin or catalog_xmin is older
than the configured age get invalidated, allowing vacuum to proceed and
biasing towards database availability.
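To make the age check concrete, here is a minimal standalone sketch of
the comparison. It is not taken from the attached patch: the helper
names (xid_age, slot_exceeds_xid_age), the assumption that a value of 0
disables the check, and the simplified modulo-2^32 arithmetic (which
ignores the reserved special XIDs) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/*
 * Wraparound-aware age of xid relative to next_xid, computed modulo 2^32.
 * Simplified: real XID arithmetic also skips the reserved special XIDs.
 */
static int32_t
xid_age(TransactionId next_xid, TransactionId xid)
{
    return (int32_t) (next_xid - xid);
}

/*
 * Hypothetical helper: true if either the slot's xmin or catalog_xmin is
 * older than max_slot_xid_age. Assumes 0 means "check disabled".
 */
static bool
slot_exceeds_xid_age(TransactionId next_xid,
                     TransactionId xmin,
                     TransactionId catalog_xmin,
                     int max_slot_xid_age)
{
    assert(max_slot_xid_age >= 0);

    if (max_slot_xid_age == 0)
        return false;

    if (xmin != InvalidTransactionId &&
        xid_age(next_xid, xmin) > max_slot_xid_age)
        return true;

    if (catalog_xmin != InvalidTransactionId &&
        xid_age(next_xid, catalog_xmin) > max_slot_xid_age)
        return true;

    return false;
}
```

With next_xid = 1000 and a limit of 500, a slot holding xmin = 100
(age 900) would qualify for invalidation in this sketch, while one
holding xmin = 900 (age 100) would not.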
Invalidation happens during CHECKPOINT, as with
'idle_replication_slot_timeout', and also when VACUUM runs.
The patch currently attempts invalidation once per autovacuum worker.
We're wondering if it should instead attempt invalidation on a
per-relation basis within the vacuum call itself. That would account for
scenarios where the cost_delay or naptime is high and there are long
gaps between autovacuum executions.
Thanks,
John H
[1] https://www.postgresql.org/message-id/flat/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe%2Baw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CALj2ACXe8%2BxSNdMXTMaSRWUwX7v61Ad4iddUwnn%3DdjSwx3GLLg%40mail.gmail.com
--
John Hsu - Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
0044-Add-XID-age-based-replication-slot-invalidation.patch | application/octet-stream | 23.2 KB |