Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-20 23:35:46
Message-ID: CALj2ACUvK7ShPyirAURWf62=qOQqu=NwnyL5CMfVjM5Ody7Oxw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 20, 2024 at 1:04 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On Wed, Mar 20, 2024 at 08:58:05AM +0530, Amit Kapila wrote:
> > On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > Following are some open points:
> > >
> > > 1. Where to do inactive_timeout invalidation exactly if not the checkpointer.
> > >
> > I have suggested doing it in CheckpointReplicationSlots(),
> > and Bertrand suggested doing it whenever we resume using the slot. I
> > think we should follow both suggestions.
>
> Agree. I also think that pg_get_replication_slots() would be a good place, so
> that queries would return the right invalidation status.

I've addressed the review comments and am attaching the v13 patches with
the following changes:

1. Invalidate the replication slot due to inactive_timeout:
1.1 In CheckpointReplicationSlots(), to provide automatic invalidation.
1.2 In pg_get_replication_slots(), so that readers see the latest slot information.
1.3 In ReplicationSlotAcquire() for walsenders, since walsenders are
typically the processes that hold slots for long durations while
streaming to standbys and logical subscribers.
1.4 In ReplicationSlotAcquire() when called from
pg_logical_slot_get_changes_guts(), to prevent logical decoding clients
from decoding from invalidated slots.
1.5 In ReplicationSlotAcquire() when called from
pg_replication_slot_advance(), to disallow advancing invalidated slots.
2. Add a new input parameter, bool check_for_invalidation, to
ReplicationSlotAcquire(). When true, check for inactive_timeout
invalidation and error out if the slot has been invalidated.
3. Add a new function that performs only inactive_timeout invalidation.
4. Do not update last_inactive_at for failover slots on the standby, so
that failover slots are not invalidated on the standby.
5. In ReplicationSlotAcquire(), invalidate the slot before making it active.
6. Keep last_inactive_at in shared memory rather than on disk, so that
server downtime does not count toward a slot's inactive time.
7. Let failover slots on the standby and pg_upgraded slots inherit the
inactive_timeout parameter from the primary and the old cluster,
respectively.
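To make the intended behaviour of items 1 to 6 concrete, here is a minimal, self-contained sketch of the timeout check. It is not the patch code: the SketchSlot struct and every sketch_* function below are hypothetical stand-ins, assuming last_inactive_at is an in-memory timestamp where 0 means "unknown" (e.g. after a restart), an inactive_timeout of 0 disables the check, and "error out" is modelled as returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical, simplified slot. The real ReplicationSlot struct differs;
 * only the fields needed to illustrate inactive_timeout are modelled.
 */
typedef struct SketchSlot
{
    bool    active;              /* currently acquired by a process */
    bool    invalidated;         /* already invalidated */
    bool    is_standby_failover; /* failover slot synced on a standby */
    time_t  last_inactive_at;    /* in memory only; 0 = unknown */
    int     inactive_timeout;    /* seconds; 0 = disabled */
} SketchSlot;

/* True if an inactive slot has exceeded its inactive_timeout. */
static bool
sketch_inactive_timeout_expired(const SketchSlot *slot, time_t now)
{
    if (slot->active || slot->inactive_timeout <= 0 ||
        slot->last_inactive_at == 0)
        return false;
    return difftime(now, slot->last_inactive_at) > slot->inactive_timeout;
}

/* Performs only inactive_timeout invalidation; true if it invalidated. */
static bool
sketch_invalidate_if_timed_out(SketchSlot *slot, time_t now)
{
    if (!slot->invalidated && sketch_inactive_timeout_expired(slot, now))
    {
        slot->invalidated = true;
        return true;
    }
    return false;
}

/*
 * Acquire: when check_for_invalidation is set, invalidate before making
 * the slot active and refuse (where real code would ERROR) if invalidated.
 */
static bool
sketch_slot_acquire(SketchSlot *slot, time_t now, bool check_for_invalidation)
{
    if (check_for_invalidation)
    {
        sketch_invalidate_if_timed_out(slot, now);
        if (slot->invalidated)
            return false;
    }
    slot->active = true;
    return true;
}

/* Release: record inactivity time, except for failover slots on a standby. */
static void
sketch_slot_release(SketchSlot *slot, time_t now)
{
    assert(slot->active);
    slot->active = false;
    if (!slot->is_standby_failover)
        slot->last_inactive_at = now;
}
```

In this model, a checkpoint or a pg_get_replication_slots() scan would simply call sketch_invalidate_if_timed_out() on each slot, while walsenders and the SQL decoding/advance functions would acquire with check_for_invalidation set to true. Because last_inactive_at is never persisted, a restart resets it to "unknown", so downtime is not counted as inactivity.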

Please see the attached v13 patches.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v13-0001-Track-invalidation_reason-in-pg_replication_slot.patch application/octet-stream 25.1 KB
v13-0002-Track-last_inactive_at-for-replication-slots-in-.patch application/octet-stream 6.3 KB
v13-0003-Allow-setting-inactive_timeout-for-replication-s.patch application/octet-stream 33.2 KB
v13-0004-Allow-setting-inactive_timeout-in-the-replicatio.patch application/octet-stream 17.9 KB
v13-0005-Add-inactive_timeout-option-to-subscriptions.patch application/octet-stream 62.5 KB
v13-0006-Add-inactive_timeout-based-replication-slot-inva.patch application/octet-stream 31.1 KB
