Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-22 12:24:43
Message-ID: CAFPTHDZO3f=4C3U6AJBoJVuLn9CYBuy6DO=dK=AT2TtVHMQxzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 22, 2024 at 7:15 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > > > Please find the v14-0001 patch for now.
> >
> > Thanks!
> >
> > > LGTM. Let's wait for Bertrand to see if he has more comments on 0001
> > > and then I'll push it.
> >
> > LGTM too.
>
> Thanks. Here I'm implementing the following:
>
> 0001 Track invalidation_reason in pg_replication_slots
> 0002 Track last_inactive_at in pg_replication_slots
> 0003 Allow setting inactive_timeout for replication slots via SQL API
> 0004 Introduce new SQL funtion pg_alter_replication_slot
> 0005 Allow setting inactive_timeout in the replication command
> 0006 Add inactive_timeout based replication slot invalidation
>
> 1. Keep it last_inactive_at as a shared memory variable, but always
> set it at restart if the slot's inactive_timeout has non-zero value
> and reset it as soon as someone acquires that slot so that if the slot
> doesn't get acquired till inactive_timeout, checkpointer will
> invalidate the slot.
> 2. Ensure with pg_alter_replication_slot one could "only" alter the
> timeout property for the time being, if not that could lead to the
> subscription inconsistency.
> 3. Have some notes in the CREATE and ALTER SUBSCRIPTION docs about
> using an existing slot to leverage inactive_timeout feature.
> 4. last_inactive_at should also be set to the current time during slot
> creation because if one creates a slot and does nothing with it then
> it's the time it starts to be inactive.
> 5. We don't set last_inactive_at to GetCurrentTimestamp() for failover
> slots.
> 6. Leave the patch that added support for inactive_timeout in
> subscriptions.
>
> Please see the attached v14 patch set. No change in the attached
> v14-0001 from the previous patch.
>
>
>
Some comments:
1. In patch 0005:
In ReplicationSlotAlter():
+ lock_acquired = false;
if (MyReplicationSlot->data.failover != failover)
{
SpinLockAcquire(&MyReplicationSlot->mutex);
+ lock_acquired = true;
MyReplicationSlot->data.failover = failover;
+ }
+
+ if (MyReplicationSlot->data.inactive_timeout != inactive_timeout)
+ {
+ if (!lock_acquired)
+ {
+ SpinLockAcquire(&MyReplicationSlot->mutex);
+ lock_acquired = true;
+ }
+
+ MyReplicationSlot->data.inactive_timeout = inactive_timeout;
+ }
+
+ if (lock_acquired)
+ {
SpinLockRelease(&MyReplicationSlot->mutex);

Can't you make it shorter like below:
lock_acquired = false;

if (MyReplicationSlot->data.failover != failover ||
MyReplicationSlot->data.inactive_timeout != inactive_timeout) {
SpinLockAcquire(&MyReplicationSlot->mutex);
lock_acquired = true;
}

if (MyReplicationSlot->data.failover != failover) {
MyReplicationSlot->data.failover = failover;
}

if (MyReplicationSlot->data.inactive_timeout != inactive_timeout) {
MyReplicationSlot->data.inactive_timeout = inactive_timeout;
}

if (lock_acquired) {
SpinLockRelease(&MyReplicationSlot->mutex);
ReplicationSlotMarkDirty();
ReplicationSlotSave();
}

2. In patch 0005: why change walrcv_alter_slot option? it doesn't seem to
be used anywhere, any use case for it? If required, would the intention be
to add this as a Create Subscription option?

regards,
Ajin Cherian
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2024-03-22 12:30:09 postgres_fdw: Useless test in pgfdw_exec_cleanup_query_end()
Previous Message Etsuro Fujita 2024-03-22 12:15:45 Another WaitEventSet resource leakage in back branches