Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-04-06 06:25:38
Message-ID: CALj2ACW8SYiRL8vGAhez=FdrzxO0W5pVFvN8GF7yKCO8_c=QdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 5, 2024 at 1:14 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> > Please find the attached v36 patch.
>
> A few comments:
>
> 1 ===
>
> + <para>
> + The timeout is measured from the time since the slot has become
> + inactive (known from its
> + <structfield>inactive_since</structfield> value) until it gets
> + used (i.e., its <structfield>active</structfield> is set to true).
> + </para>
>
> That's right except when it's invalidated during the checkpoint (as the slot
> is not acquired in CheckPointReplicationSlots()).
>
> So, what about adding: "or a checkpoint occurs"? That would also explain that
> the invalidation could occur during checkpoint.

Reworded.

> 2 ===
>
> + /* If the slot has been invalidated, recalculate the resource limits */
> + if (invalidated)
> + {
>
> /If the slot/If a slot/?

Modified it to be like elsewhere.

> 3 ===
>
> + * NB - this function also runs as part of checkpoint, so avoid raising errors
>
> s/NB - this/NB: This function/? (that looks more consistent with other comments
> in the code)

Done.

> 4 ===
>
> + * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead
>
> I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause" but reading "cause instead"
> looks weird to me. Maybe it would make sense to reword this a bit.

Reworded.

> 5 ===
>
> + * considered not active as they don't actually perform logical decoding.
>
> Not sure that's 100% accurate as we switched in fast forward logical
> in 2ec005b4e2.
>
> "as they perform only fast forward logical decoding (or not at all)", maybe?

Changed it to "as they don't perform logical decoding to produce the
changes". In fast_forward mode no changes are produced.

> 6 ===
>
> + if (RecoveryInProgress() && slot->data.synced)
> + return false;
> +
> + if (replication_slot_inactive_timeout == 0)
> + return false;
>
> What about just using one if? It's more a matter of taste but it also probably
> reduces the object file size a bit for non optimized build.

Changed.

> 7 ===
>
> + /*
> + * Do not invalidate the slots which are currently being synced from
> + * the primary to the standby.
> + */
> + if (RecoveryInProgress() && slot->data.synced)
> + return false;
>
> I think we don't need this check as the exact same one is done just before.

Right. Removed.

> 8 ===
>
> +sub check_for_slot_invalidation_in_server_log
> +{
> + my ($node, $slot_name, $offset) = @_;
> + my $invalidated = 0;
> +
> + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
> + {
> + $node->safe_psql('postgres', "CHECKPOINT");
>
> Wouldn't be better to wait for the replication_slot_inactive_timeout time before
> instead of triggering all those checkpoints? (it could be passed as an extra arg
> to wait_for_slot_invalidation()).

Done.

> 9 ===
>
> # Synced slot mustn't get invalidated on the standby, it must sync invalidation
> # from the primary. So, we must not see the slot's invalidation message in server
> # log.
> ok( !$standby1->log_contains(
> "invalidating obsolete replication slot \"lsub1_sync_slot\"",
> $standby1_logstart),
> 'check that syned slot has not been invalidated on the standby');
>
> Would that make sense to trigger a checkpoint on the standby before this test?
> I mean I think that without a checkpoint on the standby we should not see the
> invalidation in the log anyway.

Done.

Please find the attached v37 patch for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v37-0001-Add-inactive_timeout-based-replication-slot-inva.patch application/octet-stream 32.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-04-06 06:38:45 Re: LogwrtResult contended spinlock
Previous Message Amit Kapila 2024-04-06 06:19:19 Re: promotion related handling in pg_sync_replication_slots()