From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-04-05 07:43:58 |
Message-ID: | Zg+rvmqjL3DfbJdx@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Fri, Apr 05, 2024 at 11:21:43AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> Please find the attached v36 patch.
Thanks!
A few comments:
1 ===
+ <para>
+ The timeout is measured from the time since the slot has become
+ inactive (known from its
+ <structfield>inactive_since</structfield> value) until it gets
+ used (i.e., its <structfield>active</structfield> is set to true).
+ </para>
That's right except when it's invalidated during the checkpoint (as the slot
is not acquired in CheckPointReplicationSlots()).
So, what about adding: "or a checkpoint occurs"? That would also explain that
the invalidation could occur during checkpoint.
2 ===
+ /* If the slot has been invalidated, recalculate the resource limits */
+ if (invalidated)
+ {
/If the slot/If a slot/?
3 ===
+ * NB - this function also runs as part of checkpoint, so avoid raising errors
s/NB - this/NB: This function/? (that looks more consistent with other comments
in the code)
4 ===
+ * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead
I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause" but reading "cause instead"
looks weird to me. Maybe it would make sense to reword this a bit.
5 ===
+ * considered not active as they don't actually perform logical decoding.
Not sure that's 100% accurate as we switched in fast forward logical
in 2ec005b4e2.
"as they perform only fast forward logical decoding (or not at all)", maybe?
6 ===
+ if (RecoveryInProgress() && slot->data.synced)
+ return false;
+
+ if (replication_slot_inactive_timeout == 0)
+ return false;
What about just using one if? It's more a matter of taste but it also probably
reduces the object file size a bit for non optimized build.
7 ===
+ /*
+ * Do not invalidate the slots which are currently being synced from
+ * the primary to the standby.
+ */
+ if (RecoveryInProgress() && slot->data.synced)
+ return false;
I think we don't need this check as the exact same one is done just before.
8 ===
+sub check_for_slot_invalidation_in_server_log
+{
+ my ($node, $slot_name, $offset) = @_;
+ my $invalidated = 0;
+
+ for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+ {
+ $node->safe_psql('postgres', "CHECKPOINT");
Wouldn't be better to wait for the replication_slot_inactive_timeout time before
instead of triggering all those checkpoints? (it could be passed as an extra arg
to wait_for_slot_invalidation()).
9 ===
# Synced slot mustn't get invalidated on the standby, it must sync invalidation
# from the primary. So, we must not see the slot's invalidation message in server
# log.
ok( !$standby1->log_contains(
"invalidating obsolete replication slot \"lsub1_sync_slot\"",
$standby1_logstart),
'check that syned slot has not been invalidated on the standby');
Would that make sense to trigger a checkpoint on the standby before this test?
I mean I think that without a checkpoint on the standby we should not see the
invalidation in the log anyway.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2024-04-05 07:58:15 | Re: Improve eviction algorithm in ReorderBuffer |
Previous Message | Ants Aasma | 2024-04-05 07:33:27 | Re: Popcount optimization using AVX512 |