Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, exclusion(at)gmail(dot)com
Subject: Re: Fix race condition in InvalidatePossiblyObsoleteSlot()
Date: 2024-03-05 00:42:20
Message-ID: ZeZqbHjLrc8hkIvu@paquier.xyz
Lists: pgsql-hackers

On Mon, Feb 26, 2024 at 02:01:45PM +0000, Bertrand Drouvot wrote:
> Though [1] mentioned up-thread is not pushed yet, I'm sharing the POC patch now
> (see the attached).

I have looked at what you have here.

First, in a build where 818fefd8fd is included, this makes the test
script a lot slower. Most of the logic is quick, but we're spending
10s or so checking that catalog_xmin has advanced. Would it be
possible to make that faster?

A second issue is the failure mode when 818fefd8fd is reverted. All
the previous tests pass, then the test gets stuck waiting for the
standby to catch up until a timeout kicks in and fails it. Would it
be possible to make that more responsive? I assume that in the
failure mode we would get an incorrect conflict_reason for
injection_inactiveslot, so checking that would be enough to detect
the failure.

+ my $terminated = 0;
+ for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+ {
+     if ($node_standby->log_contains(
+         'terminating process .* to release replication slot \"injection_activeslot\"', $logstart))
+     {
+         $terminated = 1;
+         last;
+     }
+     usleep(100_000);
+ }
+ ok($terminated, 'terminating process holding the active slot is logged with injection point');

The LOG entry has already been written once we are sure that the
startup process is waiting in the injection point, so this loop could
be replaced with something like:
+ $node_standby->wait_for_event('startup', 'TerminateProcessHoldingSlot');
+ ok($node_standby->log_contains('terminating process .* .. '), 'termin .. ');

Nit: the name of the injection point should be
terminate-process-holding-slot rather than
TerminateProcessHoldingSlot, to be consistent with the other ones.
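
For reference, the log check quoted above depends on the $logstart
offset: only log output written after that point is searched, so an
older matching entry cannot satisfy the test. Below is a minimal,
self-contained sketch of that idea. The helper log_contains_from() and
the sample log lines are hypothetical stand-ins written for this
example, not the PostgreSQL::Test::Cluster code, and the script works
on a temporary file rather than a real standby log.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More;

# Hypothetical stand-in for an offset-aware log check: returns 1 if
# $pattern matches anything in $file at or after byte $offset, else 0.
sub log_contains_from
{
	my ($file, $pattern, $offset) = @_;

	open(my $fh, '<', $file) or die "could not open $file: $!";
	seek($fh, $offset, 0) or die "could not seek in $file: $!";
	local $/;    # slurp everything from the offset to end of file
	my $contents = <$fh>;
	close($fh);

	return defined($contents) && $contents =~ m/$pattern/ ? 1 : 0;
}

# Same pattern as in the quoted patch hunk.
my $pattern =
  'terminating process .* to release replication slot \"injection_activeslot\"';

# Write an "old" matching entry, then remember the current end of file.
my ($fh, $logfile) = tempfile(UNLINK => 1);
print $fh
  "LOG:  terminating process 100 to release replication slot \"injection_activeslot\"\n";
close($fh) or die "could not close $logfile: $!";
my $logstart = -s $logfile;

ok(!log_contains_from($logfile, $pattern, $logstart),
	'entry written before the offset is ignored');

# Append a "new" matching entry, as the startup process would.
open(my $append, '>>', $logfile) or die "could not open $logfile: $!";
print $append
  "LOG:  terminating process 200 to release replication slot \"injection_activeslot\"\n";
close($append) or die "could not close $logfile: $!";

ok(log_contains_from($logfile, $pattern, $logstart),
	'entry written after the offset is found');

done_testing();
```

With a single check like this, no polling is needed as long as the
caller already knows the entry must have been written, which is what
waiting on the injection point's wait event is meant to guarantee.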
--
Michael
