From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: dbryan(dot)green(at)gmail(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix fragile walreceiver test.
Date: 2025-11-05 07:55:46
Message-ID: aQsDAkt6yblQxGgM@paquier.xyz
Lists: pgsql-hackers

On Wed, Nov 05, 2025 at 03:30:30PM +0800, Xuneng Zhou wrote:
> On Wed, Nov 5, 2025 at 2:50 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> Timing issue then, the buildfarm has not been complaining on this one
>> AFAIK, there have been no recoveryCheck failures reported:
>> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl

drongo has just reported one failure, so I stand corrected:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-11-05%2003%3A50%3A50

And one log rotation should be enough before the restart.

>> Hmm. The reason why I didn't use a PID matching check (mentioned at
>> [1]) is that this is not entirely bullet-proof. On a very slow
>> machine, one could assume that standby_1 generates some records and
>> that these are replayed by standby_2 *before* the PID of the WAL
>> receiver is retrieved. This could lead to false positives in some
>> cases, and a bunch of buildfarm members are very slow. You have a
>> point that these would unlikely happen in normal runs, so a PID
>> matching check would be relevant most of the time anyway, even if the
>> original PID has been fetched after the TLI jump has been processed in
>> standby_2. I'd rather keep the log check, TBH, bypassing it with an
>> extra rotate_logfile() before the restart of standby_2.
>
> I’ve also prepared a patch for this method.

That's exactly what I did a couple of minutes ago. I noticed your
message before applying the fix, so I've listed you as a co-author on
this one.

I have also kept the PID check after pondering a bit about it. A TLI
jump could be replayed before we grab the initial PID, but in most
cases the check should still do its job correctly.
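
For illustration only, here is a minimal Python sketch (not the TAP test
itself) of why a log check benefits from a rotation before the restart.
It assumes the fragility comes from a line written before the restart
still being visible to a search of the whole log. The marker text and
the helper name are hypothetical; starting the search at an offset
stands in for what rotate_logfile() achieves with a fresh file.

```python
def log_contains(log_text: str, pattern: str, offset: int = 0) -> bool:
    """Return True if pattern occurs in log_text at or after offset."""
    return pattern in log_text[offset:]


# Hypothetical marker; the real log message is not quoted in this thread.
MARKER = "EXAMPLE: event we want to detect after the restart"

# Log contents before the restart already contain the marker once.
log_before_restart = "server started\n" + MARKER + "\n"

# Remember where the "new" log begins (a stand-in for log rotation).
offset = len(log_before_restart)

# After the restart, nothing relevant has been logged yet.
log_after_restart = log_before_restart + "server started again\n"

# Searching the whole log finds the stale line: a false positive.
stale_match = log_contains(log_after_restart, MARKER)

# Searching only the content written after the rotation point does not.
fresh_match = log_contains(log_after_restart, MARKER, offset)

# Once the event really happens after the restart, the check succeeds.
log_with_event = log_after_restart + MARKER + "\n"
real_match = log_contains(log_with_event, MARKER, offset)
```

Here stale_match is true even though nothing happened after the
restart, while fresh_match only becomes true once the event is logged
again.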
--
Michael
