Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag
Date: 2023-10-09 09:37:21
Message-ID: ZSPJ0dKz4fdjFs5n@paquier.xyz
Lists: pgsql-hackers

On Sun, Oct 08, 2023 at 05:48:55PM -0400, Tom Lane wrote:
> There have been intermittent failures on various buildfarm machines
> since this went in. After seeing one on my own animal mamba [1],
> I tried to reproduce it manually on that machine, and it does
> indeed fail about one time in two. The buildfarm script is not
> managing to capture the relevant log files, but what I see in a
> manual run is that 001_worker_spi.pl logs this:

Thanks for the logs. I had noticed the failure but could not make sense
of it given how little information the buildfarm provided. Serinus has
complained once, for instance.

> Since this only seems to happen on slow machines, I'd call it a timing
> problem or race condition. Unless you want to argue that the race
> should not happen, probably the fix is to make the test script cope
> with this worker_spi_launch() call failing. As long as we see the
> expected result from wait_for_log, we can be pretty sure the right
> thing happened.

The trick to reproduce the failure is to slow down worker_spi_launch()
before its call to WaitForBackgroundWorkerStartup(), once a worker has
already been registered, so that the worker has time to start and then
exit because of the ALLOW_CONNECTIONS restriction. (SendPostmasterSignal()
in RegisterDynamicBackgroundWorker() interrupts a hardcoded sleep, so
I've just used an on-disk flag instead.)
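The on-disk flag trick amounts to a polling loop. What follows is a minimal Python model of the idea, not the C change used for the reproduction (the message does not show it); the function name `wait_while_flag_exists` and the flag path are hypothetical. The pause ends only when the flag file is removed: if an individual sleep is cut short, the loop simply re-checks the file, which a single hardcoded sleep cannot do.

```python
import os
import tempfile
import threading
import time


def wait_while_flag_exists(flag_path: str, poll_interval: float = 0.01) -> int:
    """Block while flag_path exists and return the number of polls made.

    Hypothetical helper: each short sleep may end early, but the loop
    re-checks the file, so only removing the flag ends the pause.
    """
    polls = 0
    while os.path.exists(flag_path):
        time.sleep(poll_interval)
        polls += 1
    return polls


if __name__ == "__main__":
    # Hypothetical flag name; in the reproduction the C code would check it.
    flag = os.path.join(tempfile.mkdtemp(), "worker_spi_pause")
    open(flag, "w").close()
    # Simulate someone removing the flag by hand after 0.2 s.
    threading.Timer(0.2, os.remove, args=(flag,)).start()
    print("paused for", wait_while_flag_exists(flag), "polls")
```

While the flag exists, the worker registered just before the pause gets time to start and exit, which opens the race window on demand.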

Another thing is that we cannot rely on the PID returned by launch(),
as the call could fail, so $worker3_pid needs to disappear. If we do
that, I'd rather switch to a dedicated database for the ALLOWCONN tests
than reuse "mydb", which could have other workers. The attached patch
fixes the issue for me.
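The race, and why the test should judge the outcome from the server log rather than from the launch result, can be modeled deterministically. This is a hypothetical Python sketch, not PostgreSQL code: `Simulation`, `launch()`, the PID 4242 and the database name "noconndb" are invented for illustration, and a failed launch is modeled as `None`. The log text mirrors the FATAL message PostgreSQL emits for a database that does not accept connections, but the exact line format is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NOT_STARTED, RUNNING, STOPPED = "not started", "running", "stopped"
REFUSAL = "is not currently accepting connections"


@dataclass
class Simulation:
    """Hypothetical model of one worker launch and the resulting server log."""
    state: str = NOT_STARTED
    log: List[str] = field(default_factory=list)

    def worker_starts(self) -> None:
        self.state = RUNNING

    def worker_exits(self) -> None:
        # The worker is refused by a database with connections disabled
        # (database name assumed) and exits right after starting.
        self.log.append(f'FATAL:  database "noconndb" {REFUSAL}')
        self.state = STOPPED

    def launcher_waits(self) -> Optional[int]:
        # Stand-in for waiting on worker startup: a worker that has
        # already stopped is reported as a failed launch (None).
        return None if self.state == STOPPED else 4242


def launch(sim: Simulation, slow_launcher: bool) -> Optional[int]:
    sim.worker_starts()
    if slow_launcher:
        sim.worker_exits()  # worker finishes before the launcher waits
        return sim.launcher_waits()
    pid = sim.launcher_waits()  # launcher still sees the worker running
    sim.worker_exits()
    return pid


def check_outcome(sim: Simulation) -> bool:
    # Judge the test by the log, not by the PID from launch().
    return any(REFUSAL in line for line in sim.log)


if __name__ == "__main__":
    for slow in (False, True):
        sim = Simulation()
        pid = launch(sim, slow)
        print(f"slow_launcher={slow} pid={pid} passes={check_outcome(sim)}")
```

With a fast launcher the call yields a PID; with a slow one it yields None. In both orderings the expected log line is present, which is why a check based on the log alone holds up, and why a PID variable cannot be relied on.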
--
Michael

Attachment: worker_spi_fix.patch (text/x-diff, 2.2 KB)
