Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag
Date: 2023-10-16 02:47:22
Message-ID: 2186072.1697424442@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> The buildfarm has provided some feedback, and the new tests have been
> unstable on mamba:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-10-15%2005%3A04%3A21

Yeah. FWIW, I tried to reproduce this on mamba's host, but did not
see it again in 100 or so tries.

> In more details:
> # poll_query_until timed out executing this query:
> # SELECT datname, usename, wait_event FROM pg_stat_activity
> # WHERE backend_type = 'worker_spi dynamic' AND
> # pid = ;
> # with stderr:
> # ERROR: syntax error at or near ";"
> # LINE 3: pid = ;

> So this looks like a hard failure in starting the worker that should
> bypass the role login check. The logs don't offer much information,
> but I think I know what's going on here: at this stage of the tests,
> the number of workers created is 7, very close to the limit of
> max_worker_processes, at 8 by default. So one parallel worker spawned
> by any of the other bgworkers would be enough to prevent the last one
> to start, and mamba has been slow enough in the startup of the static
> workers to show that this could be possible.

I agree that that probably is the root cause, and we should fix it
by bumping up max_worker_processes in this test.

But this failure is annoying in another way. Evidently,
worker_spi_launch returned NULL, which must have been from here:

if (!RegisterDynamicBackgroundWorker(&worker, &handle))
PG_RETURN_NULL();

Why in the world is that "return NULL", and not "ereport(ERROR)"
like all the other failure conditions in that function?

If there's actually a defensible reason for the C code to act
like that, then all the call sites need to have checks for
a null result.

(cc'ing Robert, as this coding originated at 090d0f205.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-10-16 02:53:56 Re: Add support for AT LOCAL
Previous Message Michael Paquier 2023-10-16 02:22:38 Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag