Re: Changes to error handling for background worker initialization?

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changes to error handling for background worker initialization?
Date: 2018-11-12 14:32:12
Message-ID: CAMa1XUhFap+AibpAHSkjRwN4cd9o8KYghWtG99JNofrEDzsAGw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 22, 2018 at 9:36 AM Jeremy Finzel <finzelj(at)gmail(dot)com> wrote:

> Hello -
>
> I have an extension that uses background workers. I pass a database oid
> as an argument in order to launch the worker using function
> BackgroundWorkerInitializeConnectionByOid. In one of my regression tests
> that was written, I intentionally launch the worker with an invalid oid.
> In earlier PG versions the worker would successfully launch but then
> terminate asynchronously, with a message in the server log. Now, it does
> not even successfully launch but immediately errors (hence failing my
> regression tests).
>
> I have recently installed all later point releases of all versions 9.5-11,
> so I assume this is due to some code change. The behavior seems reasonable
> but I don't find any obvious release notes indicating a patch that would
> have changed this behavior. Any thoughts?
>
> Thanks,
> Jeremy
>

I still haven't determined the source of this error, but it cannot be due
to a change in background worker error handling between point releases,
because I am seeing different behavior for the identical postgres version
on my machine vs. others. I would appreciate any ideas as to how this could
happen, because I am no longer sure of the right way to build this
regression test.

The test launches the background worker with an invalid database oid.

Here is what I am seeing running pg 11.1 on my system (same behavior I get
on 9.5-10 as well):

SELECT _launch(9999999::OID) AS pid;
! ERROR: could not start background process
! HINT: More details may be available in the server log.

This is what others are seeing (the worker fails asynchronously and you see
it in the server log):

SELECT _launch(9999999::OID) AS pid;
! pid
! -------
! 18022
! (1 row)

I could share the C code but it's not that interesting. It just calls
BackgroundWorkerInitializeConnectionByOid. It is essentially a duplicate
of worker_spi. Here is the relevant section:

sprintf(worker.bgw_function_name, "worker_spi_main");
snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
worker.bgw_main_arg = Int32GetDatum(i);
/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
worker.bgw_notify_pid = MyProcPid;

if (!RegisterDynamicBackgroundWorker(&worker, &handle))
    PG_RETURN_NULL();

status = WaitForBackgroundWorkerStartup(handle, &pid);

if (status == BGWH_STOPPED)
    ereport(ERROR,
            (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
             errmsg("could not start background process"),
             errhint("More details may be available in the server log.")));

So on my machine, WaitForBackgroundWorkerStartup returns BGWH_STOPPED,
whereas on others' machines it does not.

Thanks,
Jeremy
