Re: Race condition in WaitForBackgroundWorkerStartup

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Race condition in WaitForBackgroundWorkerStartup
Date: 2018-11-13 13:57:38
Message-ID: CAMa1XUhEFKLQM=ZRBQsjXinDn0bTxpz=jxnPRrnLa8oj_DfOjQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 13, 2018 at 6:17 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Mon, Nov 12, 2018 at 11:55 PM Jeremy Finzel <finzelj(at)gmail(dot)com> wrote:
> >
> > I believe I found a race condition in WaitForBackgroundWorkerStartup in
> the case where the worker encounters an ERROR during startup. Depending on
> the speed of the system, it nondeterministically returns either
> BGWH_STOPPED or BGWH_STARTED. However, I can reliably reproduce
> BGWH_STOPPED by tweaking the worker_spi.c test module.
> >
>
> Yeah, I think it is possible that you get different values in such
> cases, because we consider the worker's status to be started as soon as
> we have launched the worker. Now, if the worker gets an error during its
> initialization, the user can get either of those values. I think
> this is what is happening in your case, where you say "ERROR
> during startup".
> Am I missing something?
>

Perhaps. What I am saying is that on some machines the caller sees an
ERROR during startup, while on others the call returns successfully and the
worker then immediately errors out and dies in the background, without the
client being told. The behavior isn't predictable. However, I can make the
ERROR happen every time by putting a short pause before
WaitForBackgroundWorkerStartup.
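To illustrate the timing window, here is a minimal standalone model of the race. It assumes, as described above, that the worker is reported as started as soon as it is launched, and that a worker which fails immediately is then marked stopped. It does not use PostgreSQL's API: sim_status, sim_slot, sim_worker, sim_wait_for_startup and launch_and_wait are hypothetical stand-ins for BgwHandleStatus, the worker slot, worker_spi_main and WaitForBackgroundWorkerStartup. Joining the worker thread before waiting plays the role of the short pause.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BGWH_NOT_YET_STARTED / BGWH_STARTED / BGWH_STOPPED. */
typedef enum { SIM_NOT_YET_STARTED, SIM_STARTED, SIM_STOPPED } sim_status;

/* Hypothetical model of a background worker slot shared with the launcher. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    sim_status status;
} sim_slot;

static void set_status(sim_slot *slot, sim_status s) {
    pthread_mutex_lock(&slot->lock);
    slot->status = s;
    pthread_cond_broadcast(&slot->changed);
    pthread_mutex_unlock(&slot->lock);
}

/* Models "launched => started", followed by a main function that raises
 * ERROR right away, so the slot flips to stopped almost immediately. */
static void *sim_worker(void *arg) {
    sim_slot *slot = arg;
    set_status(slot, SIM_STARTED);
    set_status(slot, SIM_STOPPED);
    return NULL;
}

/* Blocks until the slot leaves NOT_YET_STARTED, then reports whatever state
 * it is in when the waiter gets the lock back: STARTED or already STOPPED. */
static sim_status sim_wait_for_startup(sim_slot *slot) {
    pthread_mutex_lock(&slot->lock);
    while (slot->status == SIM_NOT_YET_STARTED)
        pthread_cond_wait(&slot->changed, &slot->lock);
    sim_status observed = slot->status;
    pthread_mutex_unlock(&slot->lock);
    return observed;
}

/* If pause_until_worker_exits is true, the worker finishes before we wait,
 * which makes the outcome deterministic (always SIM_STOPPED). */
static sim_status launch_and_wait(bool pause_until_worker_exits) {
    sim_slot slot;
    pthread_mutex_init(&slot.lock, NULL);
    pthread_cond_init(&slot.changed, NULL);
    slot.status = SIM_NOT_YET_STARTED;

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, sim_worker, &slot);
    assert(rc == 0);
    (void) rc;

    if (pause_until_worker_exits)
        pthread_join(tid, NULL);
    sim_status observed = sim_wait_for_startup(&slot);
    if (!pause_until_worker_exits)
        pthread_join(tid, NULL);

    pthread_cond_destroy(&slot.changed);
    pthread_mutex_destroy(&slot.lock);
    return observed;
}
```

Without the pause, launch_and_wait(false) can return either SIM_STARTED or SIM_STOPPED depending on scheduling, while launch_and_wait(true) always returns SIM_STOPPED. This mirrors the behavior reported here, under the stated assumptions about when the slot is marked started.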

I'm unclear on what counts as "worker initialization". The error happens
in the worker_spi_main function, not in the worker_spi_launch function.
So does an immediate error in worker_spi_main count as part of the worker
initialization?

Thanks!
Jeremy
