Re: BUG #15346: Replica fails to start after the crash

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-22 14:30:46
Message-ID: CAFh8B=mHh7iLGLLCjicUsJgTkz_2_=WsOpR4KvFOfo2bTX4v2g@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Hi,

I've figured out what is going on.
On this server we have a background worker, which is loaded via
shared_preload_libraries.

In order to debug and reproduce the problem, I removed everything from
the background worker code except _PG_init, worker_main and a couple of
signal handler functions.

Here is the code:

void
worker_main(Datum main_arg)
{
    pqsignal(SIGHUP, bg_mon_sighup);
    pqsignal(SIGTERM, bg_mon_sigterm);
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        proc_exit(1);
    BackgroundWorkerUnblockSignals();
    BackgroundWorkerInitializeConnection("postgres", NULL);
    while (!got_sigterm)
    {
        int rc = WaitLatch(MyLatch,
                           WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                           naptime*1000L);

        ResetLatch(MyLatch);
        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
    }

    proc_exit(1);
}

void
_PG_init(void)
{
    BackgroundWorker worker;

    if (!process_shared_preload_libraries_in_progress)
        return;
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = 1;
    worker.bgw_main = worker_main;
    worker.bgw_notify_pid = 0;
    snprintf(worker.bgw_name, BGW_MAXLEN, "my_worker");
    RegisterBackgroundWorker(&worker);
}

Most of this code is taken from "worker_spi.c".

Basically, it just initializes a connection to the postgres database and
then sleeps all the time.
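
To spell out the exit logic of that loop in isolation, here is a minimal
standalone sketch. It does not use the PostgreSQL headers: the SKETCH_WL_*
constants and sketch_worker_should_exit() are hypothetical stand-ins invented
for illustration, and the bit values are placeholders rather than the real
ones from storage/latch.h.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the WL_* event bits used by WaitLatch().
 * The values are placeholders; the real ones live in storage/latch.h.
 */
enum
{
    SKETCH_WL_LATCH_SET = 1 << 0,
    SKETCH_WL_TIMEOUT = 1 << 1,
    SKETCH_WL_POSTMASTER_DEATH = 1 << 2
};

/*
 * Mirrors the control flow of the loop in worker_main: after each wakeup
 * the worker exits if the postmaster died, and otherwise keeps sleeping
 * until the SIGTERM flag has been set.
 */
static bool
sketch_worker_should_exit(int rc, bool got_sigterm)
{
    if (rc & SKETCH_WL_POSTMASTER_DEATH)
        return true;
    return got_sigterm;
}
```

A plain timeout or latch wakeup leaves the worker in the loop, so apart from
the initial connection it really does nothing.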

If I comment out the 'BackgroundWorkerInitializeConnection("postgres",
NULL);' call, postgres starts without any problem.
That is very strange, because the background worker itself is not doing
anything...

One more thing: if I add sleep(15) before calling
BackgroundWorkerInitializeConnection, postgres manages to start
successfully.
Is there some strange race condition here?

Regards,
--
Alexander Kukushkin
