From: | Alexander Kukushkin <cyberdemn(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15346: Replica fails to start after the crash |
Date: | 2018-08-22 14:30:46 |
Message-ID: | CAFh8B=mHh7iLGLLCjicUsJgTkz_2_=WsOpR4KvFOfo2bTX4v2g@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
Hi,
I've figured out what is going on.
On this server we have a background worker, which starts from
shared_preload_libraries.
In order to debug and reproduce it, I removed everything from the
background worker code except _PG_init, worker_main and a couple of
signal handler functions.
Here is the code:
void
worker_main(Datum main_arg)
{
    pqsignal(SIGHUP, bg_mon_sighup);
    pqsignal(SIGTERM, bg_mon_sigterm);
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        proc_exit(1);

    BackgroundWorkerUnblockSignals();
    BackgroundWorkerInitializeConnection("postgres", NULL);

    while (!got_sigterm)
    {
        int rc = WaitLatch(MyLatch,
                           WL_LATCH_SET |
                           WL_TIMEOUT | WL_POSTMASTER_DEATH,
                           naptime*1000L);

        ResetLatch(MyLatch);
        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
    }

    proc_exit(1);
}

void
_PG_init(void)
{
    BackgroundWorker worker;

    if (!process_shared_preload_libraries_in_progress)
        return;

    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = 1;
    worker.bgw_main = worker_main;
    worker.bgw_notify_pid = 0;
    snprintf(worker.bgw_name, BGW_MAXLEN, "my_worker");

    RegisterBackgroundWorker(&worker);
}
Most of this code is taken from "worker_spi.c".
Basically, it just initializes a connection to the postgres database and
sleeps all the time.
If I comment out the 'BackgroundWorkerInitializeConnection("postgres",
NULL);' call, postgres starts without any problem.
That is very strange, because the background worker itself is not doing anything...
One more thing: if I add sleep(15) before calling
BackgroundWorkerInitializeConnection, postgres manages to start
successfully.
Is there a very strange race condition here?
Regards,
--
Alexander Kukushkin