Failed assertion on standby while shutdown

From: Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Failed assertion on standby while shutdown
Date: 2021-03-19 17:25:47
Message-ID: ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru
Lists: pgsql-hackers

Hi, hackers!

Recently, I was experimenting with the interaction between primary and
standby instances. Under certain conditions I got, and was able to
reproduce, a crash on a failed assertion.

The scenario is the following:
1. start the primary server
2. create the standby server with
   pg_basebackup -P -R -X stream -c fast -p5432 -D data
   and start it
3. apply some load to the primary server with
   pgbench -p5432 -i -s 150 postgres
4. kill the primary server (with kill -9) and keep it down
5. stop the standby server with pg_ctl
6. start the standby server again
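
The steps above can be sketched as a shell script. This is only a sketch under assumptions that are not part of the report: the primary data directory name (primary), the standby port (5433, since both instances cannot share port 5432 on one host), and the dry-run wrapper are all hypothetical. The pg_basebackup and pgbench invocations are the ones given in the steps.

```sh
#!/bin/sh
# Reproduction sketch. Assumed (not from the report): PRIMARY_DATA,
# STANDBY_PORT and the DRY_RUN wrapper.
PRIMARY_DATA=${PRIMARY_DATA:-primary}   # hypothetical primary data directory
STANDBY_DATA=${STANDBY_DATA:-data}      # "-D data" from step 2
STANDBY_PORT=${STANDBY_PORT:-5433}      # assumed port for the standby
DRY_RUN=${DRY_RUN:-1}                   # 1 = only print the commands

# Print a command and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

reproduce() {
    # 1. start the primary server (default port 5432)
    run pg_ctl -D "$PRIMARY_DATA" start
    # 2. create the standby from a base backup and start it
    run pg_basebackup -P -R -X stream -c fast -p5432 -D "$STANDBY_DATA"
    run pg_ctl -D "$STANDBY_DATA" -o "-p $STANDBY_PORT" start
    # 3. apply some load to the primary
    run pgbench -p5432 -i -s 150 postgres
    # 4. kill the primary with SIGKILL and keep it down; the first line
    #    of postmaster.pid holds the postmaster PID
    if [ "$DRY_RUN" = "0" ]; then
        primary_pid=$(head -n 1 "$PRIMARY_DATA/postmaster.pid")
    else
        primary_pid='<primary postmaster PID>'
    fi
    run kill -9 "$primary_pid"
    # 5. stop the standby
    run pg_ctl -D "$STANDBY_DATA" stop
    # 6. start the standby again
    run pg_ctl -D "$STANDBY_DATA" -o "-p $STANDBY_PORT" start
    # any further termination of the standby is where the assertion fired
    run pg_ctl -D "$STANDBY_DATA" stop
}

reproduce
```

By default the script only prints the commands; setting DRY_RUN=0 in the environment executes them, including the kill -9 on the primary.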

Then any standby server termination will result in a failed assertion.

The log with a backtrace follows:

2021-03-19 18:54:25.352 MSK [3508443] LOG: received fast shutdown request
2021-03-19 18:54:25.379 MSK [3508443] LOG: aborting any active transactions
TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "/home/ziva/projects/pgpro/build-secondary/../postgrespro/src/backend/storage/lmgr/proc.c", Line: 592, PID: 3508452)
postgres: walreceiver (ExceptionalCondition+0xd0)[0x555555d0526f]
postgres: walreceiver (InitAuxiliaryProcess+0x31c)[0x555555b43e31]
postgres: walreceiver (AuxiliaryProcessMain+0x54f)[0x55555574ae32]
postgres: walreceiver (+0x530bff)[0x555555a84bff]
postgres: walreceiver (+0x531044)[0x555555a85044]
postgres: walreceiver (+0x530959)[0x555555a84959]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7ffff7a303c0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x1a)[0x7ffff72a40da]
postgres: walreceiver (+0x52bea4)[0x555555a7fea4]
postgres: walreceiver (PostmasterMain+0x129f)[0x555555a7f7c1]
postgres: walreceiver (+0x41ff1f)[0x555555973f1f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ffff71b30b3]
postgres: walreceiver (_start+0x2e)[0x55555561abfe]

After a brief investigation I found out that I can hit this assertion
with 100% probability if I insert a sleep of about 5 seconds into
InitAuxiliaryProcess(void) in src/backend/storage/lmgr/proc.c:

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 897045ee272..b5f365f426d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -525,7 +525,7 @@ InitAuxiliaryProcess(void)

 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
-
+	pg_usleep(5000000L);
 	/*
 	 * We use the ProcStructLock to protect assignment and releasing of
 	 * AuxiliaryProcs entries.

This behaviour might also appear without the patch if the machine
hosting the instances is under significant unrelated load, which delays
the startup of the database processes.

Configuration for a primary server is default with "wal_level = logical"

Configuration for a standby server is default with "wal_level = logical"
and "primary_conninfo = 'port=5432'"

I'm puzzled by this behavior. I'm pretty sure this is not how it should
work. Any ideas how this can be fixed?

---
Best regards,
Maxim Orlov.
