Re: Failed assertion on standby while shutdown

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failed assertion on standby while shutdown
Date: 2021-03-22 13:40:54
Message-ID: 9c7b69c2-fb17-1780-e475-7dd2c6bbe18b@oss.nttdata.com
Lists: pgsql-hackers

On 2021/03/20 2:25, Maxim Orlov wrote:
> Hi, hackers!
>
> Recently, I was experimenting with the interaction between primary and standby instances. Under certain conditions I hit, and was able to reproduce, a crash on a failed assertion.
>
> The scenario is the following:
> 1. start primary server
> 2. start standby server by pg_basebackup -P -R -X stream -c fast -p5432 -D data
> 3. apply some load to the primary server by pgbench -p5432 -i -s 150 postgres
> 4. kill primary server (with kill -9) and keep it down
> 5. stop standby server by pg_ctl
> 6. run standby server
>
> Then any standby server termination will result in a failed assertion.
>
> The log with a backtrace follows:
>
> 2021-03-19 18:54:25.352 MSK [3508443] LOG:  received fast shutdown request
> 2021-03-19 18:54:25.379 MSK [3508443] LOG:  aborting any active transactions
> TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "/home/ziva/projects/pgpro/build-secondary/../postgrespro/src/backend/storage/lmgr/proc.c", Line: 592, PID: 3508452)
> postgres: walreceiver (ExceptionalCondition+0xd0)[0x555555d0526f]
> postgres: walreceiver (InitAuxiliaryProcess+0x31c)[0x555555b43e31]
> postgres: walreceiver (AuxiliaryProcessMain+0x54f)[0x55555574ae32]
> postgres: walreceiver (+0x530bff)[0x555555a84bff]
> postgres: walreceiver (+0x531044)[0x555555a85044]
> postgres: walreceiver (+0x530959)[0x555555a84959]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7ffff7a303c0]
> /lib/x86_64-linux-gnu/libc.so.6(__select+0x1a)[0x7ffff72a40da]
> postgres: walreceiver (+0x52bea4)[0x555555a7fea4]
> postgres: walreceiver (PostmasterMain+0x129f)[0x555555a7f7c1]
> postgres: walreceiver (+0x41ff1f)[0x555555973f1f]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ffff71b30b3]
> postgres: walreceiver (_start+0x2e)[0x55555561abfe]
>
> After a brief investigation I found out that I can get this assert with 100% probability if I insert a sleep for about 5 sec into InitAuxiliaryProcess(void) in src/backend/storage/lmgr/proc.c:
>
> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
> index 897045ee272..b5f365f426d 100644
> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -525,7 +525,7 @@ InitAuxiliaryProcess(void)
>
>         if (MyProc != NULL)
>                 elog(ERROR, "you already exist");
> -
> +       pg_usleep(5000000L);
>         /*
>          * We use the ProcStructLock to protect assignment and releasing of
>          * AuxiliaryProcs entries.

Thanks for the report! I could reproduce this issue by adding that sleep
into InitAuxiliaryProcess().

> This kind of behavior might appear when the machine hosting the instances is under significant unrelated load, which delays the startup of the database instances.
>
> Configuration for a primary server is default with "wal_level = logical"
>
> Configuration for a standby server is default with "wal_level = logical" and "primary_conninfo = 'port=5432'"
>
> I'm puzzled by this behavior. I'm pretty sure it is not what it should be. Any ideas on how this can be fixed?

ISTM that the cause of this issue is that the startup process exits
without releasing the locks that it was holding when shutdown is
requested. To address this issue, IMO the startup process should
call ShutdownRecoveryTransactionEnvironment() at its exit.
Attached is the POC patch that changes the startup process that way.

I haven't tested the patch enough yet.
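
To make the suspected failure mode concrete, here is a toy model in C. It is
not PostgreSQL code and not the attached patch. It assumes that a later
process ends up initializing a proc slot whose lock list the previous owner
left non-empty. ProcSlot, slot_init(), slot_acquire_lock(),
slot_release_all_locks(), slot_exit() and run_scenario() are all hypothetical
names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared proc slot; NOT PostgreSQL's PGPROC. */
typedef struct ProcSlot
{
    bool in_use;      /* slot is currently owned by a process */
    int  held_locks;  /* stands in for the myProcLocks queues */
} ProcSlot;

/*
 * Claim the slot. Returns false in the situation where the real code would
 * fail Assert(SHMQueueEmpty(...)): a previous owner left locks behind.
 */
bool
slot_init(ProcSlot *slot)
{
    assert(!slot->in_use);
    if (slot->held_locks != 0)
        return false;
    slot->in_use = true;
    return true;
}

/* Record one more lock held through this slot. */
void
slot_acquire_lock(ProcSlot *slot)
{
    assert(slot->in_use);
    slot->held_locks++;
}

/*
 * Plays the role proposed above for ShutdownRecoveryTransactionEnvironment():
 * drop every lock still held before the process goes away.
 */
void
slot_release_all_locks(ProcSlot *slot)
{
    slot->held_locks = 0;
}

/* Give the slot back, optionally releasing locks first. */
void
slot_exit(ProcSlot *slot, bool release_locks_at_exit)
{
    if (release_locks_at_exit)
        slot_release_all_locks(slot);
    slot->in_use = false;
}

/*
 * One process takes locks and exits on shutdown; a second process then
 * reuses the slot. Returns whether the second initialization found it clean.
 */
bool
run_scenario(bool release_locks_at_exit)
{
    ProcSlot slot = {false, 0};

    if (!slot_init(&slot))            /* "startup process" claims the slot */
        return false;
    slot_acquire_lock(&slot);
    slot_acquire_lock(&slot);
    slot_exit(&slot, release_locks_at_exit);  /* shutdown requested */

    return slot_init(&slot);          /* later process reuses the slot */
}
```

In this model run_scenario(false) returns false, which corresponds to the
assertion failure, while run_scenario(true) returns true because the locks
are released at exit.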

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment: fix_assertion_failure_walreceiver.patch (text/plain, 1.9 KB)
