Re: [BUGS] Crash observed during the start of the Postgres process

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com>
Cc: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>, "'pgsql-bugs(at)postgresql(dot)org'" <pgsql-bugs(at)postgresql(dot)org>, "Itnal, Prakash (Nokia - IN/Bangalore)" <prakash(dot)itnal(at)nokia(dot)com>, "T, Rasna (Nokia - IN/Bangalore)" <rasna(dot)t(at)nokia(dot)com>
Subject: Re: [BUGS] Crash observed during the start of the Postgres process
Date: 2017-05-12 16:38:12
Message-ID: 23381.1494607092@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

"K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com> writes:
> I have filtered the logs based on PID (19825) to see if this helps to
> debug the issue further.

Is this really a stock Postgres build?

The proximate cause of the PANIC is that the startup process is seeing
other processes active even though it hasn't reachedConsistency. This
is bad on any number of levels, quite aside from that particular PANIC,
because those other processes are presumably seeing non-consistent
database state. Looking elsewhere in the log, we see that indeed there
seem to be several backend processes happily executing commands.
For instance, here's the trace of one of them starting up:

[19810-58f473ff.4d62-187] 2017-04-17 07:51:28.783 GMT < > DEBUG: 00000: forked new backend, pid=19850 socket=10
[19810-58f473ff.4d62-188] 2017-04-17 07:51:28.783 GMT < > LOCATION: BackendStartup, postmaster.c:3884
[19850-58f47400.4d8a-1] 2017-04-17 07:51:28.783 GMT < > LOG: 57P03: the database system is starting up
[19850-58f47400.4d8a-2] 2017-04-17 07:51:28.783 GMT < > LOCATION: ProcessStartupPacket, postmaster.c:2143
[19850-58f47400.4d8a-3] 2017-04-17 07:51:28.784 GMT < authentication> DEBUG: 00000: postgres child[19850]: starting with (

Now, that LOG message proves that this backend has observed that the
database is not ready to allow connections. So why did it only emit the
message as LOG and keep going? The code for this in 9.3 looks like

/*
* If we're going to reject the connection due to database state, say so
* now instead of wasting cycles on an authentication exchange. (This also
* allows a pg_ping utility to be written.)
*/
switch (port->canAcceptConnections)
{
case CAC_STARTUP:
ereport(FATAL,
(errcode(ERRCODE_CANNOT_CONNECT_NOW),
errmsg("the database system is starting up")));
break;
...

I can't draw any other conclusion but that you've hacked something
to make FATAL act like LOG. Which is a fatal mistake. Errors that
are marked FATAL are generally ones where allowing the process to
keep going is not an acceptable outcome.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Robert Haas 2017-05-12 16:54:36 Re: Crash observed during the start of the Postgres process
Previous Message Vitaliy Gomenyuk 2017-05-12 15:55:12 Re: BUG #14635: Query is executed slower on hot standby slave database then on master database

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2017-05-12 16:40:33 Re: snapbuild woes
Previous Message Dmitriy Sarafannikov 2017-05-12 16:12:49 Re: [PROPOSAL] Use SnapshotAny in get_actual_variable_range