Re: Dangling Client Backend Process

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dangling Client Backend Process
Date: 2015-10-30 14:46:35
Message-ID: 20151030144635.GA6064@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-10-30 09:48:33 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Hmm. ProcessInterrupts() signals some FATAL errors while the
> > connection is idle, and rumor has it that that works: the client
> > doesn't immediately read the FATAL error, but the next time it sends a
> > query, it tries to read from the connection and sees the FATAL error
> > at that time. I wonder why that's not working here.
>
> A likely theory is that the kernel is reporting failure to libpq's
> send() because the other side of the connection is already gone.
> This would be timing-dependent of course.

Looking at an strace, psql over a unix socket is actually receiving the
error message:
recvfrom(3, "E\0\0\0lSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 109
but psql does print:
server closed the connection unexpectedly
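
As a cross-check on that strace line, here is a minimal sketch of decoding the message header, assuming the standard v3 wire format (one type byte followed by a big-endian 32-bit length that counts itself but not the type byte). The names MsgHeader, decode_header and total_size are hypothetical helpers for illustration, not libpq functions. With the bytes shown above, 'E' is the ErrorResponse type and 'l' is 0x6c = 108, so the whole message is 108 + 1 = 109 bytes, which matches the recvfrom() return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the first five bytes of a v3 protocol message. */
typedef struct MsgHeader
{
    char     type;  /* message type byte, e.g. 'E' for ErrorResponse */
    uint32_t len;   /* length field: counts itself, not the type byte */
} MsgHeader;

/* Decode the type byte and big-endian length; returns -1 if too short. */
static int
decode_header(const unsigned char *buf, size_t n, MsgHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (n < 5)
        return -1;
    out->type = (char) buf[0];
    out->len = ((uint32_t) buf[1] << 24) | ((uint32_t) buf[2] << 16) |
               ((uint32_t) buf[3] << 8) | (uint32_t) buf[4];
    return 0;
}

/* Total bytes on the wire: the length field plus the leading type byte. */
static size_t
total_size(const MsgHeader *h)
{
    assert(h != NULL);
    return (size_t) h->len + 1;
}
```

So the complete error message did arrive in that single read; the fields that follow the header (S for severity, C for SQLSTATE, M for message text) are what libpq would normally turn into the FATAL line.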

it happens to work over localhost:
FATAL: 57P01: terminating connection due to unexpected postmaster exit
LOCATION: secure_read, be-secure.c:170
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

the problem seems to be the loop eating all the remaining input:
void
pqHandleSendFailure(PGconn *conn)
{
    /*
     * Accept any available input data, ignoring errors.  Note that if
     * pqReadData decides the backend has closed the channel, it will close
     * our side of the socket --- that's just what we want here.
     */
    while (pqReadData(conn) > 0)
        /* loop until no more data readable */ ;

after the first pqReadData() there's no remaining input, so the second
call to pqReadData() gets 0 back from pqsecure_read() and this is hit:
    /*
     * OK, we are getting a zero read even though select() says ready. This
     * means the connection has been closed.  Cope.
     */
definitelyEOF:
    printfPQExpBuffer(&conn->errorMessage,
                      libpq_gettext(
                          "server closed the connection unexpectedly\n"
                          "\tThis probably means the server terminated abnormally\n"
                          "\tbefore or while processing the request.\n"));

adding a parseInput(conn) call inside the loop yields the expected
FATAL: 57P01: terminating connection due to unexpected postmaster exit

Is there really any reason not to do that?
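
To illustrate the difference, here is a toy model of the two loop shapes. It is a sketch only, not libpq code: FakeConn, fake_read_data and fake_parse_input are hypothetical stand-ins for PGconn, pqReadData and parseInput, and the model assumes that hitting EOF throws away whatever is still sitting unparsed in the input buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EOF_MSG "server closed the connection unexpectedly"

/* Hypothetical stand-in for PGconn; not the real libpq struct. */
typedef struct FakeConn
{
    const char *pending;        /* data the socket still has for us, or NULL */
    char        inbuf[128];     /* received but not yet parsed */
    char        delivered[128]; /* what the parser handed to the client */
    char        error[128];     /* analogue of conn->errorMessage */
} FakeConn;

static void
fake_conn_init(FakeConn *c, const char *pending)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->pending = pending;
}

/* Stand-in for pqReadData(): 1 if data was read, -1 on EOF. */
static int
fake_read_data(FakeConn *c)
{
    assert(c != NULL);
    if (c->pending != NULL)
    {
        snprintf(c->inbuf, sizeof c->inbuf, "%s", c->pending);
        c->pending = NULL;
        return 1;
    }
    /* Assumption of this model: EOF discards unparsed input. */
    c->inbuf[0] = '\0';
    snprintf(c->error, sizeof c->error, "%s", EOF_MSG);
    return -1;
}

/* Stand-in for parseInput(): deliver whatever is buffered. */
static void
fake_parse_input(FakeConn *c)
{
    assert(c != NULL);
    if (c->inbuf[0] != '\0')
    {
        snprintf(c->delivered, sizeof c->delivered, "%s", c->inbuf);
        c->inbuf[0] = '\0';
    }
}

/* Current shape: read until nothing is left, never parsing in between. */
static void
drain_without_parse(FakeConn *c)
{
    while (fake_read_data(c) > 0)
        ;                       /* loop until no more data readable */
}

/* Proposed shape: parse after every successful read. */
static void
drain_with_parse(FakeConn *c)
{
    while (fake_read_data(c) > 0)
        fake_parse_input(c);
}
```

In this model drain_without_parse() leaves only the generic EOF text behind, while drain_with_parse() delivers the FATAL message first and then records the EOF text, which is the same pair of messages seen in the localhost case.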

Greetings,

Andres Freund
