return value from pq_putmessage() is widely ignored

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: return value from pq_putmessage() is widely ignored
Date: 2020-04-17 19:49:10
Message-ID: CA+TgmoYSrCnCFhQafSnD_EhdCioTgy+AsUX-ekusVnWJyCGVbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

pq_putmessage() is a macro which calls a function that is normally
socket_putmessage(), which returns either 0 on success or EOF in the
case of failure. Most callers ignore the return value, sometimes with
an explicit cast to void, and other times without such a cast. As far
as I can see, the only case where we actually do anything significant
with the return value is in basebackup.c, where two of the calls to
pq_putmessage() do this:

if (pq_putmessage('d', buf, cnt))
ereport(ERROR,
(errmsg("base
backup could not send data, aborting backup")));

The other six calls in that file do not have a similar protection, and
there does not seem to be any explanation of why those two places are
special as compared either with other places in the same file or with
other parts of PostgreSQL, so I'm a little bit confused as to what's
going on here. Why do we return 0 or EOF only to ignore it, instead of
throwing ERROR like we do in most places?

One problem is that we might get into error recursion trouble: we
don't want to try to send an ErrorResponse, fail, and then respond by
generating another ErrorResponse, which will again fail, leading to
blowing out the error stack. It's also worth considering that the
error might be occurring halfway through sending some other protocol
message, in which case we can't start a new protocol message without
finishing the previous one. But the point is that in a case like this,
we've lost the client connection anyway. It seems like what we ought
to be doing is what ProcessInterrupts does in this situation:

/* don't send to client, we already know the
connection to be dead. */
whereToSendOutput = DestNone;
ereport(FATAL,
(errcode(ERRCODE_CONNECTION_FAILURE),
errmsg("connection to client lost")));

Why don't we just make internal_flush() do that directly in the event
of a failure, instead of setting flags that make it happen later and
then returning an error indicator? Can it be unsafe to throw an error
here? Do we have places that are calling pq_putmessage() inside
critical sections or something?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-04-17 19:58:04 Re: Poll: are people okay with function/operator table redesign?
Previous Message Alvaro Herrera 2020-04-17 19:26:46 Re: Should we add xid_current() or a int8->xid cast?