Re: libpq copy error handling busted

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq copy error handling busted
Date: 2020-06-03 22:41:28
Message-ID: 1153694.1591224088@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> When libpq is used to COPY data to the server, it doesn't properly
> handle errors.
> This is partially an old problem, and partially got recently
> worse. Before the below commit we detected terminated connections, but
> we didn't handle copy failing.

Yeah. After poking at that for a little bit, there are at least three
problems:

* pqSendSome() is responsible not only for pushing out data, but for
calling pqReadData in any situation where it can't get rid of the data
promptly. 1f39a1c06 overlooked that requirement, and the upshot is
that we don't necessarily notice that the connection is broken (it's
pqReadData's job to detect that). Putting a pqReadData call into
the early-exit path helps, but doesn't fix things completely.

* The more longstanding problem is that the PQputCopyData code path
doesn't have any mechanism for consuming an 'E' (error) message
once pqReadData has collected it. AFAICS that's ancient. (It does
not affect the behavior of this case if you use an immediate-mode
shutdown, because then the backend never issues an 'E' message;
but it does matter in a fast-mode shutdown.) I think that the
idea was to let the client dump all its copy data and then report
the error message when PQendCopy is called, but as you say, that's
none too friendly when there's gigabytes of data involved. I'm
not sure we can do anything about this without effectively changing
the client API for copy errors, though.

* As for control-C not getting out of it: there is

if (CancelRequested)
break;

in pgbench's loop, but this does nothing in this scenario because
fe-utils/cancel.c only sets that flag when it successfully sends a
Cancel ... which it certainly cannot if the postmaster is gone.
I suspect that this may be relatively recent breakage. It doesn't look
too well thought out, in any case. The places that are testing this
flag look like they'd rather not be bothered with the fine point of
whether the cancel request actually went anywhere. (And aside from this
issue, I see no mechanism for that flag to become unset once it's set.
Current users of cancel.c probably don't care, but we'd have noticed if
we tried to make psql use it.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Soumyadeep Chakraborty 2020-06-03 23:21:57 Re: Parallel Seq Scan vs kernel read ahead
Previous Message Alvaro Herrera 2020-06-03 22:27:12 Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762