Re: libpq copy error handling busted

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq copy error handling busted
Date: 2020-06-04 01:35:53
Message-ID: 1187243.1591234553@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> * pqSendSome() is responsible not only for pushing out data, but for
> calling pqReadData in any situation where it can't get rid of the data
> promptly. 1f39a1c06 overlooked that requirement, and the upshot is
> that we don't necessarily notice that the connection is broken (it's
> pqReadData's job to detect that). Putting a pqReadData call into
> the early-exit path helps, but doesn't fix things completely.

Ah, it's better if I put the pqReadData call into *both* the paths
where 1f39a1c06 made pqSendSome give up. The attached patch seems
to fix the issue for the "pgbench -i" scenario, with either fast-
or immediate-mode server stop. I tried it with and without SSL too,
just to see. Still, it's not clear to me whether this might worsen
any of the situations we discussed in the lead-up to 1f39a1c06 [1].
Thomas, are you in a position to redo any of that testing?
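
To make the idea concrete, here is a toy model of a send routine that still
attempts a read in both of its give-up paths, so that a closed connection
gets noticed. This is only a sketch, not the attached patch and not libpq's
code: ToyConn, toy_read_data, toy_raw_write and toy_send_some are
hypothetical names, the "socket" is simulated with a flag, and the
return-value convention (-1 once the read reports a broken connection) is
an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection object; not libpq's PGconn. */
typedef struct ToyConn
{
    bool   write_failed;    /* an earlier write has already failed */
    bool   peer_closed;     /* simulated: the server has gone away */
    bool   connection_bad;  /* set only when a read notices the closure */
    size_t out_count;       /* bytes queued for sending */
    int    reads_attempted; /* how many times the read routine ran */
} ToyConn;

/* Simulated read: in this model only a read can detect a dead peer. */
static int
toy_read_data(ToyConn *conn)
{
    assert(conn != NULL);
    conn->reads_attempted++;
    if (conn->peer_closed)
    {
        conn->connection_bad = true;
        return -1;
    }
    return 0;
}

/* Simulated write: fails whenever the peer is gone. */
static int
toy_raw_write(ToyConn *conn, size_t len)
{
    return conn->peer_closed ? -1 : (int) len;
}

/* Toy send routine with a read attempt in both give-up paths. */
static int
toy_send_some(ToyConn *conn)
{
    /* Give-up path 1: a previous write failed, so discard the data. */
    if (conn->write_failed)
    {
        conn->out_count = 0;
        return (toy_read_data(conn) < 0) ? -1 : 0;
    }

    if (conn->out_count == 0)
        return 0;

    /* Give-up path 2: this write fails, so discard the data. */
    if (toy_raw_write(conn, conn->out_count) < 0)
    {
        conn->write_failed = true;
        conn->out_count = 0;
        return (toy_read_data(conn) < 0) ? -1 : 0;
    }

    conn->out_count = 0;    /* everything was sent */
    return 0;
}
```

In this model, dropping the read from either give-up path leaves
connection_bad unset forever, which is the "don't necessarily notice that
the connection is broken" symptom described above.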

> * The more longstanding problem is that the PQputCopyData code path
> doesn't have any mechanism for consuming an 'E' (error) message
> once pqReadData has collected it.

At least with pgbench's approach (die immediately on PQputline failure)
this isn't very relevant once we apply the attached patch. Perhaps we
should revisit this behavior anyway, but I'd be afraid to back-patch a
change of that nature.

> * As for control-C not getting out of it: there is
> if (CancelRequested)
> break;
> in pgbench's loop, but this does nothing in this scenario because
> fe-utils/cancel.c only sets that flag when it successfully sends a
> Cancel ... which it certainly cannot if the postmaster is gone.

I'll send a patch for this later.
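
For anyone following along, the quoted behavior can be modeled roughly as
below. This is a simplified sketch of the problem only, not the code in
fe-utils/cancel.c or pgbench, and not the fix: toy_cancel_requested,
toy_server_reachable, toy_send_cancel, toy_handle_interrupt and
toy_load_rows are hypothetical names, and the interrupt is simulated by an
ordinary function call rather than a real signal.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag, modeled on the CancelRequested variable quoted above. */
static volatile sig_atomic_t toy_cancel_requested = 0;

/* Simulated: can a cancel request still be delivered to the server? */
static bool toy_server_reachable = true;

/* Simulated cancel: succeeds only while the server is reachable. */
static bool
toy_send_cancel(void)
{
    return toy_server_reachable;
}

/* Model of the interrupt path: the flag is set only if the cancel was sent. */
static void
toy_handle_interrupt(void)
{
    if (toy_send_cancel())
        toy_cancel_requested = 1;
}

/*
 * Model of a data-loading loop that checks the flag on every row.
 * A simulated control-C arrives just before row number interrupt_at.
 * Returns the number of rows processed before the loop stopped.
 */
static int
toy_load_rows(int total_rows, int interrupt_at)
{
    int sent = 0;

    assert(total_rows >= 0);
    for (int i = 0; i < total_rows; i++)
    {
        if (i == interrupt_at)
            toy_handle_interrupt();
        if (toy_cancel_requested)
            break;
        sent++;
    }
    return sent;
}
```

With the server reachable the loop stops at the interrupt; with the server
gone the flag is never set and the loop runs to completion, which matches
the control-C symptom described above.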

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAEepm%3D2n6Nv%2B5tFfe8YnkUm1fXgvxR0Mm1FoD%2BQKG-vLNGLyKg%40mail.gmail.com

Attachment: fix-pqSendSome-error-behavior.patch (text/x-diff, 1.9 KB)