Re: libpq copy error handling busted

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq copy error handling busted
Date: 2020-06-04 16:29:06
Message-ID: 1250623.1591288146@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> * As for control-C not getting out of it: there is
> if (CancelRequested)
> break;
> in pgbench's loop, but this does nothing in this scenario because
> fe-utils/cancel.c only sets that flag when it successfully sends a
> Cancel ... which it certainly cannot if the postmaster is gone.
> I suspect that this may be relatively recent breakage. It doesn't look
> too well thought out, in any case. The places that are testing this
> flag look like they'd rather not be bothered with the fine point of
> whether the cancel request actually went anywhere.

On closer inspection, it seems that scripts_parallel.c does have a
dependency on the cancel request having been sent, because it insists
on collecting a query result off the active connection after detecting
CancelRequested. This seems dangerously overoptimistic to me; it will
lock up if for any reason the server doesn't honor the cancel request.
It's also pointless, because all the calling apps are just going to close
their connections and exit(1) afterwards, so there's no use in trying to
resynchronize the connection state. (Plus, disconnectDatabase will
issue cancels on any busy connections; which would be necessary anyway
in a parallel operation, since cancel.c could only have signaled one of
them.) So the attached patch just removes the useless consumeQueryResult
call, and then simplifies select_loop's API a bit.

With that change, I don't see any place that wants the existing definition
of CancelRequested rather than the simpler meaning of "SIGINT was
received", so I just changed it to mean that. We could certainly also
have a variable tracking whether a cancel request was sent, but I see
no point in one right now.

It's easiest to test this *without* the other patch -- just run the
pgbench scenario Andres demonstrated, and see whether control-C gets
pgbench to quit cleanly.

regards, tom lane

Attachment Content-Type Size
fix-cancelrequested.patch text/x-diff 4.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-06-04 16:57:30 Re: posgres 12 bug (partitioned table)
Previous Message Avinash Kumar 2020-06-04 15:23:03 Re: Just for fun: Postgres 20?