Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

From: Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jacob Champion <pchampion(at)vmware(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date: 2022-03-28 09:28:19
Message-ID: HE1PR8303MB00732AF8BEF2E45DACFE6633F71D9@HE1PR8303MB0073.EURPRD83.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for all the feedback everyone. I'll try to send a new patch
later this week that includes user facing docs and a simplified API.
For now a few responses:

> Yeah. We need to choose a name for these new function(s) that is
> sufficiently different from "PQcancel" that people won't expect them
> to behave exactly the same as that does. I lack any good ideas about
> that, how about you?

So I guess the names I proposed were not great, since everyone seems to be falling over them.
But I'd like to make my intention clear with the current naming. After this patch there would be
four different APIs for starting a cancelation:
1. PQrequestCancel: deprecated+old, not signal-safe function for requesting query cancellation, only uses a specific set of connection options
2. PQcancel: Cancel queries in a signal safe way, to be signal-safe it only uses a limited set of connection options
3. PQcancelConnect: Cancel queries in a non-signal safe way that uses all connection options
4. PQcancelConnectStart: Cancel queries in a non-signal safe and non-blocking way that uses all connection options

So the idea was that you should not look at PQcancelConnectStart as the non-blocking
version of PQcancel, but as the non-blocking version of PQcancelConnect. I'll try to
think of some different names too, but IMHO these names could be acceptable
when their differences are addressed sufficiently in the documentation.

One other approach to naming that comes to mind now is repurposing PQrequestCancel:
1. PQrequestCancel: Cancel queries in a non-signal safe way that uses all connection options
2. PQrequestCancelStart: Cancel queries in a non-signal safe and non-blocking way that uses all connection options
3. PQcancel: Cancel queries in a signal safe way, to be signal-safe it only uses a limited set of connection options

> I think it's probably a
> bad sign that this function is tinkering with logic in this sort of
> low-level function anyway. pqReadData() is a really general function
> that manages to work with non-blocking I/O already, so why does
> non-blocking query cancellation need to change its return values, or
> whether or not it drops data in certain cases?

The reason for this low level change is that the cancellation part of the
Postgres protocol is following a different, much more simplistic design
than all the other parts. The client does not expect a response message back
from the server after sending the cancellation request. The expectation
is that the server signals completion by closing the connection, i.e. sending EOF.
For all other parts of the protocol, connection termination should be initiated
client side by sending a Terminate message. So the server closing (sending
EOF) is always unexpected and is thus currently considered an error by pqReadData.

But since this is not the case for the cancellation protocol, the result is
changed to -2 in case of EOF to make it possible to distinguish between
an EOF and an actual error.

> And it updates that function to return -2 when the is
> ECONNRESET, which seems to fly in the face of the comment's idea that
> this is the "clean connection closure" case.

The diff sadly does not include the very relevant comment right above these
lines. Pasting the whole case statement here to clear up this confusion:

case SSL_ERROR_ZERO_RETURN:

/*
* Per OpenSSL documentation, this error code is only returned for
* a clean connection closure, so we should not report it as a
* server crash.
*/
appendPQExpBufferStr(&conn->errorMessage,
libpq_gettext("SSL connection has been closed unexpectedly\n"));
result_errno = ECONNRESET;
n = -2;
break;

> For example, it updates the
> comment for pqsecure_read() to say "Returns -1 in case of failures,
> except in the case of clean connection closure then it returns -2."
> But that function calls any of three different implementation
> functions depending on the situation and the patch only updates one of
> them.

That comment is indeed not describing what is happening correctly and I'll
try to make it clearer. The main reason for it being incorrect is coming from
the fact that receiving EOFs is handled in different places based on the
encryption method:

1. Unencrypted TCP: EOF is not returned as an error by pqsecure_read, but detected by pqReadData (see comments related to definitelyEOF)
2. OpenSSL: EOF is returned as an error by pqsecure_read (see copied case statement above)
3. GSS: When writing the patch I was not sure how EOF handling worked here, but given that the tests passed for Jacob on GSS, I'm guessing it works the same as unencrypted TCP.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-03-28 09:36:46 Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)
Previous Message Daniel Gustafsson 2022-03-28 09:17:25 Re: [PATCH] Accept IP addresses in server certificate SANs