Windows: Wrong error message at connection termination

From: Lars Kanis <lars(at)greiz-reinsdorf(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Windows: Wrong error message at connection termination
Date: 2021-11-17 21:13:33
Message-ID: 90b34057-4176-7bb0-0dbb-9822a5f6425b@greiz-reinsdorf.de
Lists: pgsql-hackers

Dear hackers,

I recently had a hard time finding the root cause of some weird behavior
with the async API of libpq when running client and server on Windows.
When the connection aborts with an error - most notably with an error at
connection setup - it sometimes fails with the wrong error message:

Instead of:

    connection to server at "::1", port 5433 failed: FATAL:  role "a" does not exist

it fails with:

    connection to server at "::1", port 5433 failed: server closed the connection unexpectedly

I found out that the recv() function of the Winsock API has some weird
behavior. If the connection receives a TCP RST flag, recv() immediately
returns -1, regardless of whether all previous data has been retrieved. So
when the connection is closed hard, the behavior on the client side is
timing dependent. The last packet is either dropped or delivered to libpq,
depending on whether libpq calls recv() quickly enough.

This behavior is described in the closesocket() documentation:
https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket

> This is called a hard or abortive close, because the socket's virtual
> circuit is reset immediately, and any unsent data is lost. On Windows,
> any *recv* call on the remote side of the circuit will fail with
> WSAECONNRESET
> <https://docs.microsoft.com/en-us/windows/desktop/WinSock/windows-sockets-error-codes-2>.
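
To make the timing dependence concrete, here is a minimal sketch of a client-side read loop that distinguishes a graceful close (recv() returns 0 after all data has been delivered) from a reset (recv() fails with a connection-reset error). This is not libpq code; the names drain_socket, drain_result and sock_t are hypothetical and used only for illustration, and the socket is assumed to be blocking.

```c
#include <errno.h>
#include <stddef.h>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;
#define LAST_SOCK_ERROR() WSAGetLastError()
#define SOCK_ECONNRESET WSAECONNRESET
#else
#include <sys/types.h>
#include <sys/socket.h>
typedef int sock_t;
#define LAST_SOCK_ERROR() errno
#define SOCK_ECONNRESET ECONNRESET
#endif

/* Hypothetical result codes, for illustration only. */
typedef enum { DRAIN_EOF, DRAIN_RESET, DRAIN_ERROR } drain_result;

/*
 * Read from a blocking socket until the peer closes the connection.
 * Up to `cap` bytes are stored in `buf`; the stored count goes to *len.
 * Bytes beyond `cap` are read and discarded.
 */
drain_result drain_socket(sock_t s, char *buf, size_t cap, size_t *len)
{
    *len = 0;
    for (;;) {
        char scratch[64];
        int storing = (*len < cap);
        char *dst = storing ? buf + *len : scratch;
        size_t room = storing ? cap - *len : sizeof scratch;
        int n = (int) recv(s, dst, (int) room, 0);

        if (n > 0) {
            if (storing)
                *len += (size_t) n;
            continue;
        }
        if (n == 0)
            return DRAIN_EOF;       /* graceful close: all data was seen */
        if (LAST_SOCK_ERROR() == SOCK_ECONNRESET)
            return DRAIN_RESET;     /* hard close: earlier data may be lost */
        return DRAIN_ERROR;         /* any other failure */
    }
}
```

With a graceful close the loop ends in DRAIN_EOF and the buffer holds the server's final message. With a hard close on Windows, the loop may end in DRAIN_RESET before that message has been read.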

Unfortunately, a PostgreSQL server on Windows closes each connection hard,
with the TCP RST flag. That in turn is due to another Winsock API behavior:
every socket that wasn't closed by the application is closed hard, with the
RST flag, at process termination. I didn't find any official documentation
about this behavior.

Explicitly closing the socket before process termination leads to a
graceful close even on Windows. That is done by the attached patch. I
think delivering the correct error message to the user is much more
important than closing the process in sync with the socket.

Some background: I'm the maintainer of ruby-pg, the PostgreSQL client
library for Ruby. The next version of ruby-pg will switch to the async
API for connection setup. Using this API changes the timing of socket
operations and therefore often leads to the wrong message shown above.
Previous versions made use of the sync API, which usually doesn't suffer
from this issue. The original issue is here:
https://github.com/ged/ruby-pg/issues/404

--

Kind Regards
Lars Kanis

Attachment Content-Type Size
0001-Windows-Gracefully-close-the-socket-on-process-exit.patch text/x-patch 1.2 KB
