Re: Windows: Wrong error message at connection termination

From: Lars Kanis <lars(at)greiz-reinsdorf(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windows: Wrong error message at connection termination
Date: 2021-11-21 19:19:29
Message-ID: 6dad5499-ced4-ac82-4f5d-fb6311378176@greiz-reinsdorf.de
Lists: pgsql-hackers

Am 18.11.21 um 03:04 schrieb Tom Lane:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
>> I realise now that the experiments we did a while back to try to
>> understand this across a few different operating systems[2] had missed
>> this subtlety, because that Python script had an explicit close()
>> call, whereas PostgreSQL exits. It still revealed that the client
>> isn't allowed to read any data after its write failed, which is a
>> known source of error messages being eaten.
> Yeah. After re-reading that thread, I'm a bit confused about how
> to square the results we got then with Lars' report. The Windows
> documentation he pointed to does claim that the default behavior if you
> issue closesocket() is to do a "graceful close in the background", which
> one would think means allowing sent data to be received. That's not what
> we saw. It's possible that we would get different results if we re-tested
> with a scenario where the client doesn't attempt to send data after the
> server-side close; but I'm not sure how much it's worth to improve that
> case if the other case still fails hard.

From my experiments, the Winsock implementation has the two issues
that I described. First, it drops all received but not yet retrieved
data as soon as it receives an RST packet. Second, at process
termination it always sends an RST packet on every socket that wasn't
send-closed, regardless of whether any data is pending.

Sending data to a socket that was already closed by the other side is
only one way to trigger an RST packet. Closing a socket with
l_linger=0 is another, and process termination is a third. All of them
can lead to data loss on the receiver side, presumably because of the
RST flag.
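
To illustrate the l_linger=0 case, here is a minimal Python sketch (not
PostgreSQL or libpq code). The function names are hypothetical, and the
loopback setup is only an assumption for demonstration. It closes a
connected socket with l_onoff=1, l_linger=0 and reports what the peer
sees on its next recv().

```python
import socket
import struct
import sys


def linger_zero_option():
    # struct linger holds two ints on POSIX systems and two
    # unsigned shorts in Winsock.
    fmt = "HH" if sys.platform == "win32" else "ii"
    return struct.pack(fmt, 1, 0)  # l_onoff=1, l_linger=0


def abortive_close_demo():
    """Return what the peer observes after a close with l_linger=0."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()

    # With l_linger=0, close() aborts the connection (RST) instead of
    # going through the normal FIN handshake.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_zero_option())
    server.close()

    client.settimeout(5)
    try:
        data = client.recv(1024)
        outcome = "eof" if data == b"" else "data"
    except ConnectionResetError:
        outcome = "reset"
    finally:
        client.close()
        listener.close()
    return outcome
```

No data is sent before the close here, so the sketch only shows the
reset itself, not the loss of already received data.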

An alternative to closesocket() is shutdown(sock, SD_SEND). It doesn't
free the socket resource, but it leads to a graceful shutdown. However,
the FIN packet is sent when shutdown() or closesocket() is called, and
that is still shortly before the process terminates. I did some more
testing with different linger options, but it didn't change the
behavior substantially. So I didn't find any way to close the socket
with a FIN packet at the moment of process termination.
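
For comparison, a graceful send-side shutdown can be sketched as
follows. This is again a Python illustration with hypothetical names.
socket.SHUT_WR is Python's counterpart to SD_SEND, and the message
content is made up.

```python
import socket


def graceful_shutdown_demo(message=b"final message"):
    """Send a last message, shut down the send side, return what the peer read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()

    server.sendall(message)
    # Half-close: FIN follows the queued data; the socket itself stays open.
    server.shutdown(socket.SHUT_WR)

    client.settimeout(5)
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:  # b"" means the peer's FIN was seen
            break
        chunks.append(chunk)

    client.close()
    server.close()
    listener.close()
    return b"".join(chunks)
```

In this sketch the peer reads the complete message and then a clean
end-of-stream. It does not reproduce the process-termination case
described above.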

The other approach would be to make sure on the client side that the
last message is retrieved before the RST packet arrives, so that no data
is lost. This mostly works well with the sync API of libpq, but with
the async API the trigger for data reception is outside the scope of
libpq, so there's no way to ensure that recv() is called quickly enough,
after the data was received but before the RST arrives. On a local
client+server combination the gap is only 0.5 milliseconds or so.
I also didn't find a way to retrieve the enqueued data after the RST
has arrived. Maybe there's a nasty hack to retrieve the data afterwards,
but I didn't dig into the assembly code and memory layout of Winsock
internals.
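
The client-side idea, reading as soon as the socket becomes readable
and keeping whatever arrived before a reset, could look roughly like
this in Python. It is a sketch with hypothetical helper names, not how
libpq does it, and it cannot close the race described above. It only
shows the read loop.

```python
import select
import socket


def drain_until_closed(sock, timeout=5.0):
    """Read until EOF, reset, or timeout; return everything read so far."""
    sock.setblocking(False)
    received = bytearray()
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            break  # timed out waiting for more data
        try:
            chunk = sock.recv(4096)
        except BlockingIOError:
            continue  # spurious wakeup, wait again
        except ConnectionResetError:
            break  # RST: unread data may already be gone
        if not chunk:
            break  # clean end-of-stream (FIN)
        received += chunk
    return bytes(received)


def drain_demo(message=b"final message"):
    """Loopback demo: the server sends a message and closes gracefully."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()

    server.sendall(message)
    server.shutdown(socket.SHUT_WR)

    data = drain_until_closed(client)

    client.close()
    server.close()
    listener.close()
    return data
```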

> In any case, our previous
> results definitely show that issuing an explicit close() is no panacea.
I don't fully understand the issue with closing the socket before
process termination. Sure, it can be valuable to know that the
corresponding backend process has definitely terminated, at least in the
context of regression testing or so. But I think that losing messages
from the backend is far more critical than a non-synchronous process
termination. Am I missing something?

--

Regards,
Lars Kanis
