Re: Hang in pldebugger after git commit : 98a64d0

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Hang in pldebugger after git commit : 98a64d0
Date: 2016-12-15 04:08:27
Message-ID: CA+TgmoY72e1yYD+YnuWRaN7pTh4FCgXMPqkpNAWQp7yxek9JUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 13, 2016 at 11:19 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> OK, I think that I have spotted an issue after additional read of the
>> code. When a WSA event is used for read/write activity on a socket,
>> the same WSA event gets reused again and again. That's fine for
>> performance reasons
>
> It's actually also required to not loose events,
> i.e. correctness. Windows events are edge, not level triggered.

Windows might not lose the event, but PostgreSQL can certainly lose
the event. In the Windows implementation of WaitEventSetBlock, we'll
ignore WL_SOCKET_READABLE if WL_LATCH_SET also fires. Now, since the
event is edge-triggered[1], it never fires again and we hang. Oops.
Even if we rewrote WaitEventSetBlock to always process all of the
events, there's nothing in the latch API that requires the caller to
read from the socket the first time WL_SOCKET_READABLE shows up. The
caller is perfectly entitled to write:

if (rv & WL_LATCH_SET)
// do stuff
else if (rv & WL_SOCKET_READABLE)
// do other stuff

I suppose we could make a coding rule that the caller must not rely on
WL_SOCKET_READABLE being level-triggered, but I think that's likely to
lead to a stream of bugs that only show up on Windows, because most
people are used to the Linux semantics and will naturally write code
like the above which isn't robust in the face of an edge-triggered
event -- just as you yourself did in WaitEventSetBlock.

> The
> previous implementation wasn't correct. So just resetting things ain't
> ok.

How was it incorrect? The previous implementation *worked* and the
new one doesn't, at least in the case complained of here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-12-15 04:20:21 Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Previous Message Amit Kapila 2016-12-15 03:32:02 Re: Hang in pldebugger after git commit : 98a64d0