Re: Possible explanation for Win32 stats regression test

From: korry <korry(at)appx(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible explanation for Win32 stats regression test
Date: 2006-07-17 16:33:56
Message-ID: 1153154036.8500.12.camel@sakai.localdomain
Lists: pgsql-hackers pgsql-patches

> Ah-hah, I see it. pgwin32_select() uses WaitForMultipleObjectsEx() with
> an event for the socket read-ready plus an event for signal arrival.
> It returns EINTR if the return code from WaitForMultipleObjectsEx shows
> the signal-arrival event as fired. However, WaitForMultipleObjectsEx is
> defined to return the number of the *first* event in the list that is
> fired. This means that if the socket comes read-ready at the same time
> the SIGALRM arrives, pgwin32_select() will ignore the signal, and it'll
> be processed by the subsequent pgwin32_recv().
>
> Now I don't know anything about the Windows scheduler, but I suppose it
> gives processes time quantums like everybody else does. So "at the same
> time" really means "within the same scheduler clock tick", which is not
> so unlikely after all. In short, before the just-committed patch, the
> Windows stats collector would fail if a stats message arrived during the
> same clock tick that its SIGALRM timeout expired.
>
> I think this explains not only the intermittent stats regression
> failures, but the reports we've heard from Merlin and others about the
> stats collector being unstable under load on Windows. The heavier the
> load of stats messages, the more likely one is to arrive during the tick
> when the timeout expires.

There's a second problem in pgwin32_waitforsinglesocket() that may be
getting in your way.

Inside pgwin32_waitforsinglesocket(), we create a kernel
synchronization object (an Event) and associate that Event with the
socket. When the TCP/IP stack detects interesting traffic on the
socket, it signals the Event object ("interesting" in this case is READ,
WRITE, CLOSE, or ACCEPT, depending on the caller), and that wakes up the
call to WaitForMultipleObjectsEx().

That all works fine unless you have two or more sockets in the backend.
(The important part is that src/include/port/win32.h #defines select()
and other socket-related functions, so if you compile a piece of network
code that happens to #include port/win32.h, you'll get the pgwin32_xxx()
versions.)

The problem is that each time you go through
pgwin32_waitforsinglesocket(), you tie the *same* kernel object
(waitevent is static) to the socket you pass in. If you have more than
one socket, every one of them ends up tied to that single Event, and the
kernel will signal it whenever interesting traffic appears on *any* of
the sockets. The net effect is that, if you are waiting for activity on
socket A, any activity on socket B will also awaken
WaitForMultipleObjectsEx(). If you then try to read from socket A,
you'll get an "operation would block" error because nothing happened on
socket A.

The fix is pretty simple: just call WSAEventSelect( s, waitevent, 0 )
after WaitForMultipleObjectsEx() returns. That disassociates the socket
from the Event (it will get re-associated the next time
pgwin32_waitforsinglesocket() is called).
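
To make the failure mode concrete, here is a small portable model of it. This
is a sketch, not Win32 code and not the actual patch: ToyEvent, ToySocket, the
toy_* helpers, and spurious_wakeup() are all hypothetical names invented for
illustration. They only mimic the behaviour described above, where a socket
stays tied to the shared Event until the association is explicitly cancelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT Win32 types: an "event" is just a flag, and
 * each "socket" remembers which event (if any) it is currently tied to. */
typedef struct { bool signaled; } ToyEvent;
typedef struct { ToyEvent *event; bool readable; } ToySocket;

/* Plays the role of WSAEventSelect(s, ev, FD_READ): tie socket to event. */
static void toy_associate(ToySocket *s, ToyEvent *ev) { s->event = ev; }

/* Plays the role of WSAEventSelect(s, ev, 0): cancel the association. */
static void toy_dissociate(ToySocket *s) { s->event = NULL; }

/* Plays the role of the TCP/IP stack seeing traffic: the socket becomes
 * readable and whatever event it is tied to gets signaled. */
static void toy_traffic(ToySocket *s)
{
    s->readable = true;
    if (s->event != NULL)
        s->event->signaled = true;
}

/* Returns true if a wait on socket A is woken although only socket B saw
 * traffic. apply_fix selects whether each wait cancels its association
 * once the wait has returned. */
static bool spurious_wakeup(bool apply_fix)
{
    ToyEvent  waitevent = { false };   /* shared, like the static waitevent */
    ToySocket a = { NULL, false };
    ToySocket b = { NULL, false };
    bool      woke;

    /* An earlier wait on B ties B to the shared event. */
    toy_associate(&b, &waitevent);
    if (apply_fix)
        toy_dissociate(&b);            /* proposed fix: untie after the wait */

    /* Now wait on A, while traffic arrives only on B. */
    waitevent.signaled = false;
    toy_associate(&a, &waitevent);
    toy_traffic(&b);
    woke = waitevent.signaled;
    if (apply_fix)
        toy_dissociate(&a);

    /* Spurious means: we were woken, but A has nothing to read. */
    return woke && !a.readable;
}
```

In this model spurious_wakeup(false) reports the bogus wakeup and
spurious_wakeup(true) does not, which is the whole argument for cancelling
the association once the wait is over.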

I ran into this problem working on the PL/pgSQL debugger and I haven't
gotten around to posting a patch yet, sorry.

-- Korry (korryd(at)enterprisedb(dot)com)
