Re: "pgstat wait timeout" just got a lot more common on Windows

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: "pgstat wait timeout" just got a lot more common on Windows
Date: 2012-05-10 14:58:26
Message-ID: 426.1336661906@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Last night I changed the stats collector process to use
> WaitLatchOrSocket instead of a periodic forced wakeup to see whether
> the postmaster has died. This morning I observe that several Windows
> buildfarm members are showing regression test failures caused by
> unexpected "pgstat wait timeout" warnings. Everybody else is fine.

> This suggests that there is something broken in the Windows
> implementation of WaitLatchOrSocket. I wonder whether it also
> tells us something we did not know about the underlying cause of
> those messages. Not sure what though. Ideas? Can anyone who
> knows Windows take another look at WaitLatchOrSocket?

Anybody have any clues about that? If not, I think I'll have to revert
the pgstat changes for beta1, which isn't really forward progress.

I spent some time staring at the Windows WaitLatchOrSocket code myself.
The only thing I could find that seemed wrong is that in the event
array, we list the latch's event before pgwin32_signal_event. The
Microsoft documentation I looked at says that if more than one event
is ready, WaitForMultipleObjects reports the first such array member.
This means that if the latch is already set when control gets here,
signal handlers will not be serviced. That doesn't match what would
happen on a Unix machine, so it seems like at least a violation of the
POLA. Hence I think we oughta swap the order of those two array
elements. (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
pgwin32_select.) I do not however see a way that that would explain the
pgstat failures, because the stats collector's latch really shouldn't
ever get set during normal regression test runs.
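
To illustrate the ordering point, here is a minimal, portable sketch in C. It
does not call the Win32 API: sim_wait_any() is a hypothetical stand-in that only
models the documented wait-any rule (the lowest-indexed signaled array member is
the one reported), and SimEvent, SimKind, and first_reported() are names made up
for this example, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two events discussed above. */
typedef enum SimKind
{
    SIM_NONE,                   /* nothing signaled (stands in for a timeout) */
    SIM_LATCH,                  /* the latch's event */
    SIM_SIGNAL                  /* stands in for pgwin32_signal_event */
} SimKind;

/* Hypothetical stand-in for an event handle: a kind plus a signaled flag. */
typedef struct SimEvent
{
    SimKind     kind;
    bool        signaled;
} SimEvent;

/*
 * Models the wait-any rule only: if several array members are signaled,
 * report the one with the smallest index.  Returns -1 if none is signaled.
 */
int
sim_wait_any(const SimEvent *events, size_t nevents)
{
    assert(events != NULL && nevents > 0);

    for (size_t i = 0; i < nevents; i++)
    {
        if (events[i].signaled)
            return (int) i;
    }
    return -1;
}

/*
 * Build a two-element event array in the requested order and return which
 * event the simulated wait reports.
 */
SimKind
first_reported(bool latch_first, bool latch_set, bool signal_pending)
{
    SimEvent    latch = {SIM_LATCH, latch_set};
    SimEvent    sig = {SIM_SIGNAL, signal_pending};
    SimEvent    events[2];
    int         idx;

    events[0] = latch_first ? latch : sig;
    events[1] = latch_first ? sig : latch;

    idx = sim_wait_any(events, 2);
    return (idx < 0) ? SIM_NONE : events[idx].kind;
}
```

With the latch listed first and both events ready, first_reported(true, true,
true) yields SIM_LATCH, so the pending signal is not the one reported. With the
order swapped, first_reported(false, true, true) yields SIM_SIGNAL. This only
models the array-order rule; it says nothing about the pgstat failures.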

regards, tom lane
