From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: "pgstat wait timeout" just got a lot more common on Windows |
Date: | 2012-05-10 15:27:19 |
Message-ID: | CABUevEwoivuFOkyWPBP=rWKywd1OxC7aROG1xfyULv5hqQnXkg@mail.gmail.com |
Lists: | pgsql-hackers |

On May 10, 2012 4:59 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > Last night I changed the stats collector process to use
> > WaitLatchOrSocket instead of a periodic forced wakeup to see whether
> > the postmaster has died. This morning I observe that several Windows
> > buildfarm members are showing regression test failures caused by
> > unexpected "pgstat wait timeout" warnings. Everybody else is fine.
>
> > This suggests that there is something broken in the Windows
> > implementation of WaitLatchOrSocket. I wonder whether it also
> > tells us something we did not know about the underlying cause of
> > those messages. Not sure what though. Ideas? Can anyone who
> > knows Windows take another look at WaitLatchOrSocket?
>
> Anybody have any clues about that? If not, I think I'll have to revert
> the pgstat changes for beta1, which isn't really forward progress.

Haven't had time to look at the code itself, and won't before wrap time.
Sorry.

> I spent some time staring at the Windows WaitLatchOrSocket code myself.
> The only thing I could find that seemed wrong is that in the event
> array, we list the latch's event before pgwin32_signal_event. The
> Microsoft documentation I looked at says that if more than one event
> is ready, WaitForMultipleObjects reports the first such array member.
> This means that if the latch is already set when control gets here,
> signal handlers will not be serviced.

Yeah, that does seem wrong.

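
To make the ordering problem concrete, here is a minimal portable model of the "lowest index wins" rule described above. It is only a sketch: `ModelEvent`, `model_wait_any` and `reported_when_both_set` are hypothetical names made up for illustration, and nothing here is the actual Win32 call or the PostgreSQL code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a waitable event; not a real Win32 HANDLE. */
typedef enum { EV_LATCH, EV_SIGNAL } EventKind;

typedef struct
{
    EventKind kind;
    bool      signaled;
} ModelEvent;

/*
 * Models the documented rule: when several events are ready, report the
 * lowest array index. Returns -1 if nothing is signaled.
 */
static int
model_wait_any(const ModelEvent *events, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (events[i].signaled)
            return (int) i;
    }
    return -1;
}

/*
 * Both events are set on entry. Returns the kind that the wait reports
 * for the given array order.
 */
static EventKind
reported_when_both_set(bool latch_first)
{
    ModelEvent latch = {EV_LATCH, true};
    ModelEvent sig = {EV_SIGNAL, true};
    ModelEvent events[2];

    events[0] = latch_first ? latch : sig;
    events[1] = latch_first ? sig : latch;

    return events[model_wait_any(events, 2)].kind;
}
```

With the latch listed first, `reported_when_both_set(true)` yields `EV_LATCH`, so the signal event is never the one reported while the latch stays set. With the order swapped, `reported_when_both_set(false)` yields `EV_SIGNAL`.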
> That doesn't match what would
> happen on a Unix machine, so it seems like at least a violation of the
> POLA. Hence I think we oughta swap the order of those two array
> elements. (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
> pgwin32_select.) I do not however

Maybe we need a loop that checks for all events?

> see a way that that would explain the
> pgstat failures, because the stats collector's latch really shouldn't
> ever get set during normal regression test runs.
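
The "loop that checks for all events" idea could look roughly like the sketch below: after waking up, scan every event instead of trusting the single index that was reported, and collect everything that is ready. This is a portable model under assumptions, not the real code; `ModelEvent` and `model_poll_all` are hypothetical names. On Windows the per-event check would presumably be a zero-timeout wait on each handle, with the caveat that waiting on an auto-reset event also resets it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a waitable event; not a real Win32 HANDLE. */
typedef struct
{
    bool signaled;
} ModelEvent;

/*
 * Checks every event rather than stopping at the first ready one.
 * Returns a bitmask with bit i set for each signaled events[i], so the
 * caller can service all of them regardless of array order.
 */
static unsigned
model_poll_all(const ModelEvent *events, size_t n)
{
    unsigned ready = 0;

    /* Cap at the number of bits available in the result mask. */
    for (size_t i = 0; i < n && i < sizeof(unsigned) * 8; i++)
    {
        if (events[i].signaled)
            ready |= 1u << i;
    }
    return ready;
}
```

A caller would then handle the signal bit and the latch bit independently, which makes the array order irrelevant.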

So could there be something wrong in the other end, meaning the latch
*does* get set?

/Magnus