Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Luke Koops <luke(dot)koops(at)entrust(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Date: 2012-08-21 18:54:28
Message-ID: CA+TgmobQ0sLjzHdEQZr9QG7n2r-BKej+UTH7rcPPh09doMvUDg@mail.gmail.com
Lists: pgsql-bugs

On Tue, Aug 7, 2012 at 2:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> We just had a customer hit a very similar problem on 9.1.3, running on
>> Windows Server 2008 SP2. ...
>> The customer finds that they can reproduce this on a variety of
>> systems under heavy load.
>
>> Now, it looks to me like for this stack trace to happen,
>> PgstatCollectorMain() has got to call pgwin32_waitforsinglesocket (at
>> line 3002), and that function has to return true, so that got_data
>> gets set to true. Then PgstatCollectorMain() will call recv(), which
>> on Windows will really be pgwin32_recv, which will call
>> pgwin32_waitforsinglesocket, which must now hang. The fact that the
>> first pgwin32_waitforsinglesocket call returned true should mean that
>> the stats collector socket is ready for read, while the fact that the
>> second one did not return seems to imply that it's not ready for read,
>> close, or accept. So it almost looks like Windows can change its mind
>> about whether the socket is readable.
>
>> Or maybe we're telling it to change its mind. This sounds an awful
>> lot like something that could have been caused by the oversights fixed
>> in commit b85427f2276d02756b558c0024949305ea65aca5. Was there a
>> reason we didn't back-patch that?
>
> Sure: it was unproven that that fixed anything at all, much less that it
> was bug-free enough to be safe to backpatch. Neither of those things
> has changed since May. If you want you can try making up a 9.1 with
> those changes and giving it to this customer to see if it fixes their
> problems --- but without some field testing of the sort, I'm pretty
> hesitant to put it into back branches.
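
To make the sequence described in the quoted analysis concrete: a readiness check succeeds, and the receive that follows then blocks. Below is a minimal sketch in Python of one defensive pattern, which treats the readiness result as a hint and never issues a blocking receive after it. This is not PostgreSQL's code and not what either commit mentioned in this thread does. The names recv_if_ready and demo are hypothetical, and the loopback UDP setup is only an assumption for illustration.

```python
import select
import socket


def recv_if_ready(sock, timeout=0.5, bufsize=4096):
    """Wait up to `timeout` seconds for a datagram, then receive without blocking.

    Hypothetical helper: the socket is expected to be in non-blocking mode,
    so a stale or wrong "readable" report cannot hang the caller in recv().
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # nothing reported ready within the timeout
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # Reported ready, but no datagram was there once we looked:
        # report "no data" instead of waiting indefinitely.
        return None


def demo():
    """Send one datagram over loopback, then poll the receiver twice."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    receiver.setblocking(False)       # recv() may fail fast but never hang
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"stats", receiver.getsockname())
        first = recv_if_ready(receiver)                # the datagram just sent
        second = recv_if_ready(receiver, timeout=0.1)  # queue is empty now
        return first, second
    finally:
        sender.close()
        receiver.close()
```

Calling demo() polls once with a datagram queued and once with nothing pending. The second call returns after its timeout rather than waiting forever, which is the property the hung collector lacked.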

Well, we had the customer try out 9.2beta, and they were unable to
reproduce the issue there. Woo-hoo. Does that qualify as sufficient
evidence for back-patching this?

(BTW, I think commit 9b63e9869ffaa4d6d3e8bf45086a765d8f310f1c contains
a thinko in one of the comments: shouldn't "a crock of the first
water" be "a crock of the first order"?)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
