Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Luke Koops <luke(dot)koops(at)entrust(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Date: 2012-08-07 16:29:42
Message-ID: CA+TgmoZheGK5AvR8Nw0WPwwxzvfpzFENdCbP_2ennJSBnraEnA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Jul 31, 2009 at 10:59 AM, Luke Koops <luke(dot)koops(at)entrust(dot)com> wrote:
> -- postgres.exe!mainCRTStartup --
> ntoskrnl.exe!KiSwapContext+0x26
> ntoskrnl.exe!KiSwapThread+0x2e5
> ntoskrnl.exe!KeWaitForSingleObject+0x346
> ntoskrnl.exe!KiSuspendThread+0x18
> ntoskrnl.exe!KiDeliverApc+0x117
> ntoskrnl.exe!KiSwapThread+0x300
> ntoskrnl.exe!KeWaitForMultipleObjects+0x3d7
> ntoskrnl.exe!ObpWaitForMultipleObjects+0x202
> ntoskrnl.exe!NtWaitForMultipleObjects+0xe9
> ntoskrnl.exe!KiFastCallEntry+0xfc
> ntdll.dll!KiFastSystemCallRet
> ntdll.dll!NtWaitForMultipleObjects+0xc
> kernel32.dll!WaitForMultipleObjectsEx+0x11a
> postgres.exe!pgwin32_waitforsinglesocket+0x1ed
> postgres.exe!pgwin32_recv+0x90
> postgres.exe!PgstatCollectorMain+0x17f
> postgres.exe!SubPostmasterMain+0x33a
> postgres.exe!main+0x168
> postgres.exe!__tmainCRTStartup+0x10f
> kernel32.dll!BaseProcessStart+0x23

We just had a customer hit a very similar problem on 9.1.3, running on
Windows Server 2008 SP2. They were able to extract the following
stack trace:

ntoskrnl.exe!KiSwapContext+0x7a
ntoskrnl.exe!KiCommitThreadWait+0x1d2
ntoskrnl.exe!KeWaitForMultipleObjects+0x271
ntoskrnl.exe!ObpWaitForMultipleObjects+0x294
ntoskrnl.exe!NtWaitForMultipleObjects+0xe5
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!ZwWaitForMultipleObjects+0xa
KERNELBASE.dll!WaitForMultipleObjectsEx+0xe8
kernel32.dll!WaitForMultipleObjectsExImplementation+0xb3
postgres.exe!pgwin32_waitforsinglesocket+0x26d
postgres.exe!pgwin32_recv+0xf0
postgres.exe!PgstatCollectorMain+0x1cc
postgres.exe!SubPostmasterMain+0x4c2
postgres.exe!main+0x1d0
postgres.exe!__tmainCRTStartup+0x11a
kernel32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x1d

The customer finds that they can reproduce this on a variety of
systems under heavy load. However, removing the load doesn't fix the
problem; the system continues to spew "pgstat wait timeout" messages
into the logs. Autovacuum fails to do the right thing due to lack of
current stats and things go downhill rapidly from there. Terminating
the stats collector process resolves the issue; the postmaster starts
a new one within 60 seconds and after that the "pgstat wait timeout"
messages cease and vacuuming consequently resumes.

Now, it looks to me like for this stack trace to happen,
PgstatCollectorMain() has got to call pgwin32_waitforsinglesocket (at
line 3002), and that function has to return true, so that got_data
gets set to true. Then PgstatCollectorMain() will call recv(), which
on Windows will really be pgwin32_recv, which will call
pgwin32_waitforsinglesocket, which must now hang. The fact that the
first pgwin32_waitforsinglesocket call returned true should mean that
the stats collector socket is ready for read, while the fact that the
second one did not return seems to imply that it's not ready for read,
close, or accept. So it almost looks like Windows can change its mind
about whether the socket is readable.
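
To make that sequence concrete, here is a toy model of it. This is only a
sketch under stated assumptions, not PostgreSQL or Winsock code: ToySocket,
toy_wait, toy_recv and toy_collector_iteration are hypothetical names, and
the "consumed notification" behavior is one imagined way the second wait
could fail to return, not a diagnosis. A return of false from toy_wait
stands in for "would block forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the stats collector socket (hypothetical type). */
typedef struct
{
    bool level_triggered;  /* true: readiness tracks queued data */
    bool event_signaled;   /* a one-shot readiness notification is pending */
    int  datagrams_queued; /* datagrams actually waiting to be read */
    int  wait_calls;       /* number of toy_wait() calls so far */
} ToySocket;

/*
 * Hypothetical wait.  In level-triggered mode the socket is ready whenever
 * data is queued.  Otherwise readiness comes only from the one-shot
 * notification.  Either way, this call clears the notification.
 * Returning false models a wait that would never return.
 */
bool
toy_wait(ToySocket *s)
{
    bool ready;

    s->wait_calls++;
    ready = s->level_triggered ? (s->datagrams_queued > 0) : s->event_signaled;
    s->event_signaled = false;
    return ready;
}

/*
 * Models the described shape of pgwin32_recv: wait on the socket again
 * before reading.  Returns 1 if a datagram was read, -1 if the wait
 * would hang.
 */
int
toy_recv(ToySocket *s)
{
    if (!toy_wait(s))
        return -1;
    assert(s->datagrams_queued > 0);
    s->datagrams_queued--;
    return 1;
}

/*
 * Models one pass of the collector loop: wait, set got_data, then recv.
 * Returns 0 if no data was reported, 1 on a successful read, -1 if the
 * recv-side wait would hang.
 */
int
toy_collector_iteration(ToySocket *s)
{
    bool got_data = toy_wait(s);

    if (!got_data)
        return 0;
    return toy_recv(s);
}
```

With level_triggered set and one datagram queued, the iteration reads the
datagram and returns 1. With level_triggered clear, event_signaled set and
one datagram queued, the first wait reports ready but the wait inside
toy_recv does not, so the iteration returns -1 with the datagram still
queued, which matches the shape of the stack trace above.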

Or maybe we're telling it to change its mind. This sounds an awful
lot like something that could have been caused by the oversights fixed
in commit b85427f2276d02756b558c0024949305ea65aca5. Was there a
reason we didn't back-patch that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
