Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

From: Luke Koops <luke(dot)koops(at)entrust(dot)com>
To: 'Nikhil Sontakke' <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Date: 2009-08-06 15:45:02
Message-ID: A3144629B5AC714A8BF27806EBFA7057514622B8@sottexch7.corp.ad.entrust.com
Lists: pgsql-bugs

> Yeah it will be interesting to see if the collector starts
> functioning fine after the restart. That might hint that the
> kernel object representing the socket is maybe fine but would
> not prove conclusively that the issue is with PG code because
> the layer used by WaitForMultipleObjectsEx might have issues too.

This morning I planned to verify that stats collection was still not working before killing the stats collector and allowing it to restart. I had a psql session open from the previous day, but I closed it and tried to start a new one, because I log each session and wanted to start a fresh log.

Now, I am unable to start a new psql session. I get this error on the client side:
| psql: could not send startup packet: Connection reset by peer (0x00002746/10054)
|
and this error on the server side:
| 2009-08-06 10:48:59.542 EDT,LOG: could not receive data from client: No connection could be made because the target machine actively refused it.
| 2009-08-06 10:48:59.542 EDT,LOG: incomplete startup packet

I didn't find much in the archives about this. These server-side messages are what you get if you just connect to port 5432 (with telnet, for example) and then drop the connection without sending anything.
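
For anyone who wants to trigger the connect-and-drop case without telnet, here is a minimal Python sketch. The function name connect_and_drop and the default host are my own choices for illustration, and I am assuming the server listens on the default port 5432. It only opens a TCP connection and closes it without sending a startup packet; I have not checked what a given server version logs in response beyond what is described above.

```python
import socket


def connect_and_drop(host="127.0.0.1", port=5432, timeout=5.0):
    """Open a TCP connection and close it without sending any bytes.

    Hypothetical helper for illustration: it mimics connecting with
    telnet and then dropping the connection, so the server never
    receives a startup packet.
    """
    # create_connection performs the TCP handshake (or raises OSError).
    sock = socket.create_connection((host, port), timeout=timeout)
    # Close immediately; the peer sees end-of-stream with no data.
    sock.close()
```

Calling connect_and_drop() against the affected server and then checking the server log should show whether the same pair of messages appears for a deliberately dropped connection.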

Occasionally (3-6% of the time), I get this on the client side:
| C:\postgres\bin>psql
| psql: server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.
and this on the server side:
| 2009-08-06 11:16:27.933 EDT,LOG: could not receive data from client: No connection could be made because the target machine actively refused it.

When this sequence happens, I can see a backend postgres.exe process start up and then exit very quickly. Note the absence of the "incomplete startup packet" message.

Could it be related to the stats collector problem? The stats collector on this system has been hung for over 6 weeks, so the timing of this problem is quite delayed.

I have windbg on this system along with the source and the symbols, so I can look for anything specific in the debugger if that would help.

-Luke
