Skip site navigation (1) Skip section navigation (2)

BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

From: "Luke Koops" <luke(dot)koops(at)entrust(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Date: 2009-07-31 14:59:56
Message-ID: (view raw or flat)
Lists: pgsql-bugs
The following bug has been logged online:

Bug reference:      4958
Logged by:          Luke Koops
Email address:      luke(dot)koops(at)entrust(dot)com
PostgreSQL version: 8.3.7
Operating system:   Windows 2003 EE SP2
Description:        Stats collector hung on WaitForMultipleObjectsEx while
attempting to recv a datagram

While running a load test with 80 processes connected to postgres, of which
62 were generating load, the stats collector got blocked on
WaitForMultipleObjectsEx which was called with INFINITE timeout.  The stack
trace for the hung thread follows:

-- postgres.exe!mainCRTStartup --

Leading up to the hang, the DB was not under high load.  The CPU was running
at 80%, but less than a fifth of that was due to PostgreSQL.  The underlying
disk subsystems were routinely 90% idle.  The txlogs are on a 4 disk RAID-10
array and the data is on a 5 disk RAID-5 array (15K RPM).  This is a
performance test environment for a product that will include PostgreSQL as
an embedded database.

I have the debug symbols for this build, and a process dump of the stats

I have have not been able to reproduce this problem.  It seems to be
relatively rare, so I'm lucky to have seen it before any of our customers.

Even if the problem cannot be reproduced reliability, I think there is
enough information that would help the community to make corrections in the

After filtering out everything that is expected output, there was nothing
informative in the postgres logs (from 24 hours prior to the hang until 8
hours after).

We are accessing the DB over ODBC.  All of our processes are running on the
same server as PostgreSQL.

The DB is 52GB in size, so I cannot targzupload it.  I estimate that 4-8 GB
of the size is due to the lack of autovacuum after stats collection stopped.
 I doubt that this issue is due to the schema or contents of the DB.

I have verified, using TDIMon, that the back ends are sending the
statistics.  That was a bit redundant since the collector is clearly hung.

Here are the settings in my system that differ from the defaults:
	logging_collector = on
	log_line_prefix = '%m,'
	autovacuum_naptime = 10s
	autovacuum_analyze_scale_factor = 0.05
	autovacuum_vacuum_scale_factor = 0.1
	autovacuum_vacuum_cost_limit = 1000
	log_autovacuum_min_duration = 0

I have customized the vacuum rules for a few "hot" tables that are limited
to 1 or a few rows, but are frequently updated (whenever the system does
anything).  Here is a dump of pg_autovacuum:

 vacrelid | enabled | vac_base_thresh | vac_scale_factor | anl_base_thresh |
anl_scale_factor | vac_
cost_delay | vac_cost_limit | freeze_min_age | freeze_max_age
    16507 | t       |               1 |               -1 |               1 |
              -1 |
         0 |         100000 |             -1 |             -1
    16511 | t       |               1 |               -1 |               1 |
              -1 |
         0 |         100000 |             -1 |             -1
    16541 | t       |               1 |               -1 |               1 |
              -1 |
         0 |         100000 |             -1 |             -1
(3 rows)

I created a new cluster on the same system and the stats collector worked
just fine.


pgsql-bugs by date

Next:From: wader2Date: 2009-07-31 15:00:10
Subject: BUG #4959: unable to install/start service
Previous:From: Tom LaneDate: 2009-07-31 14:09:41
Subject: Re: BUG #4956: Array Construct array() returning blank result

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group