Strange performance issue apparently caused by pg_stat timeout

From: "Benjamin Krajmalnik" <kraj(at)servoyant(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Strange performance issue apparently caused by pg_stat timeout
Date: 2010-06-03 03:48:35
Message-ID: F4E6A2751A2823418A21D4A160B6898861451E@fletch.stackdump.local
Lists: pgsql-admin

System is running PG 8.4.0 (I have been unable to upgrade because the
system needs to be up 24x7), FreeBSD 7.2 amd64, 8 cores, 16GB RAM.

Our application is a network monitoring system, so we are constantly
inserting vast amounts of data (server presently processes about 50
million transactions per day).

As digests of test points come in, they are stored in message queues on
a second server (running PG 8.4.4). A set of daemons processes the
digests and inserts the data into the main database, which resides on
the server described above. Presently, the database has about 60GB of
data.

A few days ago, I noticed that the data in the message queues on the
other server was getting backed up, and then after a few minutes it
would process and clear. This was a totally new behavior. Initially I
suspected deadlocks caused by background processes which create
materialized views, so I stopped those, yet the behavior continued.
Then I suspected server load, yet CPU utilization and load were fine (1
minute and 5 minute load averages were below 4), and iostat did not show
an overly
busy disk subsystem (I had seen it at much higher utilization levels on
both the data and log partitions without any performance hits).

I then suspected network issues, so I checked the infrastructure (running
on Juniper Gig switches, non-blocking of course, and all of the port
information was clean).

Finally, I noticed that the logs had "pgstat wait timeout" warnings.
Initially I thought these were caused by our checking the running
processes via pgadmin from our office to the data center, yet even when I
exited pgadmin, the warnings were still there.

After further testing, I saw a correlation between the data getting
queued up and the "pgstat wait timeout" warnings. As data would begin
to queue up, I would see the warnings, and about a minute later the data
would start to dequeue and get stored on the server.

I searched the archives and found some messages stating that this has
been observed. The interesting thing is that nothing has changed on the
server, yet the problem started to manifest itself. I ran ANALYZE on the
entire database, hoping this might rectify the issue, but unfortunately
to no avail.

Any suggestions would be deeply appreciated - this behavior is
definitely not good.

Below is a snapshot from the log files:

2010-06-02 20:59:19 MDT WARNING: pgstat wait timeout
2010-06-02 21:10:42 MDT WARNING: pgstat wait timeout
2010-06-02 21:16:21 MDT WARNING: pgstat wait timeout
2010-06-02 21:21:23 MDT WARNING: pgstat wait timeout
2010-06-02 21:25:50 MDT WARNING: pgstat wait timeout
2010-06-02 21:35:20 MDT WARNING: pgstat wait timeout
2010-06-02 21:39:13 MDT WARNING: pgstat wait timeout
2010-06-02 21:45:32 MDT WARNING: pgstat wait timeout
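
To line the warnings up against the times the queues back up, it may help to pull the timestamps out of the log and look at the gaps between them. The following is only a minimal Python sketch: it assumes the log line layout shown in the snapshot above (timestamp, time zone abbreviation, then the message), treats all entries as being in the same time zone, and uses hypothetical helper names (warning_times, gaps_in_seconds) that are not part of PostgreSQL or any library.

```python
import re
from datetime import datetime

# Log excerpt copied from the snapshot above.
LOG_SNAPSHOT = """\
2010-06-02 20:59:19 MDT WARNING: pgstat wait timeout
2010-06-02 21:10:42 MDT WARNING: pgstat wait timeout
2010-06-02 21:16:21 MDT WARNING: pgstat wait timeout
2010-06-02 21:21:23 MDT WARNING: pgstat wait timeout
2010-06-02 21:25:50 MDT WARNING: pgstat wait timeout
2010-06-02 21:35:20 MDT WARNING: pgstat wait timeout
2010-06-02 21:39:13 MDT WARNING: pgstat wait timeout
2010-06-02 21:45:32 MDT WARNING: pgstat wait timeout
"""

# Assumed line layout: "<date> <time> <tz> WARNING: pgstat wait timeout".
# A different log_line_prefix setting would need a different pattern.
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+\s+WARNING:\s+pgstat wait timeout"
)


def warning_times(log_text):
    """Return the timestamps of all 'pgstat wait timeout' warnings."""
    times = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            # The time zone abbreviation is ignored; all entries are
            # assumed to share one zone, so differences remain valid.
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the interval in seconds between consecutive warnings."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    times = warning_times(LOG_SNAPSHOT)
    for stamp, gap in zip(times[1:], gaps_in_seconds(times)):
        print(f"{stamp}  {gap:4d} s since previous warning")
```

The same timestamps could then be compared with whatever record exists of when the message queues started to back up and when they drained.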

Thanks in advance.
