Re: Server hitting 100% CPU usage, system comes to a crawl.

From: Brian Fehrle <brianf(at)consistentstate(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Server hitting 100% CPU usage, system comes to a crawl.
Date: 2011-10-27 21:22:33
Message-ID: 4EA9CB99.8090808@consistentstate.com
Lists: pgsql-general

On 10/27/2011 02:50 PM, Tom Lane wrote:
> Brian Fehrle <brianf(at)consistentstate(dot)com> writes:
>> Hi all, need some help/clues on tracking down a performance issue.
>> PostgreSQL version: 8.3.11
>> I've got a system that has 32 cores and 128 gigs of ram. We have
>> connection pooling set up, with about 100 - 200 persistent connections
>> open to the database. Our applications then use these connections to
>> query the database constantly, but when a connection isn't currently
>> executing a query, it's <IDLE>. On average, at any given time, there are
>> 3 - 6 connections that are actually executing a query, while the rest
>> are <IDLE>.
>> About once a day, queries that normally take just a few seconds slow way
>> down, and start to pile up, to the point where instead of just having
>> 3-6 queries running at any given time, we get 100 - 200. The whole
>> system comes to a crawl, and looking at top, the CPU usage is 99%.
> This is jumping to a conclusion based on insufficient data, but what you
> describe sounds a bit like the sinval queue contention problems that we
> fixed in 8.4. Some prior reports of that:
> http://archives.postgresql.org/pgsql-performance/2008-01/msg00001.php
> http://archives.postgresql.org/pgsql-performance/2010-06/msg00452.php
>
> If your symptoms match those, the best fix would be to update to 8.4.x
> or later, but a stopgap solution would be to cut down on the number of
> idle backends.
>
> regards, tom lane
That sounds somewhat close to the issue I am seeing. The main
differences are that my spike lasts much longer than a few minutes
and is only resolved when the cluster is restarted. Also, the top
output in that second link shows most of the CPU time under 'user',
whereas mine is mostly under 'sys'.

Is there anything more I can look at to get more information on this
'sinval queue contention' problem?

Also, could having my CPU usage high in 'sys' rather than 'user' be
a red flag, or is that normal?

- Brian F
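
For anyone wanting to put numbers on the 'sys' versus 'user' split rather
than eyeballing top, here is a minimal sketch in Python. It assumes a Linux
host where /proc/stat is available. The function names (parse_cpu_line,
cpu_split, sample) and the 5-second interval are illustrative choices, not
anything from this thread. It compares two snapshots of the aggregate "cpu"
line and reports each category as a percentage of the elapsed CPU time.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        # The aggregate line is labelled exactly "cpu"; per-core lines are cpu0, cpu1, ...
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_split(before, after):
    """Return each category's share (in percent) of CPU time between two samples."""
    deltas = {name: after[name] - before[name] for name in before if name in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(path="/proc/stat", interval=5.0):
    """Take two snapshots 'interval' seconds apart and return the percentage split."""
    with open(path) as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.read())
    return cpu_split(first, second)
```

Calling sample() during a spike and again during normal load gives two
dictionaries to compare. A 'system' share that dominates only during the
spike would at least confirm that the time is going to the kernel rather
than to query execution in user space. It would not by itself identify the
cause.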
