Re: Server hitting 100% CPU usage, system comes to a crawl.

From: Brian Fehrle <brianf(at)consistentstate(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Server hitting 100% CPU usage, system comes to a crawl.
Date: 2011-10-27 20:15:25
Message-ID: 4EA9BBDD.5050109@consistentstate.com
Lists: pgsql-general

Also, the database is not restarting itself. It simply becomes
unresponsive or very slow to respond, to the point where just sshing to
the box takes 30 seconds or longer. Running pg_ctl restart on the
cluster resolves the issue.

I looked through the logs for segmentation faults and found none. In
fact, the only entries in my log that look 'bad' are the following:

Oct 27 08:53:18 <snip> postgres[17517]: [28932839-1] user=<snip>,db=<snip> ERROR: deadlock detected
Oct 27 11:49:22 <snip> postgres[608]: [19-1] user=<snip>,db=<snip> ERROR: could not serialize access due to concurrent update

I don't believe these occurred too close to the slowdown.

- Brian F
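
For anyone wanting to repeat the log check described above, here is a minimal sketch in Python. It assumes syslog-style lines shaped like the two entries quoted in this message; the function names and the choice of ERROR/FATAL/PANIC as the "bad" levels are my own assumptions, not anything PostgreSQL provides. A crashed backend is normally reported by the postmaster with a "terminated by signal" message, so the second helper looks for that text.

```python
import re

# Assumed line shape (taken from the entries quoted in this message):
#   Oct 27 08:53:18 host postgres[17517]: [28932839-1] user=u,db=d ERROR: deadlock detected
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"postgres\[(?P<pid>\d+)\]:"
    r".*?\b(?P<level>ERROR|FATAL|PANIC):\s*(?P<msg>.*)$"
)


def find_errors(lines):
    """Return (timestamp, pid, level, message) for each ERROR/FATAL/PANIC line."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            hits.append(
                (m.group("stamp"), int(m.group("pid")), m.group("level"), m.group("msg"))
            )
    return hits


def find_crashes(lines):
    """Return lines that report a backend killed by a signal (e.g. a segfault)."""
    return [line for line in lines if "terminated by signal" in line]
```

Calling find_errors(open(path)) on the syslog file (path being wherever your syslog writes postgres messages) gives a list of timestamps that can be compared by eye against the time of the slowdown.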

On 10/27/2011 02:09 PM, Brian Fehrle wrote:
> On 10/27/2011 01:48 PM, Scott Marlowe wrote:
>> On Thu, Oct 27, 2011 at 12:39 PM, Brian Fehrle
>> <brianf(at)consistentstate(dot)com> wrote:
>>> Looking at top, I see no SWAP usage, very little IOWait, and there are a
>>> large number of postmaster processes at 100% cpu usage (makes sense, at this
>>> point there are 150 or so queries currently executing on the database).
>>>
>>> Tasks: 713 total, 44 running, 668 sleeping, 0 stopped, 1 zombie
>>> Cpu(s): 4.4%us, 92.0%sy, 0.0%ni, 3.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.2%st
>>> Mem: 134217728k total, 131229972k used, 2987756k free, 462444k buffers
>>> Swap: 8388600k total, 296k used, 8388304k free, 119029580k cached
>> OK, a few points. 1: You've got a zombie process. Find out what's
>> causing that, it could be a trigger of some type for this behaviour.
>> 2: You're 92% sys. That's bad. It means the OS is chewing up 92% of
>> your 32 cores doing something. what tasks are at the top of the list
>> in top?
>>
> Out of the top 50 processes in top, 48 of them are postmasters, one is
> syslog, and one is psql. Each of the postmasters has a high %CPU, the
> top ones being 80% and higher, the rest being anywhere between 30% and
> 60%. Would the 'queries' the postmasters are running contribute to the
> sys CPU usage, or should they be under the 'us' CPU usage?
>
>
>> Try running vmstat 10 for a minute or so, then look at the 'cs' and 'in'
>> columns. If 'cs' or 'in' is well over 100k there could be an issue with
>> thrashing, where your app is making some change to the db that
>> requires all backends to be awoken at once and the machine just falls
>> over under the load.
>
> We've restarted the postgresql cluster, so the issue is not happening
> at this moment, but running vmstat 10 had my 'cs' averaging 3K and
> 'in' averaging around 9.5K.
>
> - Brian F
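
The vmstat check discussed in the quoted text can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the 100k threshold is just the rule of thumb suggested above, and the parser assumes the usual vmstat layout with a header row that names the 'in' and 'cs' columns. The first data row is skipped by default because vmstat's first sample reports averages since boot.

```python
def parse_vmstat(text):
    """Turn captured `vmstat <interval>` output into a list of {column: int} rows."""
    cols = None
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # The column header row is the one naming both 'in' and 'cs'.
        if "cs" in parts and "in" in parts:
            cols = parts
            continue
        # Data rows are all-numeric and have one value per header column.
        if cols and len(parts) == len(cols) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(cols, map(int, parts))))
    return rows


def check_thrashing(rows, threshold=100_000, skip_first=True):
    """Average 'cs' and 'in' and flag them against a rule-of-thumb threshold."""
    samples = rows[1:] if skip_first and len(rows) > 1 else rows
    if not samples:
        raise ValueError("no vmstat samples found")
    avg_cs = sum(r["cs"] for r in samples) / len(samples)
    avg_in = sum(r["in"] for r in samples) / len(samples)
    return {
        "avg_cs": avg_cs,
        "avg_in": avg_in,
        "suspect": avg_cs > threshold or avg_in > threshold,
    }
```

To use it, save the output of vmstat 10 to a file while the slowdown is happening and pass the file contents to parse_vmstat. Numbers captured after a restart, like the ones above, only show the healthy baseline.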
