Re: BUG #6650: CPU system time utilization rising few times a day

From: Andrzej Krawiec <a(dot)krawiec(at)focustelecom(dot)pl>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6650: CPU system time utilization rising few times a day
Date: 2012-05-23 09:29:15
Message-ID: CAAy64HjYYVG4_Qu2or278fD2QMQo53B7hjUUxE28FoFr2LyqDA@mail.gmail.com
Lists: pgsql-bugs

We cannot run strace or gdb on a production system under heavy load (about
100 transactions per second).
The time is spent in kernel space, not user space, so we are unable to do
anything at that particular moment (sometimes even the ssh connection seems
to hang for a while).
We suspect neither autovacuum (although it was our primary suspect at first)
nor a regular backend. It is system time. The question is: what is the reason
for that?
We've dug through the system and postgres logs and cleared out most of the
long-running query and idle-in-transaction problems; we have also optimized
queries, vacuumed, reindexed and such.
For a while it seemed that the particular kernel version was causing the
majority of the problems. We downgraded to 2.6.32-71.29.1.el6.x86_64 and
the problems mostly went away. For a few days we had no incidents, but then
it happened again.
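
Since strace and gdb are not an option during the stall, one lightweight alternative is to sample /proc/stat before and after an interval and log the share of CPU time spent in system mode together with the interrupt rate. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, the field order is the one documented for the aggregate "cpu" line on Linux, and the "system" column does not include the separate irq and softirq columns.

```python
import time


def parse_proc_stat(text):
    """Return (cpu_fields, total_interrupts) parsed from /proc/stat text."""
    cpu, intr = None, None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":
            # Aggregate line: user nice system idle iowait irq softirq steal ...
            cpu = [int(v) for v in parts[1:]]
        elif parts[0] == "intr":
            # The first value is the total interrupt count since boot.
            intr = int(parts[1])
    if cpu is None or intr is None:
        raise ValueError("missing 'cpu' or 'intr' line")
    return cpu, intr


def system_share_and_intr_rate(before, after, seconds):
    """Return (% of CPU time in system mode, interrupts per second)."""
    (cpu0, intr0), (cpu1, intr1) = before, after
    # Use only user..steal (first 8 fields); guest time is already
    # counted inside user time, so later fields are skipped.
    deltas = [b - a for a, b in zip(cpu0[:8], cpu1[:8])]
    total = sum(deltas)
    system_pct = 100.0 * deltas[2] / total if total else 0.0
    return system_pct, (intr1 - intr0) / seconds


def sample(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return system_share_and_intr_rate(first, second, interval)
```

Calling sample() in a loop and writing the two numbers to a log with a timestamp would show whether the interrupt rate climbs before or together with the system-time spike.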

Regards
--
Andrzej Krawiec

2012/5/22 Robert Haas <robertmhaas(at)gmail(dot)com>

> On Fri, May 18, 2012 at 5:09 AM, <a(dot)krawiec(at)focustelecom(dot)pl> wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 6650
> > Logged by: Andrzej Krawiec
> > Email address: a(dot)krawiec(at)focustelecom(dot)pl
> > PostgreSQL version: 8.4.11
> > Operating system: CentOS 6.0 - 2.6.32-220.13.1.el6.x86_64
> > Description:
> >
> > Primarily checked on PG 8.4.9 (same OS), problem also occurs. Few times a
> > day I get a situation where PostgreSQL stops running for 1-2 minutes. CPU is
> > running 99% in systime. IO is OK, only interrupts are extremely high (over
> > 100k). System operates on 2 x Xeon 10 Core, 128 GB RAM, raid 10. Does anyone
> > have any idea?
>
> Try using strace to figure out where all that system time is going.
> Sometimes the '-c' option is helpful.
>
> It might also be helpful to connect gdb to the process and get a
> backtrace, then continue, stop it again, get another backtrace.
> Repeat that a few times and send us the backtrace that occurs most
> frequently.
>
> Is it a regular backend that is eating all that CPU time, or an
> autovacuum worker?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
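
The suggestion above to collect several backtraces and report the most frequent one can be sketched as a small tally over saved gdb output. This is an illustration only: the helper names are hypothetical, and the regular expression assumes the usual gdb frame layout ("#N  0xADDR in func (args) ..." or "#N  func (args) ..."), keeping just the function names so that differing addresses and arguments do not split otherwise identical stacks.

```python
import re
from collections import Counter

# Captures the function name from a gdb frame line, with or without
# the leading "0xADDR in" part.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)\s*\(")


def frame_functions(backtrace):
    """Reduce one backtrace (a multi-line string) to a tuple of function names."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return tuple(names)


def most_frequent_backtrace(backtraces):
    """Return (function-name tuple, count) for the most common stack.

    Raises IndexError if `backtraces` is empty.
    """
    counts = Counter(frame_functions(bt) for bt in backtraces)
    return counts.most_common(1)[0]
```

Each sample would be the text printed by gdb's "bt" command for one stop of the process; the stack that dominates the tally is the one worth posting.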
