Re: random slow query

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Mike Ivanov <mikei(at)activestate(dot)com>
Cc: Sean Ma <seanxma(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: random slow query
Date: 2009-06-30 17:46:16
Message-ID: dcc563d10906301046g449903c6i2716770961481f09@mail.gmail.com
Lists: pgsql-performance

On Tue, Jun 30, 2009 at 11:23 AM, Mike Ivanov<mikei(at)activestate(dot)com> wrote:
> Hi Sean,
>
> Well, the overall impression is your machine is badly overloaded. Look:
>
>> top - 10:18:58 up 224 days, 15:10,  2 users,  load average: 6.27, 7.33, 6
>>
>
> The load average of 6.5 means there are six and a half processes competing
> for the same CPU (and this system apparently has only one). This
> approximately equals to 500% overload.
>
> Recommendation: either add more CPU's or eliminate process competition by
> moving them to other boxes.

Well, we can't be sure the OP has only one core. However, given that
the OP's posting shows mostly idle and wait state, the real issue isn't
the number of cores; it's that the IO subsystem is too slow for the load.
More cores wouldn't fix that.
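
To make that reading of the CPU line concrete, here is a minimal sketch that splits the quoted top "Cpu(s):" line into its fields and compares compute time with IO wait. The helper name parse_cpu_line is made up for illustration, and the input string is the OP's line with the truncated trailing field left off.

```python
import re

def parse_cpu_line(line):
    """Return a dict such as {'us': 5.0, 'wa': 32.7} from a top 'Cpu(s):' line."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%(\w+)", line)}

# The OP's line, minus the field that was cut off in the posting.
cpu = parse_cpu_line(
    "Cpu(s):  5.0%us,  0.7%sy,  0.0%ni, 61.5%id, 32.7%wa,  0.0%hi,  0.1%si")

# Time actually spent computing: user + system + nice.
busy = cpu["us"] + cpu["sy"] + cpu["ni"]
print(f"busy {busy:.1f}%  idle {cpu['id']:.1f}%  iowait {cpu['wa']:.1f}%")

if cpu["wa"] > busy:
    print("more time is spent waiting on IO than computing")
```

With under 6% of the time spent computing and about a third spent in IO wait, extra cores would mostly sit idle too.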

>> Tasks: 239 total,   1 running, 238 sleeping,   0 stopped,   0 zombie
>>
>
> This supports what I said above. There are only 92 processes running on my
> laptop and I think it is too much. Do you have Apache running on the same
> machine?

My production PG server that runs ONLY pg has 222 processes on it.
It's no big deal unless they're all trying to get CPU time, which
generally isn't the case.

>> Cpu(s):  5.0%us,  0.7%sy,  0.0%ni, 61.5%id, 32.7%wa,  0.0%hi,  0.1%si,  0
>>
>
> Waiting time (wa) is rather high, which means processes wait on locks or for
> IO, another clue for concurrency issues on this machine.

More likely just a slow IO subsystem, like a single drive or
something. Adding drives in a RAID-1 or RAID-10 etc. usually helps.

>> Mem:  32962804k total, 32802612k used,   160192k free,   325360k buffers
>>
>
> Buffers are about 10% of all the memory which is OK, but I tend to give
> buffers some more room.

These are kernel buffers, not pg buffers. They're sized by the OS
semi-automagically. In this case it's 325M out of 32 Gig, so about 1%,
which is well under 10% and typical.
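
The arithmetic is easy to check. A small sketch, assuming the kB figures from the quoted "Mem:" line (parse_kb_fields is a hypothetical helper name, not anything from top itself):

```python
import re

def parse_kb_fields(line):
    """Map each label to its value in kB, e.g. {'total': 32962804, ...}."""
    return {name: int(value)
            for value, name in re.findall(r"(\d+)k (\w+)", line)}

mem = parse_kb_fields(
    "Mem:  32962804k total, 32802612k used,   160192k free,   325360k buffers")

# Kernel buffers as a share of physical RAM.
buffers_pct = 100.0 * mem["buffers"] / mem["total"]
print(f"buffers: {buffers_pct:.2f}% of RAM")
```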

>> Swap:  8193140k total,   224916k used,  7968224k free, 30829456k cached
>>
>
> 200M paged out. It should be zero except of an emergency.

Not true. Linux will happily swap out seldom used processes to make
room in memory for more kernel cache etc. You can adjust this
tendency by setting swappiness.
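
For reference, the current setting lives in /proc/sys/vm/swappiness on Linux. A minimal sketch for reading it; the function name read_swappiness is illustrative, and it returns None where the file doesn't exist (e.g. on a non-Linux box):

```python
from pathlib import Path

def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return vm.swappiness as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

value = read_swappiness()
print("vm.swappiness:", value if value is not None else "unavailable")
```

Changing the value is done as root through sysctl (the vm.swappiness key); lower values make the kernel less eager to swap out idle processes.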

> 3G of cached swap
> is a sign of some crazy paging activity in the past. Those unexplainable
> slowdowns are very likely caused by that.

No, they're not. It's 30G btw, and it's not swap that's cached, it's
the kernel using extra memory to cache data to / from the hard drives.
It's normal, and shouldn't worry anybody. In fact it's a good sign
that you're not using way too much memory for any one process.
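
A rough sketch of the same point in numbers, using the quoted "Mem:" and "Swap:" lines. It adds up free, buffers and cached memory, which is roughly what the kernel could hand back to processes if they asked for it. The helper kb_fields is a made-up name, and treating all cached memory as reclaimable is a simplifying assumption.

```python
import re

def kb_fields(line):
    """Map each label on a top memory line to its value in kB."""
    return {name: int(value)
            for value, name in re.findall(r"(\d+)k (\w+)", line)}

mem = kb_fields(
    "Mem:  32962804k total, 32802612k used,   160192k free,   325360k buffers")
swap = kb_fields(
    "Swap:  8193140k total,   224916k used,  7968224k free, 30829456k cached")

KB_PER_GIB = 1024 * 1024

# "cached" on the Swap line is the kernel's disk cache, not cached swap.
cached_gib = swap["cached"] / KB_PER_GIB

# Assumption: free + buffers + cached approximates memory available to apps.
reclaimable_kb = mem["free"] + mem["buffers"] + swap["cached"]
reclaimable_pct = 100.0 * reclaimable_kb / mem["total"]

print(f"disk cache: {cached_gib:.1f} GiB")
print(f"free + buffers + cached: {reclaimable_pct:.0f}% of RAM")
```

So nearly all of the "used" memory is cache the kernel can drop on demand, not memory pinned by processes.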

>> Didn't really see the pattern, typical the cpu load is only about 40%
>>
>
> 40% is too much, really. I start worrying when it is above 10%.

Really? I have eight cores on my production servers and many batch
jobs I run put all 8 cores at 90% for extended periods. Since that
machine is normally doing a lot of smaller cached queries, it hardly
even notices.

> Conclusion:
>
> - the system bears more load than it can handle

Yes, too much IO load. I agree on that.

> - the machine needs an upgrade

Yes, more hard drives / better caching RAID controller.
