
From: Brian Fehrle <brianf(at)consistentstate(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Random performance hit, unknown cause.
Date: 2012-04-12 18:41:08
Message-ID: 4F8721C4.8090300@consistentstate.com
Lists: pgsql-performance
Hi all,

OS: Linux 64 bit 2.6.32
PostgreSQL 9.0.5 installed from Ubuntu packages.
8 CPU cores
64 GB system memory
Database cluster is on raid 10 direct attached drive, using a HP p800 
controller card.


I have a system that has been having occasional performance hits, where 
the load on the system skyrockets, all queries take longer to execute 
and a hot standby slave I have set up via streaming replication starts 
to get behind. I'm having trouble pinpointing where the exact issue is.

This morning, during our nightly backup process (where we grab a copy of 
the data directory), we started having this same issue. The main thing 
that I see in all of these is a high disk wait on the system. When we 
are performing 'well', the %wa from top is usually around 30%, and our 
load is around 12 - 15. This morning we saw a load of 21 - 23, and a %wa 
jumping between 60% and 75%.

The top process at pretty much all times is the WAL sender process; is 
this normal?

 From what I can tell, my access patterns on the database have not 
changed: the same average number of inserts, updates, and deletes, and 
nothing on the system has changed in any way. No abnormal autovacuum 
processes beyond those that normally run.

So what things can I do to track down what the issue is? Currently the 
system has returned to a 'good' state, and performance looks great. But 
I would like to know how to prevent this, as well as be able to grab 
good stats if it does happen again in the future.
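For grabbing stats if it recurs, I've been thinking of something like the one-shot check below, run from cron every minute: when the 1-minute load average crosses a threshold, snapshot iostat, top, and pg_stat_activity. The threshold and output directory are placeholders for whatever fits the environment (column names procpid/current_query are the 9.0 ones):

```shell
#!/bin/sh
# One-shot diagnostic capture, intended for cron. THRESHOLD and OUTDIR
# are assumptions; tune for your own baseline load (~12-15 here).
THRESHOLD=15
OUTDIR=${OUTDIR:-/tmp/pg_diag}

# 1-minute load average; compare the integer part against the threshold.
LOAD=$(cut -d' ' -f1 /proc/loadavg)
if [ "${LOAD%.*}" -ge "$THRESHOLD" ]; then
    mkdir -p "$OUTDIR"
    TS=$(date +%Y%m%d-%H%M%S)
    iostat -d -x 5 3 > "$OUTDIR/iostat-$TS.log" 2>&1
    top -b -n 1      > "$OUTDIR/top-$TS.log"    2>&1
    psql -Atc "SELECT procpid, waiting, current_query
                 FROM pg_stat_activity
                WHERE current_query <> '<IDLE>';" \
         > "$OUTDIR/activity-$TS.log" 2>&1
fi
```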

Has anyone had any issues with the HP p800 controller card in a postgres 
environment? Is there anything that can help us maximise the performance 
to disk in this case, as it seems to be one of our major bottlenecks? I 
do plan on moving the pg_xlog to a separate drive down the road, the 
cluster is extremely active so that will help out a ton.
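The pg_xlog move itself should just be a mv plus a symlink while the cluster is down. A minimal sketch, assuming the Ubuntu default data directory; move_pg_xlog is my own helper, and the paths are placeholders:

```shell
#!/bin/sh
# Relocate pg_xlog to a separate volume. Run ONLY with the server
# stopped. PGDATA and WALDIR defaults below are assumptions.
PGDATA=${PGDATA:-/var/lib/postgresql/9.0/main}
WALDIR=${WALDIR:-/wal}

move_pg_xlog() {
    # Move the WAL directory to the new volume, then leave a symlink
    # behind so PostgreSQL finds it at the old path.
    mv "$PGDATA/pg_xlog" "$WALDIR/pg_xlog"
    ln -s "$WALDIR/pg_xlog" "$PGDATA/pg_xlog"
}

# Usage (with the server stopped):
#   pg_ctlcluster 9.0 main stop
#   move_pg_xlog
#   pg_ctlcluster 9.0 main start
```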

some IO stats:

$ iostat -d -x 5 3
Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
dev1        1.99   75.24  651.06  438.04  41668.57  8848.18    46.38     0.60   3.68   0.70  76.36
dev2        0.00    0.00  653.05  513.43  41668.57  8848.18    43.31     2.18   4.78   0.65  76.35

Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
dev1        0.00   35.20  676.20  292.00  35105.60  5688.00    42.13    67.76  70.73   1.03 100.00
dev2        0.00    0.00  671.80  295.40  35273.60  4843.20    41.48    73.41  76.62   1.03 100.00

Device:   rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
dev1        1.20   40.80  865.40  424.80  51355.20  8231.00    46.18    37.87  29.22   0.77  99.80
dev2        0.00    0.00  867.40  465.60  51041.60  8231.00    44.47    38.28  28.58   0.75  99.80

Thanks in advance,
Brian F
