Re: High CPU load on Postgres Server during Peak times!!!!

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Shiva Raman <raman(dot)shivag(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: High CPU load on Postgres Server during Peak times!!!!
Date: 2009-09-22 14:18:11
Message-ID: b42b73150909220718y3dd8f0e4n3971596a0f6ef813@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Sep 22, 2009 at 9:54 AM, Shiva Raman <raman(dot)shivag(at)gmail(dot)com> wrote:
> Dear all
>
>   I am having a problem of high cpu loads in my postgres server during peak
> time. Following are the
> details of my setup (details as per the postgres wiki) .
>
> * PostgreSQL version
>          o Run "select pg_version();" in psql or PgAdmin III and provide the
> full, exact output.
>
>
> clusternode2:~ # rpm -qa | grep postgres
> postgresql-devel-8.1.9-1.2
> postgresql-8.1.9-1.2
> postgresql-docs-8.1.9-1.2
> postgresql-server-8.1.9-1.2
> postgresql-libs-64bit-8.1.9-1.2
> postgresql-libs-8.1.9-1.2
> postgresql-jdbc-8.1-12.2
> postgresql-contrib-8.1.9-1.2
>
>
> * A description of what you are trying to achieve and what results you
> expect.
>
> To keep the CPU Load below 10 , Now during peak times the load is nearing to
> 40
> At that time , it is not possible to access the data.
>
>    * The EXACT text of the query you ran, if any
>
>
>    * The EXACT output of that query if it's short enough to be reasonable to
> post
>          o If you think the output is wrong, what you think should've been
> produced instead
>
>    * The EXACT error message you get, if there is one
>
> As of now , i am unable to locate the exact query, the load shoots up
> abnormally during
> peak time is the main problem .
>
>
>    * What program you're using to connect to PostgreSQL
>
>         Jakarta Tomcat - Struts with JSP
>
>
>    * What version of the ODBC/JDBC driver you're using, if any
>
> postgresql-jdbc-8.1-12.2
>
>    * What you were doing when the error happened / how to cause the error.
> Describe in as much detail as possible, step by step, including command
> lines, SQL output, etc.
>
> When certain tables with more than 3 lakh items are concurrently accessed by
> more than 300
> users, the CPU load shoots up .
>
>    * Is there anything remotely unusual in the PostgreSQL server logs?
>          o On Windows these are in your data directory. On a default
> PostgreSQL install that'll be in C:\Program Files\PostgreSQL\8.4\data\pg_log
> (assuming you're using 8.4)
>
> The log file /var/log/postgresql has no data .
>
>          o On Linux this depends a bit on distro, but you'll usually find
> them in /var/log/postgresql/.
>    * Operating system and version
>          o Linux users:
>                + Linux distro and version
>                + Kernel details (run "uname -a" on the terminal)
>
> SLES 10 SP3
> clusternode2:~ # uname -a
> Linux clusternode2 2.6.16.46-0.12-ppc64 #1 SMP Thu May 17 14:00:09 UTC 2007
> ppc64 ppc64 ppc64 GNU/Linux
>
>
>
>    * What kind of hardware you have.
>          o CPU manufacturer and model, eg "AMD Athlon X2" or "Intel Core 2
> Duo"
>          o Amount and size of RAM installed, eg "2GB RAM"
>
> High Availability Cluster with two IBM P Series Server and one DS4700
> Storage
>
> IBM P series P52A with 2-core 2.1 Ghz POWER5+ Processor Card , 36 MB L3
> Cache ,16 GB of RAM,
> 73.4 GB 10,000 RPM Ultra320 SCSI Drive for Operating System .
>
>
>
>          o Storage details (important for performance and corruption
> questions)
>                + Do you use a RAID controller? If so, what type of
> controller? eg "3Ware Escalade 8500-8"
>                      # Does it have a battery backed cache module?
>                      # Is write-back caching enabled?
>                + Do you use software RAID? If so, what software and what
> version? eg "Linux software RAID (md) 2.6.18-5-686 SMP mod_unload 686
> REGPARM gcc-4.1".
>                      # In the case of Linux software RAID you can get the
> details from the "modinfo md_mod" command
>                + Is your PostgreSQL database on a SAN?
>                      # Who made it, what kind, etc? Provide what details you
> can.
>                + How many hard disks are connected to the system and what
> types are they? You need to say more than just "6 disks". At least give
> maker, rotational speed and interface type, eg "6 15,000rpm Seagate SAS
> disks".
>                + How are your disks arranged for storage? Are you using
> RAID? If so, what RAID level(s)? What PostgreSQL data is on what disks /
> disk sets? What file system(s) are in use?
>                      # eg: "Two disks in RAID 1, with all PostgreSQL data
> and programs stored on one ext3 file system."
>                      # eg: "4 disks in RAID 5 holding the pg data directory
> on an ext3 file system. 2 disks in RAID 1 holding pg_clog, pg_xlog, the
> temporary tablespace, and the sort scratch space, also on ext3.".
>                      # eg: "Default Windows install of PostgreSQL"
>                + In case of corruption data reports:
>                      # Have you had any unexpected power loss lately?
>                      # Have you run a file system check? (chkdsk / fsck)
>                      # Are there any error messages in the system logs?
> (unix/linux: "dmesg", "/var/log/syslog" ; Windows: Event Viewer in Control
> Panel -> Administrative Tools )
>
>
> IBM SAN DS4700 Storage with Fibre Channel HDD (73.4 GB * 10)
> Two Partitions - 73.4 GB * 3 RAID 5 - 134 GB storage partitions (One holding
> Jakarata tomcat
> application server and other holding Postgresql Database) .
> Four Hard disk RAID 5 with ext3 file systems hold the pgdata on SAN .
> Hard disk rotational speed is 73 GB 15K IBM 2 GB Fibre channel
>
> No power loss, filesystem check also fine, No errors on /var/log/syslog
>
> Following is the output of TOP command during offpeak time.
>
>
> top - 18:36:56 up 77 days, 20:33,  1 user,  load average: 12.99, 9.22, 10.37
> Tasks: 142 total,  12 running, 130 sleeping,   0 stopped,   0 zombie
> Cpu(s): 46.1%us,  1.9%sy,  0.0%ni,  6.1%id,  3.0%wa,  0.0%hi,  0.1%si,
> 42.9%st
> Mem:  16133676k total, 13657396k used,  2476280k free,   450908k buffers
> Swap: 14466492k total,      124k used, 14466368k free, 11590056k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>
> 22458 postgres  19   0 2473m 477m 445m R   40  3.0   0:15.49 postmaster
>
> 22451 postgres  15   0 2442m 447m 437m S   33  2.8   0:30.44 postmaster
>
> 22464 postgres  17   0 2443m 397m 383m R   28  2.5   0:13.78 postmaster
>
> 22484 postgres  16   0 2448m 431m 412m S   20  2.7   0:02.73 postmaster
>
> 22465 postgres  17   0 2440m 461m 449m R   15  2.9   0:03.52 postmaster
>
> 22452 postgres  16   0 2450m 727m 706m R   13  4.6   0:23.46 postmaster
>
> 22476 postgres  16   0 2437m 413m 405m S   13  2.6   0:06.11 postmaster
>
> 22485 postgres  16   0 2439m 230m 222m R    7  1.5   0:05.72 postmaster
>
> 22481 postgres  15   0 2436m 175m 169m S    7  1.1   0:04.44 postmaster
>
> 22435 postgres  17   0 2438m 371m 361m R    6  2.4   1:17.92 postmaster
>
> 22440 postgres  17   0 2445m 497m 483m R    5  3.2   1:44.50 postmaster
>
> 22486 postgres  17   0 2432m  84m  81m R    4  0.5   0:00.76 postmaster
>
>    3 root      34  19     0    0    0 R    0  0.0   1:47.50 ksoftirqd/0
>
> 4726 root      15   0 29540 8776 3428 S    0  0.1 140:02.98 X
>
> 24950 root      15   0     0    0    0 S    0  0.0   0:30.96 pdflush
>
>    1 root      16   0   812  316  280 S    0  0.0   0:13.29 init
>
>    2 root      RT   0     0    0    0 S    0  0.0   0:01.46 migration/0
>
>    4 root      RT   0     0    0    0 S    0  0.0   0:00.78 migration/1
>
>    5 root      34  19     0    0    0 S    0  0.0   1:36.79 ksoftirqd/1
>
>    6 root      RT   0     0    0    0 S    0  0.0   0:01.46 migration/2
>
>    7 root      34  19     0    0    0 R    0  0.0   1:49.83 ksoftirqd/2
>
>    8 root      RT   0     0    0    0 S    0  0.0   0:00.79 migration/3
>
>    9 root      34  19     0    0    0 S    0  0.0   1:38.18 ksoftirqd/3
>
>   10 root      10  -5     0    0    0 S    0  0.0   1:02.11 events/0
>
>   11 root      10  -5     0    0    0 S    0  0.0   1:03.27 events/1
>
>   12 root      10  -5     0    0    0 S    0  0.0   1:01.76 events/2
>
>   13 root      10  -5     0    0    0 S    0  0.0   1:02.29 events/3
>
>   14 root      10  -5     0    0    0 S    0  0.0   0:00.01 khelper
>
> 1016 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
>
> 1054 root      10  -5     0    0    0 S    0  0.0   0:03.08 kblockd/0
>
> 1055 root      10  -5     0    0    0 S    0  0.0   0:02.83 kblockd/1
>
> 1056 root      10  -5     0    0    0 S    0  0.0   0:03.19 kblockd/2
>
>
>
>
> The CPU Load shoots upto 40 during peak time.
>
> Following is my postgresql.conf (without comments)
>
> hba_file = '/var/lib/pgsql/data/pg_hba.conf'
> listen_addresses = '*'
> port = 5432
> max_connections = 1800
> shared_buffers = 300000
> max_fsm_relations = 1000
> effective_cache_size = 200000
> log_destination = 'stderr'
> redirect_stderr = on
> log_rotation_age = 0
> log_rotation_size = 10240
> silent_mode = onlog_line_prefix = '%t %d %u '
> autovacuum = on
> datestyle = 'iso, dmy'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
>
> User Access
> Total Number of Users is 500
> Maximum number of Concurrent users will be 500 during peak time
> Off Peak time the maximum number of concurrent user will be around 150 to
> 200.
>
>
> Please let me know your suggestions to improve the performance.

The very first step is to determine if you are cpu bound or i/o bound.
You need to monitor top or vmstat during high load period and report
the results here. Is the DS4700 direct attached? Sometimes using a
SAN can throw the iowait numbers off a bit. I bet you are simply
underpowered in I/O department.

merlin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Andy Colson 2009-09-22 14:19:46 Re: High CPU load on Postgres Server during Peak times!!!!
Previous Message Stef Telford 2009-09-22 14:05:30 Hunting Unused Indexes .. is it this simple ?