Re: sudden spurt in swap utilization (was:cpu bound postgresql setup.)

From: Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Date: 2010-06-26 13:22:42
Message-ID: AANLkTilCLpvv86RX2gbKp39zhFcMzka4gQIdfw657Ryz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear List ,

A simple (perl) script was made to 'watch' the state transitions of
back ends. On startup It captures a set of pids for watching
and displays a visual representation of the states for next 30 intervals
of 1 seconds each. The X axis is interval cnt, Y axis is pid and the
origin is on top-left.

The state value can be Active Query (*) , or <IDLE> indicated by '.' or
'<IDLE> in transaction' indicated by '?' . for my server below is a random
output (during lean hours and on a lean day).

----------------------------------------------------------------------------------------------------
PID 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30
----------------------------------------------------------------------------------------------------
4334 ? ? ? ? * ? ? ? ? ? * ? ? ? ? ? ? ? ?
6904 ? ? . . . * ? . . . . . . ? ? .
6951 ? ? ? . . . . ? ? ? ? ? . . . ? ? ? . . . ? .
. . . . ? ? .
7009 ? * ? ? . . . . . . . . . * * . * ? ? . . . *
? ? ? . . . ?
7077 ? . ? . . . * . ? . . . . ? . . . . . . . ? .
. ? . . . ? ?
7088 ? . . ? . ? ? ? . . . . . . ? . . ? ? * . . .
. . ? . ? . *
7091 ? . . * ? ? ? ? ? ? ? * ? . . ? * . * . . . .
. . . . . . .
7093 ? ? . ? . . . . ? . ? * . . . . . . . . . ? ?
? . ? ? ? . .
7112 * * . . . ? ? ? . . . . . . . . ? ? . ? . ? .
. ? . . . . .
7135 ? . . * . ? ? ? . ? ? . . . ? . . . . . . . ?
. . . ? ? . .
7142 ? . ? . . . . . . * . . . ? . . . . . . . . .
. . . . .
7166 ? . ? ? ? * * . ? * . ? . . . ? . ? ? . . . *
. . . ? . . .
8202 ? ? . . . * . ? . . . . . . . * ? . . . ? ? .
. . . ? ? ? .
8223 ? . . . . . . ?
8237 ? ? ? . ? ? ? ? . . . . . . . ? . . . . . ? .
. * ? . . . .
8251 ? . ? . . . . . ? ? . . . * ? . . . ? . . . .
. . . . . . .
8278 ? ? . . . . ? . . . . . . . ? . . . . . . ? ?
. . * . . . .
8290 ? . .
8294 ? ? . . . . . . . . . . . . ? . . . ? ? . . .
. . . . . * *
8303 ? * ? . ? ? ? . ? ? ? . . . . * . . . . . . .
. . . . . . .
8306 ? ? . . . ? . . . ? . . . . . . * . . .
8309 * ? ? ? ? . . . ? . . .
8329 ? . * * . . . . . . . * . ? . * . ? . * . * ?
. . .
----------------------------------------------------------------------------------------------------
(*) Active Query , (.) Idle , (?) Idle in transaction,<blank> backend
over.
----------------------------------------------------------------------------------------------------

Looks like most of the graph space is filled with (.) or (?) and very
less active queries (long running queries > 1s). on a busy day and busi hour
i shall check the and post again. The script is presented which depends only
on perl , DBI and DBD::Pg.

script pasted here:
http://pastebin.com/mrjSZfLB

Regds
mallah.

On Sat, Jun 26, 2010 at 3:23 PM, Rajesh Kumar Mallah <
mallah(dot)rajesh(at)gmail(dot)com> wrote:

> Dear List,
>
> Today has been good since morning. Although it is a lean day
> for us but the indications are nice. I thank everyone who shared
> the concern. I think the most significant change has been to reduce
> shared_buffers from 10G to 4G , this has lead to reduced memory
> usage and some breathing space to the OS.
>
> Although i am yet to incorporate the suggestions from pgtune but
> i think the issue of max_connection needs to be addressed first.
>
> I am investigating application issues and about the mechanism that
> puts many backend to '<IDLE> in transaction ' mode for significant
> times. I thank Tom for the script he sent. Once that resolves i shall
> check pooling as suggested by Kevin, then eventually max_connections
> can be reduced. I shall also check pgpool and pgbouncer if they are
> helpful in this regard.
>
> I observed that the number of simultaneous connection today (lean day)
> hovers between 1 to 10 , occasionally shooting to 15 but never more than
> 20 i would say.
>
>
> I am happy that i/o waits are negligible and cpu is idling also for a
> while.
>
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
> r b swpd free buff cache si so bi bo in cs us sy
> id wa st
> 22 0 18468 954120 13460 28491772 0 0 568 1558 13645 18355 62 10
> 27 2 0
> 16 0 18468 208100 13476 28469084 0 0 580 671 14039 17055 67 13
> 19 1 0
> 10 2 18812 329032 13400 28356972 0 46 301 1768 13848 17884 68 10
> 20 1 0
> 16 2 18812 366596 13416 28361620 0 0 325 535 13957 16649 72 11
> 16 1 0
> 50 1 18812 657048 13432 28366548 0 0 416 937 13823 16667 62 9
> 28 1 0
> 6 1 18812 361040 13452 28371908 0 0 323 522 14352 16789 74 12
> 14 0 0
> 33 0 18812 162760 12604 28210152 0 0 664 1544 14701 16315 66 11
> 22 2 0
> 5 0 18812 212028 10764 27921800 0 0 552 648 14567 17737 67 10
> 21 1 0
> 6 0 18796 279920 10548 27890388 3 0 359 562 12635 15976 60 9
> 30 1 0
> 8 0 18796 438820 10564 27894440 0 0 289 2144 12234 15770 57 8
> 34 1 0
> 5 0 18796 531800 10580 27901700 0 0 514 394 12169 16005 59 8
> 32 1 0
> 17 0 18796 645868 10596 27890704 0 0 423 948 13369 16554 67 10
> 23 1 0
> 9 1 18796 1076540 10612 27898604 0 0 598 403 12703 17363 71 10
> 18 1 0
> 8 0 18796 1666508 10628 27904748 0 0 430 1123 13314 17421 57 9
> 32 1 0
> 9 1 18776 1541444 10644 27913092 1 0 653 954 13194 16822 75 11
> 12 1 0
> 8 0 18776 1526728 10660 27921380 0 0 692 788 13073 16987 74 9
> 15 1 0
> 8 0 18776 1482304 10676 27933176 0 0 966 2029 13017 16651 76 12
> 11 1 0
> 21 0 18776 1683260 10700 27937492 0 0 298 663 13110 15796 67 10
> 23 1 0
> 18 0 18776 2087664 10716 27943512 0 0 406 622 12399 17072 62 9
> 28 1 0
>
>
> With 300 connections, I think that either of these could lead you to
>> experience intermittent bursts of extreme swapping. I'd drop it to
>> somewhere in the 16MB to 32MB range until I had a connection pool
>> configured such that it was actually keeping the number of active
>> connections much lower.
>>
>> > (*) wal_buffers = 8MB # pgtune wizard 2010-06-25
>> > (64kb , via default)
>>
>> Sure, I'd boost this.
>>
>> > checkpoint_segments = 16 # pgtune wizard 2010-06-25
>> > (30 , specified)
>>
>> If you have the disk space for the 30 segments, I wouldn't reduce
>> it.
>>
>> > shared_buffers = 7680MB # pgtune wizard 2010-06-25
>> > (4096 MB , specified)
>>
>> This one is perhaps the most sensitive to workload. Anywhere
>> between 1GB and 8GB might be best for you. Greg Smith has some
>> great advice on how to tune this for your workload.
>>
>> > (*) max_connections = 80 # pgtune wizard 2010-06-25
>> > (300 , ;-) specified)
>> >
>> > when i reduce max_connections i start getting errors, i will see
>> > again concurrent connections during business hours.
>>
>> That's probably a good number to get to, but you have to reduce the
>> number of actual connections before you set the limit that low.
>>
>> > lot of our connections are in <IDLE> in transaction state
>>
>> If any of these stay in that state for more than a minute or two,
>> you need to address that if you want to get your connection count
>> under control. If any of them persist for hours or days, you need
>> to fix it to avoid bloat which can kill performance.
>>
>> -Kevin
>>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2010-06-26 15:49:27 Re: Architecting a database
Previous Message Rajesh Kumar Mallah 2010-06-26 09:53:43 Re: sudden spurt in swap utilization (was:cpu bound postgresql setup.)