Re: shared_buffers/effective_cache_size on 96GB server

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Strahinja Kustudić <strahinjak(at)nordeus(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: shared_buffers/effective_cache_size on 96GB server
Date: 2012-10-10 13:09:56
Message-ID: 507573A4.1090605@optionshouse.com
Lists: pgsql-performance

On 10/10/2012 02:12 AM, Strahinja Kustudić wrote:

> total used free shared buffers cached
> Mem: 96730 96418 311 0 71 93120

Wow, look at all that RAM. Something nobody has mentioned yet: you'll
want to set some additional kernel parameters for this, to avoid
occasional IO storms caused by dirty memory flushes.

vm.dirty_background_ratio = 1
vm.dirty_ratio = 5

Again, these would go in sysctl.conf, or /etc/sysctl.d/10-dbserver.conf
or something. If you have a newer kernel, look into
vm.dirty_background_bytes, and vm.dirty_bytes.

The why of this is brought up occasionally here, but it comes down to
your vast amount of memory. The defaults for even late Linux kernels are
5% for dirty_background_ratio, and 10% for dirty_ratio. So if you
multiply it out against 96GB, the kernel will allow about 4.8GB of dirty
memory before it starts flushing to disk in the background. If that
number reaches 9.6GB, the system goes synchronous, and no other disk
writes can take place until *all 9.6GB* is flushed. Even with a fast
disk subsystem, that's a pretty big gulp.
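
To make the multiplication above concrete, here is a rough sketch in Python. It assumes the percentages apply to the full 96GB, which overstates things a little, since the kernel works from available (free plus reclaimable) memory rather than total RAM. The function name is made up for illustration.

```python
def dirty_thresholds_gb(ram_gb, background_ratio, dirty_ratio):
    """Approximate dirty-memory thresholds in GB for the given percentages.

    Assumption: the ratios are applied to total RAM. The kernel actually
    uses available (free + reclaimable) memory, so real thresholds will be
    somewhat lower than what this returns.
    """
    background = ram_gb * background_ratio / 100
    hard_limit = ram_gb * dirty_ratio / 100
    return background, hard_limit


if __name__ == "__main__":
    ram_gb = 96
    for label, bg_ratio, hard_ratio in [
        ("5/10, the defaults described above", 5, 10),
        ("1/5, the values suggested above", 1, 5),
    ]:
        background, limit = dirty_thresholds_gb(ram_gb, bg_ratio, hard_ratio)
        print(f"{label}: background flush at ~{background:.2f}GB, "
              f"synchronous writes at ~{limit:.2f}GB")
```

Running it with other ratio pairs is a quick way to see how much dirty data a given setting tolerates on a box of this size.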

The idea here is to keep it writing in the background by setting a low
limit, so it never reaches a critical mass that causes it to snowball
into the more dangerous upper limit. If you have a newer kernel, the
ability to set "bytes" is a much more granular knob that can be used to
match RAID buffer sizes. You'll probably want to experiment with this a
bit before committing to a setting.
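
As a sketch of what the "bytes" variant could look like, the Python below renders a sysctl fragment from two byte counts. The 512MB controller cache and the one-quarter background threshold are purely hypothetical starting points for experimentation, not recommended values, and the helper name is invented here. Keep in mind that only one of the ratio or bytes forms is in effect at a time: writing one makes its counterpart read as 0.

```python
def render_dirty_bytes_conf(background_bytes, dirty_bytes):
    """Return sysctl.d-style text for byte-based dirty memory limits.

    The background threshold must be positive and below the hard limit,
    otherwise background flushing would never get a head start.
    """
    if not 0 < background_bytes < dirty_bytes:
        raise ValueError("need 0 < background_bytes < dirty_bytes")
    return (
        "# Byte-based dirty memory limits (use instead of the *_ratio settings)\n"
        f"vm.dirty_background_bytes = {background_bytes}\n"
        f"vm.dirty_bytes = {dirty_bytes}\n"
    )


if __name__ == "__main__":
    # Hypothetical example: a RAID controller with 512MB of write cache.
    cache_bytes = 512 * 1024 * 1024
    print(render_dirty_bytes_conf(cache_bytes // 4, cache_bytes), end="")
```

The output could be pasted into a file such as /etc/sysctl.d/10-dbserver.conf once the numbers have been tested against the actual workload.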

> So it did a little swapping, but only minor, still I should probably
> decrease shared_buffers so there is no swapping at all.

Don't worry about that amount of swapping. As others have said here, you
can reduce vm.swappiness to 0, and even then, the OS will still swap
something occasionally. It's really just a hint to the kernel about how
much swapping you want to go on, and the kernel is free to ignore it when
it knows some data won't be accessed after initialization or something,
so it swaps that data out anyway.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

From: Strahinja Kustudić <strahinjak(at)nordeus(dot)com>
To: sthomas(at)optionshouse(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: shared_buffers/effective_cache_size on 96GB server
Date: Wed, 10 Oct 2012 16:35:39 +0200
Message-ID: <CADKbJJXNBLqDFwj3gnaZXdqU1sRU1qRLz=jJnyEWhUQJCC5vNQ(at)mail(dot)gmail(dot)com>
In-Reply-To: <507573A4(dot)1090605(at)optionshouse(dot)com>

Shaun,

running these commands:

# sysctl vm.dirty_ratio
vm.dirty_ratio = 40
# sysctl vm.dirty_background_ratio
vm.dirty_background_ratio = 10

shows that these values are even higher by default. When you said RAID
buffer size, did you mean the controller's cache memory size?
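
For checking all four knobs at once, something like the following Python sketch could stand in for the individual sysctl calls. It assumes a Linux system exposing the settings as files under /proc/sys/vm; the function name and the `base` parameter are illustrative, and files that do not exist (for example the "bytes" variants on older kernels) are simply skipped.

```python
from pathlib import Path

VM_KEYS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_bytes",
)


def read_vm_settings(base="/proc/sys/vm", keys=VM_KEYS):
    """Read the dirty-memory settings that exist under `base`.

    Returns a dict such as {"vm.dirty_ratio": 40}; missing files are skipped.
    """
    settings = {}
    for key in keys:
        path = Path(base) / key
        if path.is_file():
            settings[f"vm.{key}"] = int(path.read_text().strip())
    return settings


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"{name} = {value}")
```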

Regards,
Strahinja

