Re: Configuring for maximum memory usage

From: "Scott Carey" <scott(at)richrelevance(dot)com>
To: jd(at)commandprompt(dot)com
Cc: "Christiaan Willemsen" <cwillemsen(at)technocon(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Configuring for maximum memory usage
Date: 2008-10-30 18:27:09
Message-ID: a1ec7d000810301127k40c3cb4x5db8df0b64d037ae@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Oct 30, 2008 at 9:55 AM, Joshua D. Drake <jd(at)commandprompt(dot)com>wrote:

> On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:
>
> >
> > Remember that PostgreSQL doesn't cache anything on its own so
> > if you do
> > want to hit disk it has to be in file cache.
> >
> > By my understanding, this is absolutely false. Postgres caches pages
> > from tables/indexes in shared_buffers. You can make this very large if
> > you wish.
>
> You can make it very large with a potentially serious performance hit.
> It is very expensive to manage large amounts of shared buffers. It can
> also nail your IO on checkpoint if you are not careful (even with
> checkpoint smoothing). You are correct that I did not explain what I
> meant very well because shared buffers are exactly that, shared
> buffers.

You can slam your I/O by havnig too large of either OS file cache or
shared_buffers, and you have to tune both.
In the case of large shared_buffers you have to tune postgres and especially
the background writer and checkpoints.
In the case of a large OS cache, you have to tune parameters to limit the
ammount of dirty pages there and force writes out smoothly.
Both layers attempt to delay writes for their own, often similar reasons,
and suffer when a large sync comes along with lots of dirty data.

Recent ZFS changes have been made to limit this, (
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle)
in earlier ZFS versions, this is what usually killed databases -- ZFS in
some situations would delay writes too long (even if "long" is 5 seconds)
and get in trouble. This still has to be tuned well, combined with good
checkpoint tuning in Postgres as you mention. For Linux, there are similar
issues that have to be tuned on many kernels, or up to 40% of RAM can fill
with dirty pages not written to disk.

Letting the OS do it doesn't get rid of the problem, both levels of cache
share very similar issues with large sizes and dirty pages followed by a
sync.

The buffer cache in shared_buffers is a lot more efficient for large
scanning queries -- A select count(*) test will be CPU bound if it comes
from shared_buffers or the OS page cache, and in the former case I have seen
it execute up to 50% faster than the latter, by avoiding calling out to the
OS to get pages, purely as a result of less CPU used.

>
>
> However that isn't the exact same thing as a "cache" at least as I was
> trying to describe it. shared buffers are used to keep track of pages
> (as well as some other stuff) and their current status. That is not the
> same as caching a relation.
>
> It is not possible to pin a relation to memory using PostgreSQL.
> PostgreSQL relies on the operating system for that type of caching.
>

The OS can't pin a relation either, from its point of view its all just a
bunch of disk data blocks, not relations -- so it is all roughly
equivalent. The OS can do a bit better job at data prefetch on sequential
scans or other predictable seek sequences (ARC stands for Adaptive
Replacement Cache) than postgres currently does (no intelligent prefetch in
postgres AFAIK).

So I apologize if I made it sound like Postgres cached the actual relation,
its just pages -- but it is basically the same thing as the OS cache, but
kept in process closer to the code that needs it. Its a cache that prevents
disk reads.

My suggestion for the OP is to try it both ways, and see what is better for
his workload / OS / Hardware combination.

>
> Joshua D. Drake
>
>
>
>
>
> --
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Christiaan Willemsen 2008-10-30 21:06:25 Re: Configuring for maximum memory usage
Previous Message Scott Carey 2008-10-30 17:37:23 Re: Configuring for maximum memory usage