Re: Volunteer to build a configuration tool

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: "Campbell, Lance" <lance(at)uiuc(dot)edu>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Volunteer to build a configuration tool
Date: 2007-06-20 16:59:38
Message-ID: 46795CFA.7070209@enterprisedb.com
Lists: pgsql-docs pgsql-performance

Campbell, Lance wrote:
> It is amazing how many times you can read something before it actually
> sinks in.
>
> There seem to be two possible approaches to optimizing PostgreSQL 8.2:

Right.

> File caching approach:
> This approach is based on the fact that the OS will cache the necessary
> PostgreSQL files. The key here is to set the size of
> effective_cache_size value as high as you think the OS has memory to
> cache the files. This approach would need the value of shared_buffers
> to be relatively low. Otherwise you are in a sense storing the data
> twice. One would also have to make sure that work_mem is not too high.
> Since the files would be cached by the OS, work_mem could be relatively
> low. This is an ideal approach if you have a dedicated server since
> there would be no other software using memory or accessing files that
> the OS would try to cache.

There's no particular danger in setting work_mem too high in this
approach. In fact, it's more important to avoid a too-large work_mem
setting with the other approach, because if you set it too high you can
force the system to swap, while with the "file caching approach" the OS
will just evict some of the cached pages to make room for sorts etc.
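To make the file-caching approach concrete, here is a hedged postgresql.conf sketch for a hypothetical dedicated server with 4 GB of RAM. The specific values are illustrative assumptions, not recommendations; they only show the shape of the configuration (small shared_buffers, large effective_cache_size):

```
# File caching approach (illustrative values for a hypothetical 4 GB box):
shared_buffers = 256MB         # kept modest to avoid double-buffering with the OS cache
effective_cache_size = 3GB     # rough estimate of OS cache plus shared_buffers
work_mem = 32MB                # per-sort/per-hash; OS evicts cached pages to make room
```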

> Memory driven approach:
> In this approach you want to create a large value for shared_buffers.
> You are relying on shared_buffers to hold the most commonly accessed
> disk blocks. The value for effective_cache_size would be relatively
> small since you are not relying on the OS to cache files.

effective_cache_size should be set to the estimated amount of memory
available for caching, *including* shared_buffers. So it should be set
to a similar value in both approaches.
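For comparison, a hedged sketch of the memory-driven approach on the same hypothetical 4 GB machine. Again the numbers are illustrative; the point is that effective_cache_size counts shared_buffers too, so it lands near the same value as in the file-caching sketch:

```
# Memory driven approach (illustrative values):
shared_buffers = 1536MB        # large; rely on PostgreSQL's own buffer cache
effective_cache_size = 3GB     # ~1.5GB shared_buffers + ~1.5GB expected OS cache
```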

> This seems
> like it would be the ideal situation if you have other applications
> running on the box.

Actually it's the opposite. If there's other applications competing for
the memory, it's better to let the OS manage the cache because it can
make decisions on which pages to keep in cache and which to evict across
all applications.

> By setting shared_buffers to a high value you are
> guaranteeing memory available to PostgreSQL (this assumes the other
> applications did not suck up too much memory to make your OS use virtual
> memory).

You're guaranteeing memory available to PostgreSQL, at the cost of said
memory being unavailable to other applications. Or as you point out,
in the worst case you end up swapping.

> Do I understand the possible optimization paths correctly? The only
> question I have about this approach is: if I use the "memory driven
> approach" since effective_cache_size would be small I would assume I
> would need to fiddle with random_page_cost since there would be no way
> for PostgreSQL to know I have a well configured system.

I don't see how effective_cache_size or the other settings affect
random_page_cost. random_page_cost should mostly depend on your I/O
hardware, though I think it's common practice to lower it when your
database is small enough to fit mostly or completely in cache on the
grounds that random access in memory is almost as fast as sequential access.
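As an illustrative (not prescriptive) example of that common practice:

```
# random_page_cost defaults to 4.0, modeling real disk seeks.
# If the database fits mostly or entirely in cache, some people lower it,
# since a cached "random" read costs little more than a sequential one:
random_page_cost = 2.0
```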

> If everything I said is correct then I agree "Why have
> effective_cache_size?" Why not just go down the approach that Oracle
> has taken and require people to rely more on shared_buffers and the
> general memory driven approach? Why rely on the disk caching of the OS?
> Memory is only getting cheaper.

That has been discussed before many times, search the archives on direct
I/O for previous flamewars on that subject. In a nutshell, we rely on
the OS to not only do caching for us, but I/O scheduling and readahead
as well. That saves us a lot of code, and the OS is in a better position
to do that as well, because it knows the I/O hardware and disk layout so
that it can issue the I/O requests in the most efficient way.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
