Skip site navigation (1) Skip section navigation (2)

Re: choosing the right platform

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Matthew Nuzum <cobalt(at)bearfruit(dot)org>,"'Josh Berkus'" <josh(at)agliodbs(dot)com>,"'Pgsql-Performance'" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: choosing the right platform
Date: 2003-04-10 16:51:38
Message-ID: Pine.LNX.4.33.0304101042480.23398-100000@css120.ihs.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Wed, 9 Apr 2003, Jim C. Nasby wrote:

> On Wed, Apr 09, 2003 at 11:55:56AM -0600, scott.marlowe wrote:
> > A couple more suggestions.  One is to never allocate more than 50% of your 
> > memory to a database's shared buffers, i.e. let the OS buffer the disks en 
> > masse, while the database should have a smaller buffer for the most recent 
> > accesses.  This is because kernel caching is usually faster and more 
> > efficient than the database doing it, and this becomes more an issue with 
> > large chunks of memory, which both Linux and BSD are quite good at 
> > caching, and postgresql, not so good.
>  
> That seems odd... shouldn't pgsql be able to cache information better
> since it would be cached in whatever format is best for it, rather than
> the raw page format (or maybe that is the best format).

Yes and no.  The problem isn't that the data is closer to postgresql in 
it's buffers versus further away in kernel buffers, it's that postgresql's 
caching algorhythm isn't performance tweaked for very large settings, it's 
performance tweaked to provide good performance on smaller machines, with 
say 4 or 16 Megs of shared buffers.  Handling large buffers requires a 
different approach to handling small ones, and the kernel is optimized in 
that direction.

Also, the kernel in most Oses, i.e. Linux and BSD tends to use "spare ram" 
with abandon as cache memory, so if you've got 4 gigs of ram, with 200 
Megs set aside for postgresql, it's quite likely that the kernel cache can 
hold ALL your dataset for you once it's been read in once.  So, the data 
is already cached once.  Caching it again in Postgresql only gains a 
little, since the speed difference of postgresql shared buffer / cache and 
kernel caches is very small.  However, the speed going to the hard drive 
is much slower.

What you don't want is a postgresql cache that's bigger (on average) than 
the kernel cache, since the kernel cache will then be "thrashing" when you 
access information not currently in either cache.  I.e. postgresql becomes 
your only cache, and kernel caching stops working for you and becomes just 
overhead, since you never get anything from it if it's too small to cache 
something long enough to be used again.

> There's also the
> issue of having to go through more layers of software if you're relying
> on the OS caching. All the tuning info I've seen for every other
> database I've worked with specifically recommends giving the database as
> much memory as you possibly can, the theory being that it will do a much
> better job of caching than the OS will.

That's old school thinking.  There was a day when kernel caching was much 
slower, and writing directly to your devices in a raw mode was the only 
way to ensure good performance.  Nowadays, most modern Unix kernels and 
their file systems are a match for most database needs.  heck, with some 
storage systems, the performance of the file system is just not really an 
issue, it's the bandwidth of the connector you use.  

Note that this is a good thing (TM) since it frees the postgresql 
development team to do other things than worry about caching 1 gig of 
data.


In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2003-04-10 17:17:39
Subject: Re: choosing the right platform
Previous:From: scott.marloweDate: 2003-04-10 16:42:35
Subject: Re: choosing the right platform

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group