
Re: PG 8.3 and large shared buffer settings

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Dan Sugalski <dan(at)sidhe(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: PG 8.3 and large shared buffer settings
Date: 2009-09-26 21:19:11
Lists: pgsql-performance
On Sat, 26 Sep 2009, Jeff Janes wrote:

> On Sat, Sep 26, 2009 at 8:19 AM, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
>> Another problem spot are checkpoints.  If you dirty a very large buffer
>> cache, that whole thing will have to get dumped to disk eventually, and on
>> some workloads people have found they have to reduce shared_buffers
>> specifically to keep this from being too painful.
> Is this the case even if checkpoint_completion_target is set close to 1.0?

Sure.  checkpoint_completion_target spreads each checkpoint's writes out 
over more of the interval between checkpoints, but it alone doesn't change 
how often checkpoints happen.  By default, you're going to get one every 
five minutes, and on active systems they can come every few seconds if 
you're not aggressive with increasing checkpoint_segments.
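For reference, these are the 8.3 knobs involved; the values here are only illustrative, not recommendations for any particular workload:

```
# postgresql.conf (8.3) -- illustrative values, tune for your workload
checkpoint_timeout = 5min            # default; time-based checkpoint interval
checkpoint_segments = 32             # default is only 3; raise this so WAL
                                     # volume doesn't force a checkpoint
                                     # every few seconds
checkpoint_completion_target = 0.9   # spread writes over 90% of the interval
```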

Some quick math gives an idea of the scale of the problem.  A single cheap 
disk can write random I/O (which checkpoint writes often are) at 1-2MB/s; 
let's call it 100MB/minute.  That means that in 5 minutes, a single disk 
system might be hard pressed to write even 500MB of data out.  But you can 
easily dirty 500MB in seconds nowadays.  Now imagine shared_buffers is 
40GB and you've dirtied a bunch of it; how long will that take to clear 
even on a fast RAID system?  It won't be quick, and the whole system will 
grind to a halt at the end of the checkpoint as all the buffered writes 
queued up are forced out.
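To put rough numbers on that, here's the same back-of-the-envelope math as a small sketch, using the ~100MB/minute single-disk figure above (the RAID figure is my own assumption of linear scaling, which real arrays won't quite hit):

```python
# Back-of-the-envelope checkpoint math, using the ~100MB/minute
# random-write rate for a single cheap disk quoted above.

def checkpoint_flush_minutes(dirty_mb, disks, mb_per_min_per_disk=100):
    """Minutes to flush dirty_mb of random writes across `disks` spindles,
    assuming (optimistically) linear scaling across the array."""
    return dirty_mb / (disks * mb_per_min_per_disk)

# One disk, 5-minute checkpoint interval: ~500MB is all it can absorb.
print(checkpoint_flush_minutes(500, disks=1))         # 5.0 minutes

# 40GB shared_buffers, half of it dirty, on a 10-disk array:
print(checkpoint_flush_minutes(20 * 1024, disks=10))  # ~20 minutes
```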

> If you dirty buffers fast enough to dirty most of a huge shared_buffers 
> area between checkpoints, then it seems like lowering the shared_buffers 
> wouldn't reduce the amount of I/O needed, it would just shift the I/O 
> from checkpoints to the backends themselves.

What's even worse is that backends can be writing data into the OS buffer 
cache between checkpoints too, and all of that is forced to complete 
before the checkpoint can finish.  If you're not careful, you can start 
the checkpoint with the whole OS cache already filled with backend writes 
that will slow down the checkpoint's own.

Because disks are slow, you want writes handed to the OS as soon as 
feasible, so it has more time to work on them, reorder them for efficient 
writing, etc.

Ultimately, the sooner you get I/O to the OS cache to write, the better, 
*unless* you're going to write that same block over again before it must 
go to disk.  Normally you want buffers that aren't accessed often to get 
written out to disk early rather than linger until checkpoint time; 
there's nothing wrong with a backend doing a write if that block wasn't 
going to be used again soon.  The ideal setup from a latency perspective 
is that you size shared_buffers just large enough to hold the things you 
write to regularly, but not so big that it caches every write.
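If you want to measure rather than guess, the pg_buffercache contrib module (which you'd have to install separately) lets you see how much of shared_buffers is actually dirty at any moment; something along these lines:

```sql
-- Requires the pg_buffercache contrib module to be installed.
-- Count dirty buffers and show them as a share of shared_buffers:
SELECT count(*) AS dirty_buffers,
       round(100.0 * count(*) / (SELECT count(*) FROM pg_buffercache), 1)
           AS pct_dirty
FROM pg_buffercache
WHERE isdirty;
```

Sampling that right before checkpoints fire gives you some idea whether your regularly written working set would still fit in a smaller shared_buffers.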

> It looks like checkpoint_completion_target was introduced in 8.3.0

Correct.  Before then, your only hope for reducing checkpoint overhead 
was to use very small settings for shared_buffers, particularly if you 
cranked the old background writer up so that it wrote lots of redundant 
information too (that was the main result of "tuning" it on versions 
before 8.3 as well).

* Greg Smith gsmith(at)gregsmith(dot)com Baltimore, MD

