Skip site navigation (1) Skip section navigation (2)

Checkpoint tuning on 8.2.4

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: wsmith23_2001(at)yahoo(dot)com
Subject: Checkpoint tuning on 8.2.4
Date: 2008-06-06 06:30:23
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
One of the things to hit my mailbox this week is from someone who is 
frustrated not only by their database server but by issues sending 
messages to this list; I'm forwarding to here for them, please reply to 
all so they get a copy.

Here's the basic server information:

> PostgreSQL version - 8.2.4, RHEL4 Linux
> 64-bit, 8 cpu(s), 16GB memory, raid 5 storage.
> The tuning objective is to optimize the PostgreSQL database to handle 
> both reads and writes.  The database receives continuous inserts, 
> updates and deletes on tables with 140+ million records.

The primary problem they're having are really awful checkpoint spikes, 
which is how I got conne^H^Hvinced into helping out here.  I belive this 
is hardware RAID with a caching controller.

First off, the bad news nobody ever wants to hear:  you can't really make 
this problem go completely away in many situations with 8.2, whereas the 
new spread checkpoint feature in 8.3 is aimed specifically at this 
problem. goes 
over all that, along with introducing some of the ideas I'll toss in below 
about how to optimize for 8.2.  A related paper I did talks about reducing 
how much memory Linux caches for you when writing heavily, that might be 
appropriate here as well:

The other obvious suggestion is that RAID5 is known to be poor at heavy 
write performance, which makes it really the wrong choice here as well. 
What this system really wants to have done to it is to be reconfigured 
with RAID10 and PostgreSQL 8.3 instead.  But since as always that's 
impractical for now, let's take a look at the postgresql.settings to see 
what might be improved immediately:

max_connections = 128
shared_buffers = 400
temp_buffers = 1000
effective_cache_size = 50000
random_page_cost = 2.5

They've experimented with lowering shared_buffers so much here because it 
helps the problem, but 400 is going a bit too far.  You should be able to 
get at to least a few thousand for that setting without making the problem 
much worse, and that will help lower general I/O that might block the 
checkpoint work a bit.  Something like 5000 to 20000 would be my guess for 
a good setting here.

sort_mem = 4194304
vacuum_mem = 2097152
work_mem = 4194304
maintenance_work_mem = 256000

There is no sort_mem or vacuum_mem in 8.2 anymore, so those can be 
deleted:  replaced by work_mem and maintenance_work_mem.  The values for 
all the active *_mem settings here are on the low side for a system with 
16GB of RAM.  If we re-cast these with more useful units this is obvious:

work_mem = 4MB
maintenance_work_mem = 256KB

Try work_mem=16MB and maintenance_work_mem=256MB instead as starting 
values.  work_mem could go a lot higher, but you have to have to be 
careful to consider how many connections are involved because this is a 
per-session parameter.

effective_cache_size is wildly low here; something >8GB is likely more 
accurate.  While not directly causing checkpoint issues, getting better 
plans can lower overall system I/O through more efficient use of available 
resources and therefore leave more bandwidth for the writes.

bgwriter_lru_percent = 70
bgwriter_lru_maxpages = 800
bgwriter_all_percent = 50
bgwriter_all_maxpages = 800

Ah, the delicate scent of someone on IRC suggesting "oh, checkpoints 
spikes are taken care of by the background writer, just make that more 
aggressive and they'll go away".  These values are crazy big, and the only 
reason they work at all is that with shared_buffers=400 and 8 CPUs you can 
afford to scan them every single time and nobody cares.  The settings 
Kevin Grittner settled on that I mentioned in the 8.2->8.3 paper are about 
as aggressive as I've ever seen work well in the real world:

> bgwriter_delay = 200
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 200
> bgwriter_all_percent = 10.0
> bgwriter_all_maxpages = 600

I personally will often just turn the background writer off all together 
by setting both maxpages parameters to zero, and wait for the surprised 
looks as the checkpoint spikes get smaller.  The 8.2 BGW just isn't 
effective in modern systems with gigabytes of RAM.  It writes the same 
blocks over and over into the gigantic OS cache, in a way that competes 
inefficiently for I/O resources with how buffers are naturally evicted 
anyway when you use the kind of low shared_buffers settings that are a 
must on 8.2.

fsync = off

Well, this is asking for trouble.  The first time your server crashes, I 
hope you're feeling lucky.  I think this system is setup so that it can 
easily be replaced if there's a problem, so this may not be a huge 
problem, but it is dangerous to turn fsync off.

checkpoint_segments = 40
checkpoint_timeout = 300
checkpoint_warning = 15

setting checkpoint_segments to 40 is likely too large for an 8.2 system 
that's writing heavily.  That keeps the number of checkpoints down, so you 
get less spikes, but each one of them will be much larger.  Something in 
the 5-20 range is likely more appropriate here.

vacuum_cost_delay = 750
autovacuum = true
autovacuum_naptime = 3600
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 500
autovacuum_vacuum_scale_factor = 0.4
autovacuum_analyze_scale_factor = 0.2
autovacuum_vacuum_cost_delay = -1
autovacuum_vacuum_cost_limit = -1
max_fsm_pages = 5000000
max_fsm_relations = 2000

Now, when I was on the phone about this system, I recall hearing that 
they've fallen into that ugly trap where they are forced to reload this 
database altogether regularly to get performance to stay at a reasonable 
level.  That's usually a vacuum problem, and yet another reason to upgrade 
to 8.3 so you get the improved autovacuum there.  Vacuum tuning isn't 
really my bag, and I'm out of time here tonight; anybody else want to make 
some suggestions on what might be changed here based on what I've shared 
about the system?

* Greg Smith gsmith(at)gregsmith(dot)com Baltimore, MD


pgsql-performance by date

Next:From: Volkan YAZICIDate: 2008-06-06 13:20:59
Subject: Optimizing AGE()
Previous:From: Dan HarrisDate: 2008-06-05 20:22:49
Subject: Re: query performance question

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group