Lawrence Cohan <LCohan(at)web(dot)com> wrote:
> We managed to put together a new test server running PG 9.0.2 on
> 2 sockets x 6 cores = 12 CPUs with 64 GB RAM against a 3PAR 10TB
> SAN. We kept the settings I submitted already (and enclosed below),
> and after 12 hours of pounding the box with PGBENCH running 8
> scripts covering the INSERT/UPDATE/DELETE/SELECT statements we
> wanted, we got a pretty good picture of what it can do with those
> settings.
> We got a load average of 60 with CPU up and around that 60% mark,
> pushing through about 1400 transactions per second for 12 hours.
> We made the changes as suggested and listed below, but the
> throughput dropped from 1400 t/s to 400 t/s; I suspect these are
> the "stalled" transactions you mentioned.
>
> Here's what we changed:
>
> Current Settings                   Test Settings
> ================================================================
> shared_buffers = 500MB             shared_buffers = 8GB
> effective_cache_size = 2GB         effective_cache_size = 32GB
To make sure I understand, are the "Current Settings" the ones which
performed better?
> Just to be 100% accurate, may we ask what you mean by:
>
> 1) "Make the background writer more aggressive and/or back
> off on shared_memory, so that there isn't such a glut of dirty
> pages to Write during a checkpoint."
>
> By "aggressive" does he mean changing any of the following?
> # - Background Writer -
>
> #bgwriter_delay = 200ms
> #bgwriter_lru_maxpages = 100
> #bgwriter_lru_multiplier = 2.0
We use these overrides:
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0
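If you want to see whether the background writer is keeping up,
pg_stat_bgwriter shows where buffer writes are happening (a quick
sketch; these columns are present in 9.0):

select checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean,
       maxwritten_clean, buffers_backend
  from pg_stat_bgwriter;

If buffers_backend stays high relative to buffers_clean, backends
are doing their own writes and a more aggressive background writer
can help.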
> Or would we be better off adjusting the checkpoint handling - any
> of the below:
> # - Checkpoints -
>
> checkpoint_segments = 64
> #checkpoint_timeout = 5min
> #checkpoint_completion_target = 0.5
> checkpoint_warning = 60s
You might consider increasing checkpoint_timeout if you can tolerate
the longer recovery time after a crash. You should probably boost
checkpoint_completion_target to 0.8 or 0.9.
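As an illustration only -- a starting point to test against your own
workload, not a recommendation for your hardware:

checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

That spreads each checkpoint's writes over a longer window, trading
a longer crash recovery for smoother I/O.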
Really, if you don't already have it, you should get a copy of Greg
Smith's recent book on performance:
http://www.postgresql.org/docs/books/
It gives good descriptions of all of these parameters and advice on
incremental tuning to find your best settings.
The fact that you listed shared_buffers and effective_cache_size
together at least suggests that you don't yet grasp the role of
these settings. One affects how much memory PostgreSQL allocates;
the other has absolutely nothing to do with that.
effective_cache_size affects costs assigned to various plans,
thereby affecting plan choice. While a high shared_buffers setting
might lead to a glut of writes around commit time, setting
effective_cache_size incorrectly might lead to plans which don't
read the data efficiently. Seeing what vmstat or iostat say during
a slow episode, and seeing whether the episodes correspond to
checkpoints, will give you a better indication of where the problem
lies.
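One simple way to line those up: turn on checkpoint logging and
watch I/O while the benchmark runs. For example:

log_checkpoints = on   (in postgresql.conf; logs when each
                        checkpoint starts and completes)

vmstat 1
iostat -x 5

If the drops to 400 t/s coincide with the logged checkpoints, the
dirty-page glut is the likely culprit; if not, look harder at plan
choices and effective_cache_size.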
-Kevin