Re: Slow Query / Check Point Segments

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Alex - <aintokyo(at)hotmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow Query / Check Point Segments
Date: 2010-01-22 06:21:59
Message-ID: 4B594407.4050508@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Alex - wrote:
> checkpoint_segments = 32 # in logfile segments, min 1,
> 16MB each
> checkpoint_timeout = 30min # range 30s-1h

These parameters are not so interesting on their own. The important
thing to check is how often checkpoints are happening, and how much work
each of them do. Here's a useful section from your logs to comment on:

> 2010-01-22 12:21:48 JSTLOG: checkpoint complete: wrote 83874 buffers
> (16.0%); 0 transaction log file(s) added, 0 removed, 32 recycled;
> write=138.040 s, sync=0.000 s, total=138.063 s
> 2010-01-22 12:23:32 JSTLOG: checkpoint complete: wrote 82856 buffers
> (15.8%); 0 transaction log file(s) added, 0 removed, 32 recycled;
> write=18.740 s, sync=0.000 s, total=18.783 s
> 2010-01-22 12:24:26 JSTLOG: checkpoint complete: wrote 75145 buffers
> (14.3%); 0 transaction log file(s) added, 0 removed, 32 recycled;
> write=12.129 s, sync=0.000 s, total=12.132 s
> 2010-01-22 12:25:30 JSTLOG: checkpoint complete: wrote 82108 buffers
> (15.7%); 0 transaction log file(s) added, 0 removed, 32 recycled;
> write=10.619 s, sync=0.000 s, total=10.621 s
> 2010-01-22 12:28:03 JSTLOG: checkpoint complete: wrote 87349 buffers
> (16.7%); 0 transaction log file(s) added, 0 removed, 32 recycled;
> write=82.190 s, sync=0.000 s, total=82.192 s

Here you're getting a checkpoint every minute or three, and each of them
is writing out ~80000 buffers = 625MB. That is crazy, and no wonder
your queries are slow--the system is spending all of its time doing
constant, extremely expensive checkpoints.

You should re-tune this system until the checkpoints show up no more
often than every 5 minutes, and preferably closer to 10. In your case,
that means greatly increasing checkpoint_segments.. And you might as
well decrease checkpoint_timeout, because right now the timeout is
rarely ever being reached before the system runs out of working segments
and executes a checkpoint for that reason.

I would suggest changing the parameters to these:

checkpoint_segments = 96 # in logfile segments, min 1,
16MB each
checkpoint_timeout = 10min # range 30s-1h

And seeing how long the interval between checkpoints becomes under load
afterwards. That should make it in the just >5 minute range. I'd
expect that the number of buffers will only increase a bit, so instead
of dumping out 625MB every minute or three you should see maybe 800MB
every 5 minutes--big improvement.

Also: just after making the change, save a copy of:

select now(),* from pg_stat_bgwriter

And then run that same query again a few hours after the change (and
maybe the next day twoo). Subtract the two values to see how much they
changed, and then you'll find some useful info to compute the checkpoint
interval without even looking at the logs. There's a lot of theory and
comments about this area at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

> 2010-01-22 12:36:08 JSTLOG: checkpoint complete: wrote 80819 buffers
> (15.4%); 0 transaction log file(s) added, 0 removed, 32 recycled;
> write=51.476 s, sync=0.000 s, total=51.478 s
> 2010-01-22 13:01:54 JSTLOG: checkpoint complete: wrote 4892 buffers
> (0.9%); 0 transaction log file(s) added, 0 removed, 32 recycled;
> write=494.868 s, sync=0.000 s, total=494.982 s
>

See that last one? That's a checkpoint that's being driven by time
expiring (30 minutes) rather than running out of segments. That just
suggests that the system was mostly idle during that period. Long write
times are perfectly fine here, the checkpoints are spread out over a
long time if possible in order to reduce average I/O. You're just not
seeing that the rest of the time because checkpoints are happening so often.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Yan Cheng Cheok 2010-01-22 06:59:42 Optimization on JOIN
Previous Message Yan Cheng Cheok 2010-01-22 06:16:06 Re: Extremely Slow Cascade Delete Operation