Checkpoint sync pause

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Checkpoint sync pause
Date: 2012-01-16 07:57:10
Message-ID: 4F13D856.60704@2ndQuadrant.com
Lists: pgsql-hackers
Last year at this point, I submitted an increasingly complicated series
of checkpoint sync spreading patches.  I wasn't able to prove any
repeatable drop in sync time latency from them.  While that was going
on, and continuing until recently, the production server that started
all this with its sync time latency issues kept having that problem.
Data collection continued, and new patches were tried.

There was a really simple triage step Simon and I tried before getting
into the complicated ones:  just delay for a few seconds between every
single sync call made during a checkpoint.  That approach is still
hanging around in that server's patched PostgreSQL package set, and it
still works better than anything more complicated we've tried so far.
The recent split of background writer and checkpointer makes that whole
thing even easier to do without it rippling out into unexpected
consequences.
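
As a rough illustration of the idea only (this is not the code in the
attached patch, which lives in the C checkpointer), here is a minimal
Python sketch that fsyncs a list of files and sleeps between
consecutive calls.  The function name, its parameters, and the choice
to skip the pause after the last file are all assumptions made for the
example.

```python
import os
import time
from typing import Callable, Iterable, List


def sync_files_with_pause(
    paths: Iterable[str],
    pause_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """fsync each file in turn, pausing between consecutive sync calls.

    Hypothetical helper for illustration; returns how long each fsync
    took, in seconds.
    """
    path_list = list(paths)
    durations: List[float] = []
    for index, path in enumerate(path_list):
        fd = os.open(path, os.O_RDWR)
        try:
            start = time.monotonic()
            os.fsync(fd)
            durations.append(time.monotonic() - start)
        finally:
            os.close(fd)
        # Assumption: pause only *between* syncs, not after the last one.
        is_last = index == len(path_list) - 1
        if pause_seconds > 0 and not is_last:
            sleep(pause_seconds)
    return durations
```

The sleep function is passed in so the pause can be replaced in tests;
the point of the pause is only to give the OS and storage time to
drain one file's writes before the next sync is issued.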

In order to tune this usefully, you need to know how many files a
typical checkpoint syncs.  That could be available without log scraping
via the "Publish checkpoint timing and sync files summary data to
pg_stat_bgwriter" addition I just submitted.  People who set this new
checkpoint_sync_pause value too high can face checkpoints running over
schedule, but you can measure how bad your exposure is with the new
view information.
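
A back-of-the-envelope way to size that exposure, sketched in Python.
The function names are made up for the example, the pause is assumed to
be in seconds and applied only between files, and the "budget" is an
assumed approximation:  whatever part of checkpoint_timeout is not
reserved for the write phase by checkpoint_completion_target.

```python
def pause_overhead_seconds(sync_files: int, pause_seconds: float) -> float:
    """Extra time the pauses add to the sync phase of one checkpoint.

    Assumption: one pause between each pair of consecutive syncs.
    """
    return max(sync_files - 1, 0) * pause_seconds


def sync_budget_seconds(checkpoint_timeout: float,
                        completion_target: float) -> float:
    """Rough time left for syncing once the spread-out writes finish.

    Assumption: writes use completion_target of the interval and the
    remainder is available for the sync phase.
    """
    return checkpoint_timeout * (1.0 - completion_target)


def runs_over_schedule(sync_files: int, pause_seconds: float,
                       checkpoint_timeout: float,
                       completion_target: float) -> bool:
    """True if the pauses alone would exceed the rough sync budget."""
    overhead = pause_overhead_seconds(sync_files, pause_seconds)
    return overhead > sync_budget_seconds(checkpoint_timeout,
                                          completion_target)
```

For instance, 36 files with a 3 second pause adds 105 seconds of
pausing; against a 300 second checkpoint_timeout with a
checkpoint_completion_target of 0.5 the rough budget is 150 seconds,
so that combination would fit, while a target of 0.9 would not.  The
actual fsync times come on top of this.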

I owe the community a lot of data to prove this is useful before I'd 
expect it to be taken seriously.  I was planning to leave this whole 
area alone until 9.3.  But since recent submissions may pull me back 
into trying various ways of rearranging the write path for 9.2, I wanted 
to have my own miniature horse in that race.  It works simply:

...
2012-01-16 02:39:01.184 EST [25052]: DEBUG:  checkpoint sync: number=34 file=base/16385/11766 time=0.006 msec
2012-01-16 02:39:01.184 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=3
2012-01-16 02:39:01.284 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=2
2012-01-16 02:39:01.385 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=1
2012-01-16 02:39:01.860 EST [25052]: DEBUG:  checkpoint sync: number=35 file=global/12007 time=375.710 msec
2012-01-16 02:39:01.860 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=3
2012-01-16 02:39:01.961 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=2
2012-01-16 02:39:02.061 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=1
2012-01-16 02:39:02.161 EST [25052]: DEBUG:  checkpoint sync: number=36 file=base/16385/11754 time=0.008 msec
2012-01-16 02:39:02.555 EST [25052]: LOG:  checkpoint complete: wrote 2586 buffers (63.1%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=2.422 s, sync=13.282 s, total=16.123 s; sync files=36, longest=1.085 s, average=0.040 s
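
For anyone collecting the sync file counts by log scraping in the
meantime, here is a small Python sketch that pulls the timing and file
numbers out of a "checkpoint complete" line shaped like the one above.
The regular expression is written against this sample only and is an
assumption about the format; other versions or settings may log the
line differently.

```python
import re
from typing import Dict, Optional

# Matches the summary fields of a "checkpoint complete" line as it
# appears in the sample log above.
_CHECKPOINT_RE = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers.*?"
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s; "
    r"sync files=(?P<sync_files>\d+), longest=(?P<longest>[\d.]+) s, "
    r"average=(?P<average>[\d.]+) s"
)


def parse_checkpoint_complete(line: str) -> Optional[Dict[str, float]]:
    """Return the numeric fields of a checkpoint summary, or None.

    Whitespace is collapsed first so a line that was wrapped by a mail
    client still matches.
    """
    normalized = " ".join(line.split())
    match = _CHECKPOINT_RE.search(normalized)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}
```

All values come back as floats, including the buffer and file counts,
to keep the return type uniform.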

No docs yet; we really need a better guide to tuning checkpoints as
they exist now before there's a place to attach a discussion of this.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


Attachment: checkpoint-sync-pause-v1.patch
Description: text/x-patch (5.7 KB)
