Re: Incremental checkopints

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental checkopints
Date: 2011-07-29 19:03:06
Message-ID: 4E3303EA.6000602@2ndQuadrant.com
Lists: pgsql-hackers
On 07/29/2011 11:04 AM, jordani(at)go-link(dot)net wrote:
> I think that current implementation of checkpoints is not good for huge
> shared buffer cache and for many WAL segments. If there is more buffers
> and if buffers can be written rarely more updates of buffers can be
> combined so total number of writes to disk will be significantly less. I
> think that incremental checkpoints can achieve this goal (maybe more) and
> price is additional memory (about 1/1000 of size of buffer cache).
>    

The current code optimizes for buffers that are written frequently.  
Those will sit in shared_buffers and, in the hoped-for case, only be 
written once at checkpoint time.
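To make the write-combining point concrete, here is a toy Python model (not PostgreSQL code). It replays a hypothetical trace of buffer updates and counts physical page writes when every dirty buffer is flushed once per `flush_interval` updates. A single flush at the end of the trace stands in for one checkpoint; smaller intervals stand in for writing more incrementally. The function name and the trace are illustrative assumptions, and the model ignores the OS cache, the background writer, and buffer eviction.

```python
from typing import List


def count_physical_writes(updates: List[int], flush_interval: int) -> int:
    """Count page writes when all dirty buffers are flushed every flush_interval updates.

    Toy model: `updates` is a hypothetical sequence of buffer ids being modified.
    """
    if flush_interval < 1:
        raise ValueError("flush_interval must be >= 1")
    writes = 0
    dirty = set()
    for i, buf in enumerate(updates, start=1):
        # Updating an already-dirty buffer costs nothing extra: the changes
        # are combined into the single write that happens at the next flush.
        dirty.add(buf)
        if i % flush_interval == 0:
            writes += len(dirty)
            dirty.clear()
    # Final "checkpoint": write whatever is still dirty.
    writes += len(dirty)
    return writes


# A hot set of four buffers, updated 1,000 times in total.
trace = [0, 1, 2, 3] * 250
for interval in (1000, 10, 1):
    print(interval, count_physical_writes(trace, interval))
```

With this trace, one end-of-trace flush writes 4 pages, flushing every 10 updates writes 400, and flushing after every update writes 1,000. That gap is the throughput cost of writing sooner.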

There are two issues with adopting incremental checkpoints instead, one 
fundamental, the other solvable but not yet started on:

1) Postponing writes as long as possible always improves the resulting 
throughput of those writes.  Any incremental checkpoint approach will 
detune throughput by some amount.  If you make writes go out more often, 
they will be less efficient; that's just how things work if you 
benchmark anything that allows write combining.  Any incremental 
checkpoint approach is likely to improve latency in some cases if it 
works well, while decreasing throughput in most cases.

2) The incremental checkpoint approach used by other databases, such as 
the MySQL implementation, works by tracking what transaction IDs were 
associated with a buffer update.  The way PostgreSQL currently saves 
buffer sync information for the checkpoint to process doesn't store 
enough information to do that.  As you say, the main price there is 
some additional memory.
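A minimal Python sketch of the bookkeeping an incremental checkpoint needs. The message describes other databases tracking transaction IDs per buffer update; this sketch instead assumes the ARIES-style approach of recording, for each dirty buffer, the WAL position (LSN) of its first modification since it was last written. Flushing the oldest-dirtied buffers first lets the recovery start point advance a little at a time instead of once per full checkpoint. The class and method names are hypothetical, and this is neither PostgreSQL nor MySQL code. Assuming an 8-byte LSN per 8 KB buffer, the extra memory is 8/8192, roughly 1/1000 of the buffer cache, in line with the estimate quoted above.

```python
from collections import OrderedDict
from typing import List


class FlushList:
    """Hypothetical dirty-buffer list ordered by the LSN of each buffer's first change."""

    def __init__(self) -> None:
        # buffer id -> recovery LSN. Insertion order equals LSN order because
        # LSNs only grow and a buffer is inserted only when it first gets dirty.
        self.rec_lsn: "OrderedDict[int, int]" = OrderedDict()
        self.current_lsn = 0

    def modify(self, buf: int) -> int:
        """Log a change to `buf` and return its (simulated) WAL position."""
        self.current_lsn += 1
        if buf not in self.rec_lsn:  # later changes reuse the existing entry
            self.rec_lsn[buf] = self.current_lsn
        return self.current_lsn

    def flush_oldest(self, n: int) -> List[int]:
        """Write out up to n buffers, starting with the one dirtied earliest."""
        written = []
        while self.rec_lsn and len(written) < n:
            buf, _ = self.rec_lsn.popitem(last=False)
            written.append(buf)
        return written

    def redo_start(self) -> int:
        """LSN from which crash recovery would have to replay WAL."""
        if self.rec_lsn:
            return next(iter(self.rec_lsn.values()))
        return self.current_lsn + 1  # nothing dirty: nothing to replay


fl = FlushList()
for b in [10, 11, 10, 12, 11, 13]:
    fl.modify(b)
print(fl.redo_start())       # oldest unwritten change
print(fl.flush_oldest(1))    # incremental step: write just the oldest buffer
print(fl.redo_start())       # recovery start point has moved forward
```

Because LSNs only increase and a buffer enters the list only on its first change, insertion order already equals LSN order, so an `OrderedDict` serves as the flush list without any sorting.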

From my perspective, the main problem with plans to tweak the 
checkpoint code is that we don't have a really good benchmark that 
tracks both throughput and latency to test proposed changes against.  
Mark Wong has been working to get his TPC-E clone DBT-5 running 
regularly for that purpose, and last I heard that was basically done at 
this point; he's running daily tests now.  There's already a small pile 
of patches adjusting checkpoint behavior that were postponed from 9.1, 
mainly because it was hard to prove they were useful with the benchmark 
used to test them, pgbench.  I have higher hopes for DBT-5 as a test 
that gives informative data in this area.  I would want to go back and 
revisit the existing patches (sorted checkpoints, spread sync) before 
launching into this whole new area.  I don't think any of them has even 
been proven not to work; they just didn't help the slightly unrealistic 
write-heavy pgbench workload.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


In response to

Responses

pgsql-hackers by date

Next:From: Alvaro HerreraDate: 2011-07-29 19:26:37
Subject: Re: SSI error messages
Previous:From: Robert HaasDate: 2011-07-29 18:59:09
Subject: Re: include host names in hba error messages

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group