Re: Spread checkpoint sync

From: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spread checkpoint sync
Date: 2011-02-01 15:49:03
Message-ID: 4D482B6F.9000302@2ndquadrant.com
Lists: pgsql-hackers
Greg Smith wrote:
> I think the right way to compute "relations to sync" is to finish the 
> sorted writes patch I sent over a not quite right yet update to already

Attached update now makes much more sense than the misguided patch I 
submitted two weeks ago.  This takes the original sorted write code, 
first adjusting it so it only allocates the memory its tag structure is 
stored in once (in a kind of lazy way I can improve on right now).  It 
then computes a bunch of derived statistics from a single walk of the 
sorted data on each pass through.  Here's an example of what comes out:

DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11809.0_0
DEBUG:  BufferSync 2 dirty blocks in relation.segment_fork 11811.0_0
DEBUG:  BufferSync 3 dirty blocks in relation.segment_fork 11812.0_0
DEBUG:  BufferSync 3 dirty blocks in relation.segment_fork 16496.0_0
DEBUG:  BufferSync 28 dirty blocks in relation.segment_fork 16499.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11638.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11640.0_0
DEBUG:  BufferSync 2 dirty blocks in relation.segment_fork 11641.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11642.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11644.0_0
DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11645.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11661.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11663.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11664.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11672.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11685.0_0
DEBUG:  BufferSync 2097 buffers to write, 17 total dirty segment file(s) expected to need sync

This is the first checkpoint after starting to populate a new pgbench 
database.  The next four show it extending into new segments:

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.1_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.2_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.3_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.4_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

The fact that it's always showing 2048 dirty blocks on these makes me 
think I'm computing something wrong still, but the general idea here is 
working now.  I had to use some magic from the md layer to let bufmgr.c 
know how its writes were going to get mapped into file segments and 
the corresponding fsync calls later.  Not happy about breaking the API 
encapsulation there, but don't see an easy way to compute that data at 
the per-segment level--and it's not like that's going to change in the 
near future anyway.
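
To make the idea concrete, here is a minimal sketch in Python (not the C code from the attached patch) of sorting dirty buffer tags and counting dirty blocks per segment file in a single walk.  It assumes a default build with 8 kB blocks and 1 GB segments, so a block's segment number is its block number divided by 131072.  The names summarize_dirty and segment_of, and the (relation, fork, block) tuple layout, are made up for illustration.

```python
# Assumption: default build settings, 8 kB blocks and 1 GB segment files.
BLCKSZ = 8192
RELSEG_SIZE = (1024 ** 3) // BLCKSZ  # 131072 blocks per segment file


def segment_of(block_num):
    """Segment file a block lands in (hypothetical helper)."""
    return block_num // RELSEG_SIZE


def summarize_dirty(tags):
    """Sort dirty buffer tags and count dirty blocks per segment file.

    tags: iterable of (relation, fork, block) tuples, one per dirty buffer.
    Returns (ordered, per_segment), where per_segment is a list of
    ((relation, segment, fork), dirty_count) in sorted order.
    """
    ordered = sorted(tags)  # write order: by relation, fork, then block
    per_segment = []
    current = None
    # Single walk over the sorted data: a new group starts whenever the
    # (relation, segment, fork) key changes.
    for rel, fork, block in ordered:
        key = (rel, segment_of(block), fork)
        if key != current:
            per_segment.append([key, 0])
            current = key
        per_segment[-1][1] += 1
    return ordered, [(key, count) for key, count in per_segment]
```

The length of per_segment plays the role of the "total dirty segment file(s) expected to need sync" figure, and the length of ordered is the number of buffers to write.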

I like this approach for providing a map of how to spread syncs out 
for a couple of reasons:

-It computes data that could be used to drive sync spread timing in a 
relatively short amount of simple code.

-You get write sorting at the database level helping out the OS.  
Everything I've been seeing recently on benchmarks says Linux at least 
needs all the help it can get in that regard, even if block order 
doesn't necessarily align perfectly with disk order.

-It's obvious how to take this same data and build a future model where 
the time allocated for each fsync is proportional to how much that 
particular relation was touched.
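
As a rough illustration of that last point, a proportional model could look something like the Python sketch below.  Nothing like it is in the attached patch; fsync_time_shares and sync_window_secs are hypothetical names, and the input is assumed to be a list of (segment key, dirty block count) pairs.

```python
def fsync_time_shares(per_segment, sync_window_secs):
    """Split a sync window across segments in proportion to dirty blocks.

    per_segment: list of (segment_key, dirty_count) pairs.
    sync_window_secs: total time available for the sync phase (hypothetical).
    Returns a list of (segment_key, seconds) in the same order.
    """
    total = sum(count for _, count in per_segment)
    if total == 0:
        return []  # nothing dirty, nothing to schedule
    return [
        (key, sync_window_secs * count / total)
        for key, count in per_segment
    ]
```

With 3 dirty blocks in one segment, 1 in another, and an 8 second window, the first segment would get 6 seconds and the second 2.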

Benchmarks of just the impact of the sorting step and continued bug 
swatting to follow.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Attachment: new-sorted-writes-v3.patch
Description: text/x-patch (7.8 KB)
