Re: Spread checkpoint sync

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Spread checkpoint sync
Date:	2011-02-01 15:49:03
Message-ID:	4D482B6F.9000302@2ndquadrant.com
Lists:	pgsql-hackers

Greg Smith wrote:
> I think the right way to compute "relations to sync" is to finish the
> sorted writes patch I sent over a not quite right yet update to already

Attached update now makes much more sense than the misguided patch I
submitted two weeks ago. This takes the original sorted write code,
first adjusting it so it only allocates the memory its tag structure is
stored in once (in a kind of lazy way I can improve on right now). It
then computes a bunch of derived statistics from a single walk of the
sorted data on each pass through. Here's an example of what comes out:

DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11809.0_0
DEBUG: BufferSync 2 dirty blocks in relation.segment_fork 11811.0_0
DEBUG: BufferSync 3 dirty blocks in relation.segment_fork 11812.0_0
DEBUG: BufferSync 3 dirty blocks in relation.segment_fork 16496.0_0
DEBUG: BufferSync 28 dirty blocks in relation.segment_fork 16499.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11638.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11640.0_0
DEBUG: BufferSync 2 dirty blocks in relation.segment_fork 11641.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11642.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11644.0_0
DEBUG: BufferSync 2048 dirty blocks in relation.segment_fork 16508.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11645.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11661.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11663.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11664.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11672.0_0
DEBUG: BufferSync 1 dirty blocks in relation.segment_fork 11685.0_0
DEBUG: BufferSync 2097 buffers to write, 17 total dirty segment file(s) expected to need sync

This is the first checkpoint after starting to populate a new pgbench
database. The next four show it extending into new segments:

DEBUG: BufferSync 2048 dirty blocks in relation.segment_fork 16508.1_0
DEBUG: BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG: BufferSync 2048 dirty blocks in relation.segment_fork 16508.2_0
DEBUG: BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG: BufferSync 2048 dirty blocks in relation.segment_fork 16508.3_0
DEBUG: BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG: BufferSync 2048 dirty blocks in relation.segment_fork 16508.4_0
DEBUG: BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

The fact that it's always showing 2048 dirty blocks on these makes me
think I'm computing something wrong still, but the general idea here is
working now. I had to use some magic from the md layer to let bufmgr.c
know how its writes were going to get mapped into file segments and
the corresponding fsync calls later. Not happy about breaking the API
encapsulation there, but don't see an easy way to compute that data at
the per-segment level--and it's not like that's going to change in the
near future anyway.
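
To make the sort-then-walk idea concrete, here is a rough sketch of how
per-segment dirty counts can fall out of one pass over sorted tags. This is
not the code in the attached patch: the DirtyTag struct, the function names,
and SKETCH_RELSEG_SIZE are hypothetical stand-ins. The one thing assumed
from the md layer is that a block's segment number is its block number
divided by RELSEG_SIZE, taken here to be the default of 131072 blocks.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: default build, 1GB segments made of 8kB blocks. */
#define SKETCH_RELSEG_SIZE 131072

/* Hypothetical stand-in for the buffer tag the patch sorts on. */
typedef struct DirtyTag
{
    unsigned int relnode;   /* relation file node */
    int          forknum;   /* fork number */
    unsigned int blocknum;  /* block number within the relation fork */
} DirtyTag;

/* Order by relation, then fork, then block, so segments end up contiguous. */
static int
dirty_tag_cmp(const void *a, const void *b)
{
    const DirtyTag *x = a;
    const DirtyTag *y = b;

    if (x->relnode != y->relnode)
        return x->relnode < y->relnode ? -1 : 1;
    if (x->forknum != y->forknum)
        return x->forknum < y->forknum ? -1 : 1;
    if (x->blocknum != y->blocknum)
        return x->blocknum < y->blocknum ? -1 : 1;
    return 0;
}

/*
 * Sort the tags, then walk them once, printing a dirty block count for
 * each relation.segment_fork and returning the number of distinct
 * segment files seen.
 */
static int
count_dirty_segments(DirtyTag *tags, int ntags)
{
    int nsegments = 0;
    int run = 0;
    int i;

    qsort(tags, (size_t) ntags, sizeof(DirtyTag), dirty_tag_cmp);

    for (i = 0; i < ntags; i++)
    {
        run++;

        /* Close the run at the last tag or when the next tag changes segment. */
        if (i == ntags - 1 ||
            tags[i + 1].relnode != tags[i].relnode ||
            tags[i + 1].forknum != tags[i].forknum ||
            tags[i + 1].blocknum / SKETCH_RELSEG_SIZE !=
            tags[i].blocknum / SKETCH_RELSEG_SIZE)
        {
            printf("%d dirty blocks in relation.segment_fork %u.%u_%d\n",
                   run, tags[i].relnode,
                   tags[i].blocknum / SKETCH_RELSEG_SIZE, tags[i].forknum);
            nsegments++;
            run = 0;
        }
    }
    return nsegments;
}
```

Since the array is sorted, every segment's blocks are adjacent, which is why
a single running counter is enough and no per-segment hash table is needed.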

I like this approach for providing a map of how to spread syncs out
for a couple of reasons:

-It computes data that could be used to drive sync spread timing in a
relatively short amount of simple code.

-You get write sorting at the database level helping out the OS.
Everything I've been seeing recently on benchmarks says Linux at least
needs all the help it can get in that regard, even if block order
doesn't necessarily align perfectly with disk order.

-It's obvious how to take this same data and build a future model where
the time allocated for fsyncs was proportional to how much that
particular relation was touched.
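
A minimal sketch of what that proportional model might look like, assuming
a fixed time budget for the sync phase and using the per-segment dirty block
counts as weights. Nothing like this is in the attached patch; the function
name allot_sync_time and the sync_budget_ms parameter are hypothetical.

```c
#include <stddef.h>

/*
 * Split a sync-phase time budget across segment files in proportion to
 * each segment's dirty block count.  allotted_ms must have room for
 * nsegments entries.  If nothing is dirty, every entry is set to zero.
 */
static void
allot_sync_time(const int *dirty_blocks, size_t nsegments,
                double sync_budget_ms, double *allotted_ms)
{
    long   total = 0;
    size_t i;

    for (i = 0; i < nsegments; i++)
        total += dirty_blocks[i];

    for (i = 0; i < nsegments; i++)
        allotted_ms[i] = (total > 0)
            ? sync_budget_ms * dirty_blocks[i] / total
            : 0.0;
}
```

Fed the counts from the first checkpoint above, the segment with 2048 of
the 2097 dirty blocks would get about 98% of the budget, and the sixteen
lightly touched segments would share the rest.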

Benchmarks of just the impact of the sorting step and continued bug
swatting to follow.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attachment	Content-Type	Size
new-sorted-writes-v3.patch	text/x-patch	7.8 KB

In response to

Re: Spread checkpoint sync at 2011-01-31 21:33:18 from Greg Smith

Responses

Re: Spread checkpoint sync at 2011-02-01 18:30:53 from Bruce Momjian
Re: Spread checkpoint sync at 2011-02-04 19:08:07 from Greg Smith
