
Re: Spread checkpoint sync

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spread checkpoint sync
Date: 2010-11-30 20:29:57
Message-ID: 4CF55EC5.5000108@2ndquadrant.com
Lists: pgsql-hackers
Ron Mayer wrote:
> Might smoother checkpoints be better solved by talking
> to the OS vendors & virtual-memory-tuning-knob-authors
> to work with them on exposing the ideal knobs; rather than
> saying that our only tool is a hammer (fsync) so the problem
> must be handled as a nail.
>   

Maybe, but it's hard to argue that the current implementation--just 
doing all of the sync calls as fast as possible, one after the other--isn't 
going to produce worst-case behavior in a lot of situations.  Given that 
it's not a huge amount of code to do better, I'd rather do some work in 
that direction, instead of presuming the kernel authors will ever make 
this go away.  Spreading the writes out as part of the checkpoint rework 
in 8.3 worked better than any kernel changes I've tested since then, and 
I'm not really optimistic about this getting resolved at the system level.  
So long as the database changes aren't antagonistic toward kernel 
improvements, I'd prefer to have some options here that become effective 
as soon as the database code is done.
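
To make the idea concrete, here's a minimal standalone sketch--not the 
attached patch, with the file names and pause length invented purely for 
illustration--of what spreading the syncs amounts to: put a delay between 
each file's fsync instead of issuing them back to back.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Fsync each file, pausing between them instead of syncing all at once. */
static void
spread_sync(const char **paths, int npaths, useconds_t pause_usec)
{
    for (int i = 0; i < npaths; i++)
    {
        int fd = open(paths[i], O_RDWR | O_CREAT, 0600);

        if (fd < 0)
        {
            perror(paths[i]);
            continue;
        }
        if (fsync(fd) != 0)
            perror("fsync");
        close(fd);

        /* The whole point: give the OS a chance to drain dirty data
         * before asking it to flush the next file. */
        if (i < npaths - 1)
            usleep(pause_usec);
    }
}

int
main(void)
{
    const char *files[] = { "/tmp/spread1", "/tmp/spread2", "/tmp/spread3" };

    spread_sync(files, 3, 500000);      /* half a second between syncs */
    return 0;
}

The attached patch is obviously smarter about how long to pause and when, 
but the shape of the loop is the same: sync, wait, sync, wait.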

I've attached an updated version of the initial sync spreading patch 
here, one that applies cleanly on top of HEAD and over top of the sync 
instrumentation patch too.  The conflict that made that hard before is 
gone now.

Having the pg_stat_bgwriter.buffers_backend_fsync patch available all 
the time now has made me reconsider how important one potential bit of 
refactoring here would be.  I managed to catch one of the situations 
where really popular relations were being heavily updated in a way that 
was competing with the checkpoint on my test system (which I can happily 
share the logs of), with the instrumentation patch applied but not the 
spread sync one:

LOG:  checkpoint starting: xlog
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 7747 of relation base/16424/16442
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 42688 of relation base/16424/16437
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 9723 of relation base/16424/16442
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 58117 of relation base/16424/16437
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 165128 of relation base/16424/16437
[330 of these total, all referring to the same two relations]

DEBUG:  checkpoint sync: number=1 file=base/16424/16448_fsm time=10132.830000 msec
DEBUG:  checkpoint sync: number=2 file=base/16424/11645 time=0.001000 msec
DEBUG:  checkpoint sync: number=3 file=base/16424/16437 time=7.796000 msec
DEBUG:  checkpoint sync: number=4 file=base/16424/16448 time=4.679000 msec
DEBUG:  checkpoint sync: number=5 file=base/16424/11607 time=0.001000 msec
DEBUG:  checkpoint sync: number=6 file=base/16424/16437.1 time=3.101000 msec
DEBUG:  checkpoint sync: number=7 file=base/16424/16442 time=4.172000 msec
DEBUG:  checkpoint sync: number=8 file=base/16424/16428_vm time=0.001000 msec
DEBUG:  checkpoint sync: number=9 file=base/16424/16437_fsm time=0.001000 msec
DEBUG:  checkpoint sync: number=10 file=base/16424/16428 time=0.001000 msec
DEBUG:  checkpoint sync: number=11 file=base/16424/16425 time=0.000000 msec
DEBUG:  checkpoint sync: number=12 file=base/16424/16437_vm time=0.001000 msec
DEBUG:  checkpoint sync: number=13 file=base/16424/16425_vm time=0.001000 msec
LOG:  checkpoint complete: wrote 3032 buffers (74.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=1.742 s, sync=10.153 s, total=37.654 s; sync files=13, longest=10.132 s, average=0.779 s

Note here how the checkpoint was hung on trying to get 16448_fsm written 
out, but the backends were issuing constant competing fsync calls to 
these other relations.  This is very similar to the production case this 
patch was written to address, which I hadn't been able to share a good 
example of yet.  That's essentially what it looks like, except with the 
contention going on for minutes instead of seconds.
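
For anyone not following the absorb code, here's a rough standalone sketch 
of what those "request queue is full" lines mean--simplified types and 
made-up names, not the real md.c/bgwriter code.  A backend first tries to 
hand the fsync off to the background writer's queue, and only when that 
fails does it stop and call fsync itself, which is what 
pg_stat_bgwriter.buffers_backend_fsync now counts:

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define FSYNC_QUEUE_SIZE 4

typedef struct
{
    int fds[FSYNC_QUEUE_SIZE];  /* files waiting for the bgwriter to sync */
    int count;
} FsyncQueue;

static long buffers_backend_fsync = 0;

/* Try to hand the file off to the bgwriter; fails when the queue is full. */
static bool
forward_fsync_request(FsyncQueue *queue, int fd)
{
    if (queue->count >= FSYNC_QUEUE_SIZE)
        return false;
    queue->fds[queue->count++] = fd;
    return true;
}

/* What a backend does after writing out a dirty buffer of this file. */
static void
register_dirty_file(FsyncQueue *queue, int fd)
{
    if (!forward_fsync_request(queue, fd))
    {
        /* Queue full: the backend has to do the sync itself, right now. */
        fprintf(stderr,
                "could not forward fsync request because request queue is full\n");
        if (fsync(fd) != 0)
            perror("fsync");
        buffers_backend_fsync++;
    }
}

int
main(void)
{
    FsyncQueue queue = { {0}, 0 };
    int fd = open("/tmp/fsync_demo", O_RDWR | O_CREAT, 0600);

    /* Six dirty writes against a queue that only holds four entries. */
    for (int i = 0; i < 6; i++)
        register_dirty_file(&queue, fd);

    printf("buffers_backend_fsync = %ld\n", buffers_backend_fsync);  /* 2 */
    close(fd);
    return 0;
}

Each of those 330 DEBUG lines is a backend hitting that fallback path 
while the checkpoint was already monopolizing the sync machinery.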

One of the ideas Simon and I had been considering at one point was 
adding some better de-duplication logic to the fsync absorb code; the 
pattern here reminds me that it might be helpful independently of the 
other improvements.
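
The de-duplication I have in mind is nothing fancier than collapsing 
repeat entries for the same relation segment before declaring the queue 
full, since one fsync of a file covers every request queued against it.  
A simplified sketch with stand-in structures, not the real request queue:

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    unsigned rel;       /* relation identifier */
    unsigned segment;   /* 1GB segment number within the relation */
} FsyncRequest;

/* Remove duplicate requests in place; return the new queue length. */
static int
compact_fsync_queue(FsyncRequest *queue, int count)
{
    int kept = 0;

    for (int i = 0; i < count; i++)
    {
        bool duplicate = false;

        for (int j = 0; j < kept; j++)
        {
            if (queue[j].rel == queue[i].rel &&
                queue[j].segment == queue[i].segment)
            {
                duplicate = true;
                break;
            }
        }
        if (!duplicate)
            queue[kept++] = queue[i];
    }
    return kept;
}

int
main(void)
{
    /* Two relations hammered repeatedly, as in the log excerpt above. */
    FsyncRequest queue[] = {
        {16437, 0}, {16442, 0}, {16437, 0}, {16437, 1}, {16442, 0}
    };
    int count = compact_fsync_queue(queue, 5);

    printf("queue compacted to %d entries\n", count);   /* prints 3 */
    return 0;
}

Since all 330 of those messages referred to the same two relations, 
compaction along these lines would have collapsed most of that queue 
traffic down to a handful of entries.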

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Attachment: sync-spread-v3.patch
Description: text/x-patch (7.4 KB)

