Skip site navigation (1) Skip section navigation (2)

Re: sorted writes for checkpoints

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sorted writes for checkpoints
Date: 2010-10-29 23:31:50
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
Itagaki Takahiro wrote:
> When I submitted the patch, I tested it on disk-based RAID-5 machine:
> But there were no additional benchmarking reports at that time. We still
> need benchmarking before we re-examine the feature. For example, SSD and
> SSD-RAID was not popular at that time, but now they might be considerable.

I did multiple rounds of benchmarking that, just none of it showed any 
improvement so I didn't bother reporting them in detail.  I have 
recently figured out why the performance testing I did of that earlier 
patch probably failed to produce useful results on my system when I was 
testing it back then though.  It relates to trivia around how ext3 
handles fsync that's well understood now (the whole cache flushes out 
when one comes in), but wasn't back then yet.

We have a working set of patches here that both rewrite the checkpoint 
logic to avoid several larger problems with how it works now, as well as 
adding instrumentation that makes it possible to directly measure and 
graph whether methods such as sorting writes provide any improvement or 
not to the process.  My hope is to have those all ready for initial 
submission as part of CommitFest 2010-11, as the main feature addition 
from myself toward improving 9.1.

I have a bunch of background information about this I'm presenting at 
PGWest next week, after which I'll start populating the wiki with more 
details and begin packaging the code too.  I had hoped to revisit the 
checkpoint sorting details after that.  Jeff or yourself are welcome to 
try your own tests in that area, I could use the help.  But I think my 
measurement patches will help you with that considerably once I release 
them in another couple of weeks.  Seeing a graph of latency sync times 
for each file is very informative for figuring out whether a change did 
something useful, more so than just staring at total TPS results.  Such 
latency graphs are what I've recently started to do here, with some 
server-side changes that then feed into gnuplot.

The idea of making something like the sorting logic into a pluggable 
hook seems like a waste of time to me, particulary given that the 
earlier implementation really needed to be allocated a dedicated block 
of shared memory to work well IMHO (and I believe that's still the 
case).  That area isn't where the real problems are at here anyway, 
especially on large memory systems.  How the sync logic works is the 
increasingly troublesome part of the checkpoint code, because the 
problem it has to deal with grows proportionately to the size of the 
write cache on the system.  Typical production servers I deal with have 
about 8X as much RAM now as they did in 2007 when I last investigated 
write sorting.  Regular hard drives sure haven't gotten 8X faster since 
then, and battery-backed caches (which used to have enough memory to 
absorb a large portion of a checkpoint burst) have at best doubled in size.

Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support

In response to

pgsql-hackers by date

Next:From: Marti RaudseppDate: 2010-10-29 23:33:18
Subject: [PATCH] More Coccinelli cleanups
Previous:From: Alvaro HerreraDate: 2010-10-29 20:28:11
Subject: Re: crash in plancache with subtransactions

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group