Re: sorted writes for checkpoints

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sorted writes for checkpoints
Date: 2010-10-29 23:31:50
Message-ID: 4CCB5966.7010002@2ndquadrant.com
Lists: pgsql-hackers

Itagaki Takahiro wrote:
> When I submitted the patch, I tested it on disk-based RAID-5 machine:
> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php
> But there were no additional benchmarking reports at that time. We still
> need benchmarking before we re-examine the feature. For example, SSD and
> SSD-RAID was not popular at that time, but now they might be considerable.
>

I did multiple rounds of benchmarking on that; none of it showed any
improvement, so I didn't bother reporting the results in detail. I have
recently figured out, though, why the performance testing I did of that
earlier patch probably failed to produce useful results on my system
back then. It relates to trivia around how ext3 handles fsync that's
well understood now (the whole cache flushes out when one comes in),
but wasn't yet back then.

We have a working set of patches here that both rewrite the checkpoint
logic to avoid several larger problems with how it works now and add
instrumentation that makes it possible to directly measure and graph
whether methods such as sorting writes improve the process or not. My
hope is to have those all ready for initial submission as part of
CommitFest 2010-11, as my main feature addition toward improving 9.1.

I have a bunch of background information about this I'm presenting at
PGWest next week, after which I'll start populating the wiki with more
details and begin packaging the code too. I had hoped to revisit the
checkpoint sorting details after that. You or Jeff are welcome to try
your own tests in that area; I could use the help. But I think my
measurement patches will help you with that considerably once I release
them in another couple of weeks. Seeing a graph of sync latency for
each file is very informative for figuring out whether a change did
something useful, more so than just staring at total TPS results. Such
latency graphs are what I've recently started to produce here, with some
server-side changes that then feed into gnuplot.
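
As a rough illustration of the per-file sync timing described above,
here is a minimal standalone sketch in Python. It is not the server-side
instrumentation mentioned in this message; the function names
(timed_fsync, sync_latencies, to_gnuplot_rows, demo) and the scratch
files are assumptions made up for the example. It times one fsync() per
file and formats the results as "index milliseconds" rows that gnuplot
can plot.

```python
import os
import tempfile
import time


def timed_fsync(path):
    """Return the seconds spent in fsync() for one already-written file."""
    fd = os.open(path, os.O_RDWR)
    try:
        start = time.perf_counter()
        os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def sync_latencies(paths):
    """Sync each file in turn, returning (path, seconds) pairs in order."""
    return [(p, timed_fsync(p)) for p in paths]


def to_gnuplot_rows(latencies):
    """Format results as 'index milliseconds' lines, one per file."""
    return "\n".join(
        f"{i} {secs * 1000.0:.3f}" for i, (_, secs) in enumerate(latencies)
    )


def demo():
    """Write a few scratch files (hypothetical stand-ins for relation
    segments), then time the fsync of each one."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(3):
            p = os.path.join(d, f"segment_{i}")
            with open(p, "wb") as f:
                f.write(os.urandom(64 * 1024))
            paths.append(p)
        return sync_latencies(paths)


if __name__ == "__main__":
    print(to_gnuplot_rows(demo()))
```

Saving the printed rows to a file and plotting column 2 against column 1
gives a per-file latency graph. A single slow fsync stands out there in
a way it would not in an averaged TPS figure.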

The idea of making something like the sorting logic into a pluggable
hook seems like a waste of time to me, particularly given that the
earlier implementation really needed a dedicated block of shared memory
allocated to it to work well IMHO (and I believe that's still the
case). That area isn't where the real problems are anyway,
especially on large memory systems. How the sync logic works is the
increasingly troublesome part of the checkpoint code, because the
problem it has to deal with grows in proportion to the size of the
write cache on the system. Typical production servers I deal with have
about 8X as much RAM now as they did in 2007 when I last investigated
write sorting. Regular hard drives sure haven't gotten 8X faster since
then, and battery-backed caches (which used to have enough memory to
absorb a large portion of a checkpoint burst) have at best doubled in size.
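
For anyone following along, the write sorting under discussion amounts
to ordering dirty buffers by file and block before writing them out. The
sketch below only illustrates that idea under simplified assumptions and
is not the code from the earlier patch: DirtyBuffer, relfile, block, and
both function names are hypothetical, and a real implementation would
work against shared buffers rather than a Python list.

```python
from collections import namedtuple

# Hypothetical stand-in for a dirty buffer's identity:
# which file it belongs to and which block within that file.
DirtyBuffer = namedtuple("DirtyBuffer", ["relfile", "block"])


def sorted_write_order(dirty):
    """Order buffers by file, then by block number within each file."""
    return sorted(dirty, key=lambda b: (b.relfile, b.block))


def count_file_switches(order):
    """Count how often consecutive writes move to a different file."""
    switches = 0
    for prev, cur in zip(order, order[1:]):
        if prev.relfile != cur.relfile:
            switches += 1
    return switches


if __name__ == "__main__":
    dirty = [
        DirtyBuffer("b", 2),
        DirtyBuffer("a", 5),
        DirtyBuffer("b", 1),
        DirtyBuffer("a", 3),
    ]
    print(count_file_switches(dirty))
    print(count_file_switches(sorted_write_order(dirty)))
```

Sorting groups each file's blocks together and in ascending order, which
is the property the benchmarks were trying to show a benefit from. It
does nothing about how much data a later fsync has to flush, which is
the part that scales with the write cache.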

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
