Re: Load distributed checkpoint V3

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Load distributed checkpoint V3
Date: 2007-04-06 06:53:17
Message-ID: Pine.GSO.4.64.0704060148090.15433@westnet.com
Lists: pgsql-hackers pgsql-patches

On Fri, 6 Apr 2007, Takayuki Tsunakawa wrote:

> could anyone evaluate O_SYNC approach again that commercial databases
> use and tell me if and why PostgreSQL's fsync() approach is better than
> theirs?

I noticed a big improvement switching the WAL to use O_SYNC (+O_DIRECT)
instead of fsync on my big and my little servers with battery-backed
cache, so I know sync writes perform reasonably well on my hardware.
Since I've had problems with the fsync at checkpoint time, I did a similar
test to yours recently, adding O_SYNC to the open calls and pulling the
fsyncs out to get a rough idea how things would work.

Performance was reasonable most of the time, but when I hit a checkpoint
with a lot of the buffer cache dirty it was incredibly bad. It took
minutes to write everything out, compared with a few seconds for the
current case, and the background writer was too sluggish to help either.
This appears to match your data.

If you compare how Oracle handles their writes and checkpoints to the
Postgres code, it's obvious they have a different architecture that
enables them to support sync writing usefully. I'd recommend the Database
Writer Process section of
http://www.lc.leidenuniv.nl/awcourse/oracle/server.920/a96524/c09procs.htm
as an introduction for those not familiar with that; it's interesting
reading for anyone tinkering with background writer code.

It would be great to compare performance of the current PostgreSQL code
with a fancy multiple background writer version using the latest sync
methods or AIO; there have actually been multiple updates to improve
O_SYNC writes within Linux during the 2.6 kernel series that make this
more practical than ever on that platform. But as you've already seen,
the performance hurdle to overcome is significant, and it would have to be
optional as a result. When you add all this up--have to keep the current
non-sync writes around as well, need to redesign the whole background
writer/checkpoint approach around the idea of sync writes, and the
OS-specific parts that would come from things like AIO--it gets real
messy. Good luck drumming up support for all that when the initial
benchmarks suggest it's going to be a big step back.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
