Re: Raid 10 chunksize

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Stef Telford <stef(at)ummon(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-04-03 10:30:12
Message-ID: alpine.GSO.2.01.0904030556470.4011@westnet.com
Lists: pgsql-performance

On Thu, 2 Apr 2009, Scott Carey wrote:

> The big one, is this quote from the linux kernel list:
> " Right now, if you want a reliable database on Linux, you _cannot_
> properly depend on fsync() or fdatasync(). Considering how much Linux
> is used for critical databases, using these functions, this amazes me.
> "

Things aren't as bad as that out-of-context quote makes them seem. There
are two main problem situations here:

1) You cannot trust Linux to flush data out of a hard drive's write cache
and onto the disk itself. Solution: turn off the write cache. Given the
generally poor state of targeted fsync on Linux (quoting from a downthread
comment by David Lang: "in data=ordered mode, the default for most distros,
ext3 can end up having to write all pending data when you do a fsync on one
file"), those fsyncs were likely to blow out the drive cache anyway.

2) There are no hard guarantees about write ordering at the disk level; if
you write blocks A, B, and C and then fsync, you might actually get, say,
only B written before power goes out. I don't believe the PostgreSQL WAL
design will be corrupted by this particular situation, because until that
fsync comes back saying all three are done, none of them are relied upon.
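To make David Lang's remark in point 1 concrete, here is a toy model in Python of a page cache in which, as the quote describes for ext3 in data=ordered mode, an fsync on one file has to push out every file's pending dirty data. It is only a sketch under that assumption: the class and method names are hypothetical, it counts bytes rather than touching a real disk, and it does not model the drive's own write cache at all.

```python
from collections import defaultdict


class OrderedModeCache:
    """Hypothetical toy page cache: fsync on any file flushes ALL dirty data.

    Models the data=ordered behaviour quoted above; this is not ext3 code.
    """

    def __init__(self):
        self.dirty = defaultdict(int)  # file name -> dirty bytes pending
        self.written = 0               # total bytes pushed to the "drive"

    def write(self, name, nbytes):
        # Buffered write: data only lands in the page cache.
        self.dirty[name] += nbytes

    def fsync(self, name):
        # `name` is deliberately ignored: in this model, syncing one file
        # drags every dirty file along, so the cost depends on all pending
        # data rather than on the file being synced.
        flushed = sum(self.dirty.values())
        self.dirty.clear()
        self.written += flushed
        return flushed


cache = OrderedModeCache()
cache.write("table_data", 512 * 1024 * 1024)  # large bulk load, not yet synced
cache.write("wal_segment", 8192)              # one small WAL record
cost = cache.fsync("wal_segment")
print(cost)  # -> 536879104 bytes flushed to commit one 8 kB record
```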
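The argument in point 2 can be checked exhaustively with a small simulation. This is a hypothetical model, not PostgreSQL's WAL code: blocks A, B, and C are written, power fails before the fsync returns, and any subset of the three may have reached the platter. It compares a design that waits for fsync before relying on anything with one that treats data as safe as soon as write() returns; the function names are illustrative only.

```python
from itertools import combinations


def crash_outcomes(blocks):
    """Yield every subset of `blocks` that might be on disk after a crash.

    Models the lack of ordering guarantees: any combination of the blocks
    written since the last completed fsync may have reached the platter.
    """
    for k in range(len(blocks) + 1):
        for subset in combinations(blocks, k):
            yield set(subset)


def lost_promises(ack_before_fsync, batch=("A", "B", "C")):
    """Count crash outcomes in which an acknowledged block is missing.

    Power fails after `batch` is written but before the fsync returns.
    A design that acknowledges blocks as soon as write() returns
    (ack_before_fsync=True) has promised all of them; a design that waits
    for fsync has promised none of them yet.
    """
    promised = set(batch) if ack_before_fsync else set()
    return sum(1 for on_disk in crash_outcomes(batch) if not promised <= on_disk)


print(lost_promises(ack_before_fsync=False))  # -> 0
print(lost_promises(ack_before_fsync=True))   # -> 7 (of 8 possible outcomes)
```

Reordering alone never breaks the first design, because nothing was relied upon before the fsync completed; the second design is only safe in the single outcome where all three blocks happened to land.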

> Interestingly, postgres would be safer on linux if it used
> sync_file_range instead of fsync() but that has other drawbacks and
> limitations

I have thought about whether it would be possible to add a Linux-specific
improvement to the code path that already does something custom in this
area for Windows and Mac OS X when you use wal_sync_method=fsync_writethrough.
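For readers less familiar with PostgreSQL's `wal_sync_method` setting (whose values include the `fsync_writethrough` option mentioned above), here is a rough Python sketch of what each value asks the operating system to do. It is an approximation for illustration, not PostgreSQL's implementation: `wal_write` and `VALID_METHODS` are hypothetical names, the code needs a POSIX system (`os.fdatasync` is absent on macOS), and the `fsync_writethrough` branch assumes macOS, where it corresponds to `fcntl(fd, F_FULLFSYNC)`. Windows uses a different mechanism that this sketch does not cover.

```python
import fcntl
import os

# The wal_sync_method values discussed in this thread (8.4-era set).
VALID_METHODS = ("open_sync", "open_datasync", "fsync", "fdatasync",
                 "fsync_writethrough")


def wal_write(path, data, method):
    """Append `data` to `path`, making it durable the way `method` asks.

    Hypothetical approximation of wal_sync_method semantics; not PostgreSQL code.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"unknown wal_sync_method: {method}")
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if method == "open_sync":
        flags |= os.O_SYNC    # each write() waits for data and metadata
    elif method == "open_datasync":
        flags |= os.O_DSYNC   # each write() waits for the data only
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)
        if method == "fsync":
            os.fsync(fd)      # flush data and all metadata
        elif method == "fdatasync":
            os.fdatasync(fd)  # flush data, skip metadata not needed to read it
        elif method == "fsync_writethrough":
            full = getattr(fcntl, "F_FULLFSYNC", None)  # present on macOS only
            if full is None:
                raise OSError("fsync_writethrough is not available here")
            fcntl.fcntl(fd, full)  # ask the drive to empty its write cache
    finally:
        os.close(fd)
```

None of the Linux branches here can force data out of a drive's volatile cache by themselves, which is why point 1 recommends turning that cache off.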

We really should update the documentation in this area before 8.4 ships.
I'm looking into moving the "Tuning PostgreSQL WAL Synchronization" paper
I wrote onto the wiki and then fleshing it out with all this
filesystem-specific trivia.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
