Markus Wanner wrote:
> Martijn van Oosterhout wrote:
>> And fsync better do what you're asking
>> (how fast is just a performance issue, just as long as it's done).
> Where are we on this issue? I've read all of this thread and the one on
> the lvm-linux mailing list as well, but still don't feel confident.
> In the following scenario:
> fsync -> filesystem -> physical disk
> I'm assuming the filesystem correctly issues a blkdev_issue_flush() on
> the physical disk upon fsync(), to do what it's told: flush the cache(s)
> to disk. Further, I'm also assuming the physical disk is flushable (i.e.
> it correctly implements the blkdev_issue_flush() call). Here we can be
> pretty certain that fsync works as advertised, I think.
> The unanswered question to me is, what's happening, if I add LVM in
> between as follows:
> fsync -> filesystem -> device mapper (lvm) -> physical disk(s)
> Again, assume the filesystem issues a blkdev_issue_flush() to the lower
> layer and the physical disks are all flushable (and implement that
> correctly). How does the device mapper behave?
> I'd expect it to forward the blkdev_issue_flush() call to all affected
> devices and only return after the last one has confirmed and completed
> flushing its caches. Is that the case?
> I've also read about the newish write barriers and about filesystems
> implementing fsync with such write barriers. That seems fishy to me and
> would of course break in combination with LVM (which doesn't completely
> support write barriers, AFAIU). However, that's clearly the filesystem
> side of the story and has not much to do with whether fsync lies on top
> of LVM or not.
> Help in clarifying this issue greatly appreciated.
> Kind Regards
> Markus Wanner
Well, AFAIK, the summary would be:
1) adding LVM to the chain makes no difference;
2) you still need to disable the write-back cache on IDE/SATA disks
for fsync() to work properly;
3) without LVM and with the write-back cache enabled, due to current(?)
limitations in the Linux kernel, you may be less vulnerable with some
journaled filesystems (but not ext3 in data=writeback or data=ordered
mode; I'm not sure about data=journal) if you use fsync().
"Less vulnerable" means that all pending changes are committed to disk
except the very last one.
A quick way to see which case you are in is to time fsync() yourself;
see the sketch below.
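If the whole stack honors the flush, an fsync() that actually reaches the
platter cannot complete faster than the disk rotates: a 7200 rpm drive turns
about 120 times per second, so anything much above ~120 single-block fsyncs/s
means some cache is absorbing the flush. A minimal timing sketch (the file
name is arbitrary, and the rotational-limit reasoning assumes an ordinary
single spinning disk):

/* fsync rate test: if fsync() completes far more often per second
   than the disk can rotate, a write-back cache is lying to you. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

int main(void)
{
    const int loops = 1000;
    char buf[512];
    memset(buf, 'x', sizeof buf);

    int fd = open("fsync-test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < loops; i++) {
        if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf) {
            perror("pwrite"); return 1;
        }
        if (fsync(fd) != 0) { perror("fsync"); return 1; }
    }
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d fsyncs in %.2f s = %.0f fsyncs/s\n", loops, secs, loops / secs);
    /* ~120/s or less: the flush is reaching a 7200 rpm platter.
       Thousands/s: a cache (disk or controller) is absorbing it. */

    close(fd);
    unlink("fsync-test.dat");
    return 0;
}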
In short:
- write-back cache + ext3              = unsafe
- write-back cache + other fs          = (depending on the fs)[*] safer, but not 100% safe
- write-back cache + LVM + any fs      = unsafe
- write-through cache + any fs         = safe
- write-through cache + LVM + any fs   = safe
[*] the fs must issue (directly, or indirectly via a journal commit) a write
barrier on fsync(). Ext3 doesn't: it only does so when the inode has changed,
and since ext3 timestamps have one-second granularity, that happens at most
once per second.
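One workaround that has been floated for the ext3 case is to dirty the inode
immediately before each fsync(), so that ext3 is forced into a journal commit
(which is where its barrier gets issued). A sketch of the idea, not something
I'd bet data on, and the helper name is made up; whether it helps still
depends on the kernel version, the mount options, and barrier support in the
rest of the stack:

/* "Dirty the inode" trick for ext3: fchmod() updates the inode's
   ctime even when the mode bits don't change, so the fsync() that
   follows has to commit the journal, and the journal commit is
   where ext3 issues its write barrier. */
#include <sys/stat.h>
#include <unistd.h>

int fsync_with_journal_commit(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (fchmod(fd, st.st_mode & 07777) != 0)  /* same bits, new ctime */
        return -1;
    return fsync(fd);
}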
If you want both speed and safety, use a battery-backed controller (and
write-through cache on the disks, though the controller should enforce that
when you plug the disks in). It's the usual "fast, safe, cheap: choose two".
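As for checking/disabling the disk-side cache: hdparm -W /dev/hda reports the
write-cache setting and hdparm -W0 /dev/hda turns it off. Underneath that is
the legacy HDIO ioctl from <linux/hdreg.h>, roughly like the sketch below
(old IDE interface; needs root, and not every driver, libata in particular,
implements it):

/* Query the drive's write-cache flag the way hdparm -W does, via
   the legacy IDE ioctl; may fail with ENOTTY/EINVAL on drivers
   that don't support it. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/hda";
    int fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0) { perror(dev); return 1; }

    long wcache = 0;
    if (ioctl(fd, HDIO_GET_WCACHE, &wcache) != 0) {
        perror("HDIO_GET_WCACHE");
        return 1;
    }
    printf("%s: write-back cache %s\n", dev,
           wcache ? "ON (unsafe for fsync)" : "OFF (write-through)");
    close(fd);
    return 0;
}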
This is an interesting article: note how, for all three kinds of disk
(IDE/SATA/SCSI), they say "Disk caching should be disabled in order to use
the drive with SQL Server". They don't mention write barriers.