Re: Disaster!

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Disaster!
Date: 2004-01-25 23:26:05
Message-ID: 87hdyjvcdu.fsf@stark.xeocode.com
Lists: pgsql-hackers


Manfred Spraul <manfred(at)colorfullife(dot)com> writes:

> The checkpoint code uses sync() right now. Actually sync();sleep(2);sync().
> Win32 has no sync() call, therefore it will use fsyncs. Perhaps platforms with
> deferred errors on close must use fsync, too. Hopefully parallel fsyncs -
> sequential fsyncs could be slow due to more seeking.

That code is known to be totally bogus in theory. In practice, however, it
seems to be the best of the possible bad choices.

Even on filesystems where errors won't be deferred after the write(), the data
is still not guaranteed to be on disk, even after the sync() call: sync() may
return once the writes have been scheduled rather than completed. No
particular sleep time is guaranteed to be enough.
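
For contrast with sync();sleep(2);sync(), here is a minimal sketch of the
per-file alternative, where the caller actually gets an error back. The helper
name write_durably is hypothetical and this is not the checkpoint code; it
only illustrates checking write(), fsync() and close() on a single file. It
also covers only the file's contents, not the directory entry of a newly
created file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from PostgreSQL): write len bytes to path and
 * return 0 only after fsync() and close() have both reported success.
 * Returns -1 on any failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing; retry */
            close(fd);
            return -1;
        }
        p += n;                     /* short write: advance and continue */
        len -= (size_t) n;
    }

    /* Unlike sync(), fsync() does not return until the transfer is done
     * or has failed, and it reports the failure to the caller. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* Some platforms report deferred write errors only at close(). */
    if (close(fd) != 0)
        return -1;

    return 0;
}
```

A caller would treat any -1 as "this data may not be on disk" rather than
sleeping and hoping.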

This was brought up a few months ago. The only safe implementation would be to
fsync every file descriptor that had received writes. The problem is keeping
track of which file descriptors those are. Also, people were uncertain whether
a backend opening a file and calling fsync would guarantee that writes made
to the same file by other processes through other file descriptors would be
flushed. I'm fairly convinced they would be on all sane VFS implementations,
but others were less convinced.
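
To make the bookkeeping problem concrete, here is a rough sketch of one way it
could look: remember which files were written since the last checkpoint, then
open each one again and fsync it. The names dirty_note, checkpoint_flush and
the fixed-size table are all hypothetical, and the sketch is single-process;
sharing the table between backends is exactly the hard part and is left out.
It also leans on the assumption questioned above, namely that fsync() through
a freshly opened descriptor flushes writes made through other descriptors.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRTY    64     /* illustrative limits, not tuned values */
#define MAX_PATH_LEN 256

static char dirty_paths[MAX_DIRTY][MAX_PATH_LEN];
static int  dirty_count = 0;

/*
 * Record that path has received writes since the last checkpoint.
 * Returns 0 on success (including when the path is already recorded),
 * -1 if the table is full or the path is too long.
 */
int dirty_note(const char *path)
{
    assert(path != NULL);

    for (int i = 0; i < dirty_count; i++)
        if (strcmp(dirty_paths[i], path) == 0)
            return 0;

    if (dirty_count == MAX_DIRTY || strlen(path) >= MAX_PATH_LEN)
        return -1;

    strcpy(dirty_paths[dirty_count++], path);
    return 0;
}

/*
 * fsync every recorded file through a freshly opened descriptor.
 * Returns 0 and clears the table if every fsync() and close() succeeded;
 * returns -1 at the first failure and leaves the table untouched.
 * ASSUMPTION: fsync() on this descriptor also flushes data written to
 * the same file through other descriptors, possibly by other processes.
 */
int checkpoint_flush(void)
{
    for (int i = 0; i < dirty_count; i++) {
        int fd = open(dirty_paths[i], O_RDWR);
        if (fd < 0)
            return -1;
        if (fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        if (close(fd) != 0)
            return -1;
    }
    dirty_count = 0;
    return 0;
}
```

Every code path that writes would call dirty_note() for the file it touched,
and the checkpoint would call checkpoint_flush() in place of the sync() pair,
treating -1 as a failed checkpoint.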

--
greg
