Re: Re: Faster CREATE DATABASE by delaying fsync

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Florian Weimer <fw(at)deneb(dot)enyo(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Re: Faster CREATE DATABASE by delaying fsync
Date: 2010-02-14 20:49:09
Message-ID: 201002142149.12786.andres@anarazel.de
Lists: pgsql-hackers pgsql-performance

On Sunday 14 February 2010 21:41:02 Mark Mielke wrote:
> On 02/14/2010 03:24 PM, Florian Weimer wrote:
> > * Tom Lane:
> > >>> Which options would that be? I am not aware that there are any for any of
> > >>> the recent linux filesystems.
> >>
> >> Shouldn't journaling of metadata be sufficient?
> >
> > You also need to enforce ordering between the directory update and the
> > file update. The file metadata is flushed with fsync(), but the
> > directory isn't. On some systems, all directory operations are
> > synchronous, but not on Linux.
>
> dirsync
>        All directory updates within the filesystem should be done
>        synchronously. This affects the following system calls: creat,
>        link, unlink, symlink, mkdir, rmdir, mknod and rename.
>
> The widely reported problems, though, did not tend to be a problem with
> directory changes written too late - but directory changes being written
> too early. That is, the directory change is written to disk, but the
> file content is not. This is likely because of the "ordered journal"
> mode widely used in ext3/ext4 where metadata changes are journalled, but
> file pages are not journalled. Therefore, it is important for some
> operations, that the file pages are pushed to disk using fsync(file),
> before the metadata changes are journalled.
Well, but that's not a problem with pg, as it fsyncs the file contents.
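
A minimal sketch of the file-side half of this, assuming a POSIX system and using Python's `os` wrappers for the same syscalls (the helper name `write_file_durably` is hypothetical, not PostgreSQL's code). The data is written and `fsync()`ed on the file's own descriptor. Per the Linux fsync(2) man page, this makes the file's contents and inode durable, but does not by itself guarantee that the directory entry naming the file has reached disk.

```python
import os

def write_file_durably(path: str, data: bytes) -> None:
    """Create `path`, write `data`, and fsync the file itself.

    This persists the file's contents and metadata, but NOT necessarily
    the directory entry that names it (see fsync(2) on Linux).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush the file's data and inode to stable storage
    finally:
        os.close(fd)
```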

> In theory there is some open hole where directory updates need to be
> synchronized with file updates, as POSIX doesn't enforce this ordering,
> and we can't trust that all file systems implicitly order things
> correctly, but in practice, I don't see this sort of problem happening.
I can try to reproduce it if you want...

> If you are concerned, enable dirsync.
If the filesystem already behaves that way, an fsync on the directory should
be fairly cheap. If it doesn't behave that way, doing the fsync is what makes
it correct...
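
The directory side can be sketched the same way, again assuming Linux/POSIX semantics and Python's `os` module (`fsync_directory` is a hypothetical helper name): open the directory read-only and call `fsync()` on that descriptor. On a filesystem that already writes directory updates synchronously there is little left to flush, so the call should be cheap; elsewhere it is what makes newly created or renamed entries durable.

```python
import os

def fsync_directory(dirpath: str) -> None:
    """fsync a directory so entries created or renamed in it are durable.

    Assumes a POSIX system (e.g. Linux) where a directory can be opened
    read-only and its descriptor passed to fsync(); not valid on Windows.
    """
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)  # persist the directory's own entries
    finally:
        os.close(fd)
```

It would be called after the files in `dirpath` have been created and fsynced themselves, so the directory entries never point at data that is not yet on disk.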

Besides, there is no reason to fsync the directory before the checkpoint, so
dirsync would impose a higher cost than doing it correctly.
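
To illustrate the cost argument, here is a minimal sketch assuming the caller knows when a checkpoint happens; `PendingDirSyncs`, `remember()` and `checkpoint()` are hypothetical names, not PostgreSQL internals. With `dirsync`, every `creat()`/`mkdir()` pays for a synchronous directory write. Here each directory touched while copying is only recorded, and each one is fsynced once, at checkpoint time.

```python
import os

class PendingDirSyncs:
    """Hypothetical bookkeeping for deferring directory fsyncs.

    Directories are remembered as files are created in them and fsynced
    only once, at checkpoint time, instead of on every directory update.
    """

    def __init__(self) -> None:
        self._dirs: set[str] = set()

    def remember(self, dirpath: str) -> None:
        # A set, so a directory touched many times is fsynced only once.
        self._dirs.add(os.path.abspath(dirpath))

    def checkpoint(self) -> int:
        """fsync every remembered directory; return how many were synced.

        If an fsync raises, the exception propagates and the pending set
        is left intact, so the next checkpoint retries all of them.
        """
        count = 0
        for dirpath in sorted(self._dirs):
            fd = os.open(dirpath, os.O_RDONLY)  # POSIX-only: open dir read-only
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            count += 1
        self._dirs.clear()
        return count
```

A copy that creates many files in one database directory would record that directory once and fsync it once, instead of forcing a synchronous directory write per file as `dirsync` does.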

Andres
