Re: fsync reliability

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fsync reliability
Date: 2011-04-21 15:55:55
Message-ID: 24001.1303401355@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Daniel Farina points out to me that the Linux man page for fsync() says
> "Calling fsync() does not necessarily ensure that the entry in the directory
> containing the file has also reached disk. For that an explicit fsync() on a
> file descriptor for the directory is also needed."
> http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html
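
For illustration, here is a minimal C sketch of the pattern that man page describes, assuming a Linux/POSIX system: write the new file and fsync() it, then open the containing directory and fsync() that descriptor as well, so the new directory entry is also flushed. The helper names durable_create() and fail_close() are hypothetical; this is not PostgreSQL code, and platforms other than Linux may treat fsync() on a directory descriptor differently.

```c
#define _XOPEN_SOURCE 700 /* POSIX.1-2008 + XSI: fsync(), O_DIRECTORY */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Close fd without letting close() overwrite the errno we want to report. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: create dir/name containing buf[0..len) and make both
   the file contents and its directory entry durable. Returns 0 on success,
   -1 with errno set on failure. */
static int durable_create(const char *dir, const char *name,
                          const void *buf, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {               /* write() may be partial */
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return fail_close(fd);
        }
        p += w;
        len -= (size_t) w;
    }

    /* Step 1: flush the file's data and inode to disk. */
    if (fsync(fd) != 0)
        return fail_close(fd);
    if (close(fd) != 0)
        return -1;

    /* Step 2: per fsync(2), the directory entry may still be only in memory,
       so fsync a descriptor for the containing directory too. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) != 0)
        return fail_close(dfd);
    return close(dfd);
}
```

Only step 2 is what the man page adds; skipping it leaves a window where the file's contents are on disk but a crash can still lose the name that points to them.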

> This point appears to have been discussed before

Yes ...

> Tom said
> "We don't try to "fsync the directory" after a normal table create
> for instance"
> which is fine because we don't need to. In the event of a crash, a
> missing table would be recreated during crash recovery.

Nonsense. Once a checkpoint occurs after the WAL record that says to
create the table, we won't replay that action. Or are you proposing
to have checkpoints run around and fsync every directory in the data
tree?
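
The replay argument above can be modeled in a few lines. In this hypothetical, heavily simplified C sketch (not PostgreSQL's recovery code), each WAL record carries an LSN and redo starts at the last checkpoint's redo location, so a create-table record older than that location is skipped even if the new file's directory entry never reached disk. The type WalRecord and the function replay_from() are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical WAL record: a log sequence number plus a description. */
typedef struct {
    unsigned long lsn;
    const char *action;
} WalRecord;

/* Replay every record at or after redo_lsn. Records before it are assumed
   to be covered by the checkpoint and are never looked at again.
   Replayed actions are copied into out (up to out_cap); returns the count. */
static size_t replay_from(const WalRecord *wal, size_t n,
                          unsigned long redo_lsn,
                          const char **out, size_t out_cap)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (wal[i].lsn < redo_lsn)
            continue; /* before the checkpoint: not replayed */
        if (k < out_cap)
            out[k] = wal[i].action;
        k++;
    }
    return k;
}
```

With a create at LSN 100, an insert at LSN 300, and a checkpoint redo location of 200, only the insert is replayed; if the table's directory entry was lost, nothing in recovery recreates it.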

The traditional standard is that the filesystem is supposed to take
care of its own metadata, and even Linux filesystems have pretty much
figured that out. I don't really see a need for us to be nursemaiding
the filesystem. At most there's a documentation issue here, i.e., we
ought to be more explicit about which filesystems and which mount
options we recommend.

regards, tom lane
