Re: Directory fsync and other fun

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Florian Weimer <fw(at)deneb(dot)enyo(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Directory fsync and other fun
Date: 2010-02-24 07:09:23
Message-ID: 407d949e1002232309m3518538fta5ebc5ef18c1ea7e@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 24, 2010 at 2:51 AM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Also, I heard ext4 has a "feature" in that rename() might truncate the
> renamed file to zero bytes on crash. The user data in the file might be
> lost if the machine crashes just after rename().

In our case I think this is the one thing that cannot happen. The
problem arises when you write out the new file and rename it over the
old file without ever fsyncing the new file. If the rename reaches disk
but the writes are lost, you end up with neither the new nor the old
contents.

The ext4 guys want you to do an fsync of the new file before doing
the rename. This is terrible for most of the applications that were
doing this -- the latency hit for interactive apps that didn't really
need an fsync is awful -- but in our case we were already doing fsyncs
in every case where we do renames.
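
For illustration only, here is a minimal sketch of the write, fsync, then rename ordering described above, written in Python rather than taken from the PostgreSQL source. The helper name atomic_replace and the file name in the usage note are hypothetical, and it assumes a POSIX system where rename() replaces an existing target. The final directory fsync is the step the thread subject refers to.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) using write, fsync, rename.

    Hypothetical helper for illustration; assumes POSIX rename() semantics.
    """
    dirname = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory, so the rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Force the new contents to disk BEFORE the rename, so the
            # rename can never expose an empty or partial file.
            os.fsync(f.fileno())
        os.rename(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # fsync the containing directory so the rename itself is durable.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_replace("state.dat", b"new contents") would then leave either the old or the new contents of state.dat after a crash, at the cost of the fsync latency mentioned above.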

--
greg
