
Re: Re: Faster CREATE DATABASE by delaying fsync

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Florian Weimer <fw(at)deneb(dot)enyo(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Re: Faster CREATE DATABASE by delaying fsync
Date: 2010-02-15 00:08:10
Message-ID: 4B78906A.7020309@mark.mielke.cc
Lists: pgsql-hackers, pgsql-performance
On 02/14/2010 03:49 PM, Andres Freund wrote:
> On Sunday 14 February 2010 21:41:02 Mark Mielke wrote:
>    
>> The widely reported problems, though, were not with directory changes
>> being written too late, but with directory changes being written too
>> early. That is, the directory change is written to disk, but the file
>> content is not. This is likely because of the "ordered journal" mode
>> widely used with ext3/ext4, where metadata changes are journalled but
>> file pages are not. Therefore, for some operations it is important that
>> the file pages are pushed to disk using fsync(file) before the metadata
>> changes are journalled.
>>      
> Well, but that's not a problem with pg, as it fsyncs the file contents.
>    

Exactly. Not a problem.
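The ordering described in the quoted paragraph (file contents must be on disk before the directory change that exposes them is committed) is what the common write-to-a-temporary-file-then-rename() idiom depends on. A minimal C sketch, assuming a POSIX system; `replace_file()` and the `.tmp` suffix are hypothetical names chosen for illustration, not PostgreSQL code:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace path with new contents via a temporary file and rename().
 * The fsync() on the temporary file is what forces the data to disk
 * before the rename (a directory change) can be committed; without it,
 * a crash could leave the name pointing at empty or partial contents. */
int replace_file(const char *path, const char *data)
{
    char tmp[4096];
    size_t len = strlen(data);
    int fd;

    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int) sizeof tmp)
        return -1;
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* A short write is treated as an error to keep the sketch small. */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    /* Only after the contents are durable is the directory changed. */
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The fsync() must come before rename(); swapping the two would reopen the window where the directory change is durable but the contents are not. Making the rename itself durable would additionally need an fsync() of the parent directory.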

>> If you are concerned, enable dirsync.
>>      
> If the filesystem already behaves that way, an fsync on it should be fairly
> cheap. If it doesn't behave that way, doing it is correct...
>    

Well, I disagree, as the whole point of this thread is that fsync() is 
*not* cheap. :-)

> Besides, there is no reason to fsync the directory before the checkpoint, so
> dirsync would cost more than doing it correctly.
>    

Using "ordered" metadata journaling has approximately the same effect. 
Provided that the data is fsync()'d before the metadata is required, 
either the metadata is recorded in the journal, in which case the data 
is accessible, or the metadata is NOT recorded in the journal, in which 
case the files will appear missing. The races that theoretically exist 
would be in situations where the data of one file references a separate 
file that does not yet exist.
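Andres's alternative to dirsync, syncing the directory once at checkpoint time instead of on every change, might look like the following hypothetical helper: the caller has already created and written a batch of files without syncing, and the helper syncs every file's contents first and the directory last, keeping the data-before-metadata ordering discussed here. This assumes Linux, where fsync() on a directory opened read-only is accepted; `sync_files_then_dir()` and `fsync_path()` are illustrative names.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* fsync one path. Files are opened read-write because some platforms
 * reportedly reject fsync() on a read-only descriptor; directories can
 * only be opened read-only. */
static int fsync_path(const char *path, int is_dir)
{
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/* Deferred sync: names[0..n-1] were already created and written inside
 * dir without any fsync. Sync all file contents first, then the
 * directory once, so that by the time the directory fsync runs, every
 * file it names already has its contents on disk. */
int sync_files_then_dir(const char *dir, const char *const names[], int n)
{
    char path[4096];
    int i;

    for (i = 0; i < n; i++) {
        if (snprintf(path, sizeof path, "%s/%s", dir, names[i]) >= (int) sizeof path)
            return -1;
        if (fsync_path(path, 0) != 0)
            return -1;
    }
    return fsync_path(dir, 1);   /* one directory fsync for the whole batch */
}
```

The cost difference Andres points to is in the count: one directory fsync per batch here, versus a synchronous directory update on every create or unlink under dirsync.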

You said you would try to reproduce this. Are you going to try to 
reproduce it on ext3/ext4 with ordered journalling enabled? I think 
reproducing it outside of a case such as CREATE DATABASE would be 
difficult. It would have to be something like:

     1. open(O_CREAT)/write()/fsync()/close() of a new data file, where the
        data gets written, but the directory data is not yet written out to
        the journal
     2. open()/.../write()/fsync()/close() of an existing file to point to
        the new data file, but the directory data is still not written out
        to the journal
     3. crash

In this case, "dirsync" should be effective at closing this hole.
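The sequence above can be written out as a small C sketch, assuming a POSIX system; `create_then_point()`, its helpers, and the file contents are hypothetical, and the crash itself cannot be simulated here. With `sync_dir` set to 0 it performs just the two open/write/fsync/close steps, and fsync() on a file is not guaranteed to make that file's directory entry durable. With `sync_dir` set to 1 it fsyncs the parent directory between the steps, closing the hole in roughly the way a dirsync mount would.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* open()/write()/fsync()/close(): make one file's contents durable.
 * A short write is treated as an error to keep the sketch small. */
static int put_file(const char *path, const char *data)
{
    size_t len = strlen(data);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* fsync a directory so the entries created in it are durable. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

int create_then_point(const char *dir, const char *newname,
                      const char *ptrname, int sync_dir)
{
    char path[4096];

    /* Step 1: new data file. Its contents are fsync()'d, but that does
     * not guarantee its directory entry has reached disk. */
    snprintf(path, sizeof path, "%s/%s", dir, newname);
    if (put_file(path, "payload\n") != 0)
        return -1;

    /* Optional: close the hole by syncing the directory before step 2. */
    if (sync_dir && fsync_dir(dir) != 0)
        return -1;

    /* Step 2: the pointer file is rewritten to name the new file. Without
     * the directory sync, a crash after this point may leave the pointer
     * durable while the name it refers to is not. */
    snprintf(path, sizeof path, "%s/%s", dir, ptrname);
    return put_file(path, newname);
}
```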

As for cost? Well, most PostgreSQL data is stored within file content, 
not directory metadata. I think "dirsync" might slow down some 
operations like CREATE DATABASE or "rm -fr", but I would not expect it 
to affect day-to-day performance of the database under real load. Many 
operating systems enable the equivalent of "dirsync" by default. I 
believe Solaris does this, for example, and other than slowing down "rm 
-fr", I don't recall any real complaints about the cost of "dirsync".

After writing the above, I'm seriously considering adding "dirsync" to 
my /db mounts that hold PostgreSQL and MySQL data.
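dirsync is an ordinary mount option, so it can be added to the options field for the mount in /etc/fstab. To confirm that a mount such as /db actually picked it up, one approach on Linux is to scan /proc/self/mounts with the glibc getmntent()/hasmntopt() functions. A minimal sketch; the function names are hypothetical, and the mount point is whatever path the caller passes in:

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a comma-separated mount option string includes "dirsync"
 * as a whole option, 0 otherwise. */
int opts_have_dirsync(const char *opts)
{
    char buf[1024];
    struct mntent m;

    memset(&m, 0, sizeof m);
    snprintf(buf, sizeof buf, "%s", opts);   /* mnt_opts is a non-const char * */
    m.mnt_opts = buf;
    return hasmntopt(&m, "dirsync") != NULL;
}

/* Returns 1 if mountpoint is mounted with dirsync, 0 if it is mounted
 * without it, and -1 if it is not listed in /proc/self/mounts (Linux). */
int mount_has_dirsync(const char *mountpoint)
{
    FILE *f = setmntent("/proc/self/mounts", "r");
    struct mntent *m;
    int result = -1;

    if (f == NULL)
        return -1;
    while ((m = getmntent(f)) != NULL) {
        /* Keep the last match: a later mount on the same path hides earlier ones. */
        if (strcmp(m->mnt_dir, mountpoint) == 0)
            result = hasmntopt(m, "dirsync") != NULL;
    }
    endmntent(f);
    return result;
}
```

For example, `mount_has_dirsync("/db")` would report whether the /db mount has the option.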

Cheers,
mark

-- 
Mark Mielke <mark(at)mielke(dot)cc>

