Re: fsync or fdatasync

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ragnar Kjørstad <postgres(at)ragnark(dot)vestdata(dot)no>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gaetano Mendola <mendola(at)bigfoot(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: fsync or fdatasync
Date: 2002-09-10 19:17:00
Message-ID: 21315.1031685420@sss.pgh.pa.us
Lists: pgsql-admin

Ragnar Kjørstad <postgres(at)ragnark(dot)vestdata(dot)no> writes:
> On Tue, Sep 10, 2002 at 11:40:24AM -0400, Bruce Momjian wrote:
>> We use fdatasync where available, and fsync when it is not.

> Makes sense.

>> We also use O_SYNC on open if it is available.

s/also/instead/ ... open_datasync is the first choice if available.

> Why? That will slow things down...

On what evidence do you assert that?

In theory open_datasync can be the fastest alternative for WAL writing,
because it should cause the kernel to force each WAL write() request
down to disk immediately. fdatasync will result in the same amount of
I/O, but it will also require the kernel to scan its disk cache to see
if there are any other dirty blocks that need to be written. On many
kernels this check is not very efficient and can chew substantial
amounts of CPU time. The tradeoff is that open_datasync syncs each WAL
block individually, which is unnecessary if you are committing
multiple blocks' worth of WAL entries at once --- but there's no hard
evidence that that slows things down, especially not when the WAL logs
are on their own disk spindle. Giving the kernel scheduling freedom
for a small number of blocks doesn't help much anyway in that case.
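
A minimal sketch of the two write patterns compared above, assuming a
Unix-like system where Python exposes O_DSYNC and fdatasync (with
fallbacks to O_SYNC and fsync where it does not). The function names,
the 8 kB block size and the temporary paths are hypothetical, and this
is not PostgreSQL's actual WAL code:

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # assumed block size, for illustration only


def write_all(fd, data):
    """Write all of data to fd, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_open_datasync(path, nblocks):
    """open_datasync style: the file is opened with O_DSYNC, so every
    write() is forced to disk before it returns (one sync per block)."""
    dsync = getattr(os, "O_DSYNC", None)
    if dsync is None:
        dsync = os.O_SYNC  # fallback; syncs metadata as well
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | dsync, 0o600)
    try:
        for _ in range(nblocks):
            write_all(fd, BLOCK)
    finally:
        os.close(fd)


def write_then_fdatasync(path, nblocks):
    """fdatasync style: buffered writes, then a single explicit sync
    that asks the kernel to flush the file's dirty data."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(nblocks):
            write_all(fd, BLOCK)
        sync = getattr(os, "fdatasync", os.fsync)  # fsync if unavailable
        sync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for func in (write_open_datasync, write_then_fdatasync):
            target = os.path.join(tmp, func.__name__)
            start = time.perf_counter()
            func(target, 16)
            elapsed = time.perf_counter() - start
            print(f"{func.__name__}: {elapsed:.4f} s")
```

Timings from a script like this depend heavily on the kernel, the
filesystem and the disk, so they say something only about the machine
they were run on.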

Check the pgsql-hackers archives (a year or two back) for lots and lots of
discussion, but I recall we demonstrated that the current default
choices are reasonable for at least some set of Unixen. If you've got
more information showing that the present default is wrong on some
kernel, let's have it ... but don't waste our time with blanket
assertions that "X is the right (or wrong) choice", because we know
that's not so across all the platforms we support. We'd not have
bothered with four sync methods if there weren't good evidence that each
is the best available choice on some platforms.
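
The default preference order described earlier in this message
(open_datasync first if available, then fdatasync, then fsync) could
be sketched roughly as follows. The function name and the `platform`
parameter are hypothetical, and availability is probed through
Python's os module rather than the way PostgreSQL's own build does it:

```python
import os


def pick_default_sync_method(platform=os):
    """Return the first sync method, in the order described in this
    message, that the given platform object appears to support."""
    if hasattr(platform, "O_DSYNC"):
        return "open_datasync"
    if hasattr(platform, "fdatasync"):
        return "fdatasync"
    return "fsync"


if __name__ == "__main__":
    print(pick_default_sync_method())
```

A probe like this only says what is available, not what is fastest;
that still has to be measured on each kernel.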

regards, tom lane
