Re: fsync or fdatasync

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Ragnar Kjørstad <postgres(at)ragnark(dot)vestdata(dot)no>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gaetano Mendola <mendola(at)bigfoot(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: fsync or fdatasync
Date: 2002-09-10 21:07:30
Message-ID: 200209102107.g8AL7UN23077@candle.pha.pa.us
Lists: pgsql-admin

Ragnar Kjørstad wrote:
> > open_datasync is the first choice if available.
>
> I assume open_datasync means open with O_SYNC flag..

Yes.

> > > Why? That will slow things down...
> >
> > On what evidence do you assert that?
> >
> > In theory open_datasync can be the fastest alternative for WAL writing,
> > because it should cause the kernel to force each WAL write() request
> > down to disk immediately. fdatasync will result in the same amount of
> > I/O, but it will also require the kernel to scan its disk cache to see
> > if there are any other dirty blocks that need to be written. On many
> > kernels this check is not very efficient and can chew substantial
> > amounts of CPU time.
>
> Yes, I see your argument.
> However, I've just checked the linux-implementation of fsync() and I
> can't really see how it could chew substantial amounts of CPU time. The
> way it works, every inode has a list of dirty data buffers - all it does
> is traverse that list and do a write on each.

Remember we support >15 platforms, and I know there is at least one
(HPUX?) which does the fsync/fdatasync block finding inefficiently. It
may have even been old Linux; I cannot remember.

> Anyway - I'm sure this is not enough to convince you, so I'll have to
> set up a test instead. But not tonight.

Again, that is a test case for only one OS. It is helpful if we are
going to start doing per-OS defaults, which is something we have talked
about. What would be great is a test program we can run on different
OS's to find out which is more efficient.
>
>
> > The tradeoff is that open_datasync syncs each WAL
> > block individually, which is unnecessary if you are committing
> > multiple blocks worth of WAL entries at once --- but there's no hard
> > evidence that that slows things down, especially not when the WAL logs
> > are on their own disk spindle.
>
> Well, in theory fsync() will allow the disk to reorder the writes, and
> that should give significantly better performance, because it will
> reduce the required number of seeks. If the WAL is on a separate spindle
> there will be very few seeks in the first place, so there is less to gain,
> but for the case with the WAL on the same disk as something else there
> is probably some gain. But it makes sense to optimize for the
> WAL-on-separate-disk case...

Remember, in most cases, we are fsync'ing only one block so there is no
_gathering_ to do.
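
A test program of the kind suggested above could be sketched roughly as
follows. This is only an illustration, not anything taken from the PostgreSQL
source: the function names, the 8192-byte block size, and the commit counts
are assumptions made for the example, and it assumes a Unix-like system. It
assumes open_datasync corresponds to opening the file with O_DSYNC where that
flag is defined (falling back to O_SYNC), and it falls back from fdatasync()
to fsync() where fdatasync() is missing. The file is written in full
beforehand so that the timed writes overwrite existing blocks rather than
extend the file.

```python
import os
import tempfile
import time

BLOCK_SIZE = 8192  # assumed block size for this sketch


def _preallocate(path, nbytes):
    """Create the file at full size so timed writes only overwrite blocks."""
    with open(path, "wb") as f:
        f.write(b"\0" * nbytes)
        f.flush()
        os.fsync(f.fileno())


def time_open_datasync(path, commits, blocks_per_commit=1):
    """Time writes to a file opened with O_DSYNC: each write() is synchronous."""
    _preallocate(path, commits * blocks_per_commit * BLOCK_SIZE)
    # O_DSYNC is not defined on every platform; fall back to O_SYNC.
    flag = getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, os.O_WRONLY | flag)
    block = b"x" * BLOCK_SIZE
    start = time.perf_counter()
    try:
        for _ in range(commits * blocks_per_commit):
            os.write(fd, block)
    finally:
        os.close(fd)
    return time.perf_counter() - start


def time_fdatasync(path, commits, blocks_per_commit=1):
    """Time buffered writes followed by one fdatasync() per commit."""
    _preallocate(path, commits * blocks_per_commit * BLOCK_SIZE)
    # fdatasync() is not available everywhere; fall back to fsync().
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY)
    block = b"x" * BLOCK_SIZE
    start = time.perf_counter()
    try:
        for _ in range(commits):
            for _ in range(blocks_per_commit):
                os.write(fd, block)
            sync(fd)  # one sync call covers every block of this commit
    finally:
        os.close(fd)
    return time.perf_counter() - start


def run_benchmark(directory=None, commits=100, blocks_per_commit=1):
    """Return elapsed seconds per method, using scratch files in `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        return {
            "open_datasync": time_open_datasync(
                os.path.join(tmp, "dsync.tmp"), commits, blocks_per_commit),
            "fdatasync": time_fdatasync(
                os.path.join(tmp, "fdatasync.tmp"), commits, blocks_per_commit),
        }


if __name__ == "__main__":
    for method, seconds in run_benchmark().items():
        print(f"{method}: {seconds:.3f} s")
```

With blocks_per_commit=1 this measures the common single-block case, where
there is nothing to gather. Raising blocks_per_commit exercises the
multi-block case discussed above, and pointing directory at the disk that
would hold the WAL makes the numbers apply to that spindle. Results from one
machine say nothing about other platforms.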

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
