Re: sync()

From: Giles Lean <giles(at)nemeton(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Brown <kevin(at)sysexperts(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sync()
Date: 2003-01-13 08:31:08
Message-ID: 4993.1042446668@nemeton.com.au
Lists: pgsql-hackers


Tom Lane writes:

> Right. "Portably" was the key word in my comment (sorry for not
> emphasizing this more clearly). The real problem here is how to know
> what is the actual behavior of each platform? I'm certainly not
> prepared to trust reading-between-the-lines-of-some-man-pages. And I
> can't think of a simple yet reliable direct test.

Is the "Single UNIX Specification, version 2" (aka UNIX98) any better?
It says for fsync():

"The fsync() function forces all currently queued I/O operations
associated with the file indicated by file descriptor fildes to
the synchronised I/O completion state. All I/O operations are
completed as defined for synchronised I/O file integrity
completion."

To me this clearly says that all queued changes to the file must be
written, not just those made through this particular file descriptor.
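
To make that reading concrete, here is a minimal sketch (not a test
of what actually reaches the disk, which is the hard part) that writes
through one descriptor and calls fsync() on a second, independently
opened descriptor for the same file. The function name and the demo
path are made up for illustration. Under the UNIX98 wording the
fsync() should cover the data written via the first descriptor; the
code only shows the call pattern and cannot prove that a given
platform honours it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write via one descriptor, fsync() via another.
 * Returns 0 if every call succeeded, -1 otherwise. Success says nothing
 * about whether the platform really flushed the first descriptor's data. */
int write_then_fsync_via_other_fd(const char *path)
{
    static const char msg[] = "dirty data\n";
    int rc = -1;

    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (wfd < 0)
        return -1;

    /* Second, independent descriptor on the same file. Opened for
     * writing because some systems reject fsync() on read-only fds. */
    int sfd = open(path, O_WRONLY);
    if (sfd < 0) {
        close(wfd);
        return -1;
    }

    if (write(wfd, msg, sizeof msg - 1) == (ssize_t)(sizeof msg - 1)
        && fsync(sfd) == 0)
        rc = 0;

    close(sfd);
    close(wfd);
    return rc;
}
```

Calling write_then_fsync_via_other_fd("/tmp/fsync_demo.dat") should
return 0 on any system where both open() calls succeed.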

I did have to test this behaviour once (for a customer, strange
situation) but I couldn't find a portable way to do it, either.

What I did was read the appropriate disk block from the raw device to
bypass the buffer cache. As this required low level knowledge of the
on-disk filesystem layout it was not very portable. For anyone
interested, Tom Christiansen's "icat" program can be ported to
UFS-derived filesystems fairly easily:

http://www.rosat.mpe-garching.mpg.de/mailing-lists/perl5-porters/1997-04/msg00487.html

> AFAIK, all Unix implementations are paranoid about consistency of
> filesystem metadata, including directory contents. So fsync'ing
> directories from a user process strikes me as a waste of time, ...

There is one variant where this is not the case: Linux using ext2fs
and possibly other filesystems.

There was a flame fest of great entertainment value a few years ago
between Linus Torvalds and Dan Bernstein. Of course, neither was able
to influence the opinion of the other to any noticeable degree, but it
made fun reading. I think this might be a starting point:

http://www.ornl.gov/cts/archives/mailing-lists/qmail/1998/05/msg00667.html

A more recent posting from Linus where he continues to recommend
fsync() is this:

http://www.cs.helsinki.fi/linux/linux-kernel/2001-29/0659.html

I've not heard that any other Unix-like OS has abandoned the
traditional and POSIX semantics.
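
For the ext2fs case, the usual recommendation is to fsync() the
containing directory as well as the file. A rough sketch of what
that looks like follows; fsync_dir is a name I made up, and whether
fsync() on a directory descriptor is accepted at all varies between
systems, so treat a failure return as "not supported here" rather
than as an I/O error.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to flush a directory's entries.
 * Returns 0 on success, -1 if the directory cannot be opened or the
 * platform refuses fsync() on a directory descriptor. */
int fsync_dir(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = (fsync(fd) == 0) ? 0 : -1;

    close(fd);
    return rc;
}
```

After creating or renaming a file in a directory, one would fsync()
the file itself and then call fsync_dir() on its parent.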

> assuming that it were portable, which I doubt. What we need to worry
> about is whether fsync'ing a bunch of our own data files is a practical
> substitute for a global sync() call.

I wish that it were. There are situations (several GB buffer caches,
for example) where I doubt that the current use of sync() has all
writes completed before the sleep() returns. My concern is
theoretical at the moment: I never get to play with machines that
large!
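
For discussion's sake, the "fsync a bunch of our own data files"
alternative would look roughly like the sketch below. The function
name and the idea of passing an explicit list of paths are my
assumptions, not anything the backend does today, and it leans on
the UNIX98 reading above: that fsync() on a freshly opened
descriptor flushes writes made through other descriptors.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: fsync() each listed file in turn instead of
 * issuing a global sync(). Returns the number of files that could not
 * be opened or synced; 0 means every fsync() call reported success. */
size_t fsync_files(const char *const paths[], size_t n)
{
    size_t failures = 0;

    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_WRONLY);
        if (fd < 0) {
            failures++;
            continue;
        }
        /* Unlike sync(), fsync() does not return until the I/O is done
         * (or has failed), so no sleep() is needed afterwards. */
        if (fsync(fd) != 0)
            failures++;
        close(fd);
    }
    return failures;
}
```

The attraction is that each fsync() reports completion or failure
itself; the cost is one open/fsync/close per file, which may or may
not be practical for a large number of data files.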

Regards,

Giles
