Re: Experimental patch for inter-page delay in VACUUM

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Experimental patch for inter-page delay in VACUUM
Date: 2003-11-05 15:09:01
Message-ID: 1969.1068044941@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> You want to find, open, and fsync() every file in the database cluster
>> for every checkpoint? Sounds like a non-starter to me.

> Except a) this is outside any critical path, and b) only done every few
> minutes and c) the fsync calls on files with no dirty buffers ought to be
> cheap, at least as far as i/o.

The directory search and opening of the files is in itself nontrivial
overhead ... particularly on systems where open(2) isn't speedy, such
as Solaris. I also disbelieve your assumption that fsync'ing a file
that doesn't need it will be free. That depends entirely on what sort
of indexes the OS keeps on its buffer cache. There are Unixen where
fsync requires a scan through the entire buffer cache because there is
no data structure that permits finding associated buffers any more
efficiently than that. (IIRC, the HP-UX system I'm typing this on is
like that.) On those sorts of systems, we'd be way better off to use
O_SYNC or O_DSYNC on all our writes than to invoke multiple fsyncs.
Check the archives --- this was all gone into in great detail when we
were testing alternative methods for fsyncing the WAL files.
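
As a rough illustration of the two strategies being compared, here is a minimal sketch: one helper writes normally and then calls fsync(), the other opens the file with O_DSYNC so that each write() is itself synchronous. The helper names (write_all, write_then_fsync, write_with_odsync) are made up for this example and are not PostgreSQL functions. The fallback from O_DSYNC to O_SYNC is an assumption for platforms that do not define O_DSYNC.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: where O_DSYNC is missing, use the stricter O_SYNC instead. */
#ifndef O_DSYNC
#define O_DSYNC O_SYNC
#endif

/* Hypothetical helper: write the whole buffer, retrying on short writes. */
static int
write_all(int fd, const char *buf, size_t len)
{
    while (len > 0)
    {
        ssize_t n = write(fd, buf, len);

        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Strategy 1: ordinary buffered write, followed by an explicit fsync(). */
static int
write_then_fsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    int rc;

    if (fd < 0)
        return -1;
    rc = write_all(fd, buf, len);
    if (rc == 0)
        rc = fsync(fd);         /* cost depends on how the OS finds dirty buffers */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Strategy 2: open with O_DSYNC, so write() returns only after the data is synced. */
static int
write_with_odsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    int rc;

    if (fd < 0)
        return -1;
    rc = write_all(fd, buf, len);   /* no separate fsync() call is needed */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Both helpers return 0 on success and -1 on failure. Which one is cheaper depends on the platform, which is the point of the paragraph above.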

> So the NetBSD and Sun developers I checked with both asserted fsync does in
> fact guarantee this. And SUSv2 seems to back them up:

> The fsync() function can be used by an application to indicate that all
> data for the open file description named by fildes is to be transferred to
> the storage device associated with the file described by fildes in an
> implementation-dependent manner.

The question here is what is meant by "data for the open file
description". If it said "all data for the file referenced by the open
FD" then I would agree that the spec says what you claim. As is, I
think it would be entirely within the spec for the OS to dump only
buffers that had been dirtied through that particular FD. Notice that
the last part of the sentence is careful to respect the distinction
between the FD and the file; why isn't the first part?
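
To make the case in question concrete, here is a small sketch that dirties a file through one descriptor and calls fsync() on a second, independently opened descriptor for the same file. The function name is made up for this example. The code can only show the call pattern: fsync() succeeding does not tell you whether the buffers dirtied through the first descriptor reached disk, and that is exactly the point the spec wording leaves open.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: write through one FD, fsync through another.
 * Two separate open() calls yield two distinct open file descriptions
 * that refer to the same underlying file.
 * Returns 0 if every call succeeded, -1 otherwise.
 */
static int
dirty_via_one_fd_sync_via_another(const char *path, const char *buf, size_t len)
{
    int writer = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    int syncer;
    int rc = -1;

    if (writer < 0)
        return -1;

    syncer = open(path, O_WRONLY);
    if (syncer < 0)
    {
        close(writer);
        return -1;
    }

    /* Buffers are dirtied through "writer" only ... */
    if (write(writer, buf, len) == (ssize_t) len)
    {
        /* ... but the sync request is issued on "syncer". */
        rc = fsync(syncer);
    }

    if (close(syncer) != 0)
        rc = -1;
    if (close(writer) != 0)
        rc = -1;
    return rc;
}
```

Under the "whole file" reading, the fsync() here must flush the data written via the first descriptor. Under the narrower reading, an OS could return success without doing so.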

regards, tom lane
