Re: Experimental patch for inter-page delay in VACUUM

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Ang Chin Han <angch(at)bytecraft(dot)com(dot)my>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Experimental patch for inter-page delay in VACUUM
Date: 2003-11-04 15:31:39
Message-ID: 22099.1067959899@sss.pgh.pa.us
Lists: pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> What still needs to be addressed is the IO storm caused by checkpoints. I
> see it much relaxed when stretching out the BufferSync() over most of
> the time until the next one should occur. But the kernel sync at its
> end still pushes the system hard against the wall.

I have never been happy with the fact that we use sync(2) at all. Quite
aside from the "I/O storm" issue, sync() is really an unsafe way to do a
checkpoint, because there is no way to be certain when it is done. And
on top of that, it does too much, because it forces syncing of files
unrelated to Postgres.

I would like to see us go over to fsync, or some other technique that
gives more certainty about when the write has occurred. There might be
some scope that way to allow stretching out the I/O, too.

The main problem with this is knowing which files need to be fsync'd.
The only idea I have come up with is to move all buffer write operations
into a background writer process, which could easily keep track of
every file it's written into since the last checkpoint. This could cause
problems though if a backend wants to acquire a free buffer and there's
none to be had --- do we want it to wait for the background process to
do something? We could possibly say that backends may write dirty
buffers for themselves, but only if they fsync them immediately. As
long as this path is seldom taken, the extra fsyncs shouldn't be a big
performance problem.
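
A minimal sketch of the bookkeeping described above, written in Python for brevity rather than in the backend's C. The names `BackgroundWriter`, `write_page`, `checkpoint`, and `backend_write_page` are hypothetical and do not come from any PostgreSQL source or patch. The sketch also assumes a POSIX system on which fsync() on one descriptor flushes dirty data written to the same file through another descriptor.

```python
import os


class BackgroundWriter:
    """Hypothetical stand-in for the proposed background writer process."""

    def __init__(self):
        # Files written since the last checkpoint and not yet fsync'd.
        self.pending = set()

    def write_page(self, path, offset, data):
        # Write one page without forcing it to disk; just remember the file.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.pwrite(fd, data, offset)
        finally:
            os.close(fd)
        self.pending.add(path)

    def checkpoint(self):
        # fsync only the files this process has touched, then forget them.
        # Assumption: fsync on a fresh descriptor covers earlier writes.
        synced = sorted(self.pending)
        for path in synced:
            fd = os.open(path, os.O_WRONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        self.pending.clear()
        return synced


def backend_write_page(path, offset, data):
    # Fallback for a backend that finds no free buffer: write the dirty
    # page itself and fsync at once, so the checkpoint never needs to
    # learn about this write.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)
    finally:
        os.close(fd)
```

Because the set of pending files lives in a single process, the checkpoint knows exactly which files to fsync and can tell when each one is done, which sync() cannot offer.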

Actually, once you build it this way, you could make all writes
synchronous (open the files O_SYNC) so that there is never any need for
explicit fsync at checkpoint time. The background writer process would
be the one incurring the wait in most cases, and that's just fine. In
this way you could directly control the rate at which writes are issued,
and there's no I/O storm at all. (By contrast, fsync could still cause an
I/O storm if there are lots of pending writes in a single file.)
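
The O_SYNC variant could be sketched as follows, again in Python for brevity. The function name `write_pages_paced` and its `page_size` and `delay` parameters are hypothetical; the fixed sleep between writes is only one possible way to control the rate and is an assumption of this sketch, not something the proposal specifies.

```python
import os
import time


def write_pages_paced(path, pages, page_size=8192, delay=0.0):
    """Write (page_no, data) pairs synchronously, pausing between writes."""
    # With O_SYNC, each write returns only after the OS reports the data
    # as written out, so no separate fsync is needed afterwards.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    written = 0
    try:
        for page_no, data in pages:
            written += os.pwrite(fd, data, page_no * page_size)
            if delay > 0:
                # Hypothetical throttle: the writer alone sets the I/O rate.
                time.sleep(delay)
    finally:
        os.close(fd)
    return written
```

Since every write waits for the disk, the process doing the writing absorbs the latency, and the amount of unflushed data never builds up into a burst at checkpoint time.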

regards, tom lane
