Re: Experimental patch for inter-page delay in VACUUM

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <JanWieck(at)yahoo(dot)com>, Ang Chin Han <angch(at)bytecraft(dot)com(dot)my>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Experimental patch for inter-page delay in VACUUM
Date: 2003-11-10 04:14:20
Message-ID: 200311100414.hAA4EKu23543@candle.pha.pa.us
Lists: pgsql-hackers

Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > What still needs to be addressed is the IO storm caused by checkpoints. I
> > see it much relaxed when stretching out the BufferSync() over most of
> > the time until the next one should occur. But the kernel sync at its
> > end still pushes the system hard against the wall.
>
> I have never been happy with the fact that we use sync(2) at all. Quite
> aside from the "I/O storm" issue, sync() is really an unsafe way to do a
> checkpoint, because there is no way to be certain when it is done. And
> on top of that, it does too much, because it forces syncing of files
> unrelated to Postgres.
>
> I would like to see us go over to fsync, or some other technique that
> gives more certainty about when the write has occurred. There might be
> some scope that way to allow stretching out the I/O, too.
>
> The main problem with this is knowing which files need to be fsync'd.
> The only idea I have come up with is to move all buffer write operations
> into a background writer process, which could easily keep track of
> every file it's written into since the last checkpoint. This could cause
> problems though if a backend wants to acquire a free buffer and there's
> none to be had --- do we want it to wait for the background process to
> do something? We could possibly say that backends may write dirty
> buffers for themselves, but only if they fsync them immediately. As
> long as this path is seldom taken, the extra fsyncs shouldn't be a big
> performance problem.
>
> Actually, once you build it this way, you could make all writes
> synchronous (open the files O_SYNC) so that there is never any need for
> explicit fsync at checkpoint time. The background writer process would
> be the one incurring the wait in most cases, and that's just fine. In
> this way you could directly control the rate at which writes are issued,
> and there's no I/O storm at all. (fsync could still cause an I/O storm
> if there's lots of pending writes in a single file.)

This raises the same issue --- a very active backend might dirty 5k
buffers, and if those 5k buffers have to be written using O_SYNC, it
will take much longer than doing 5k plain buffer writes followed by a
single fsync() or sync() at the end.
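
To make the comparison concrete, here is a rough sketch of the two write
strategies. It is not PostgreSQL code: the function names, the page count,
and the temporary files are all assumptions made for illustration, and
O_SYNC is assumed to be available (it is skipped on platforms that lack it).

```python
import os
import tempfile
import time

PAGE_SIZE = 8192   # PostgreSQL's default block size
N_PAGES = 256      # illustrative only; the post talks about ~5k buffers


def write_pages_o_sync(path, n_pages=N_PAGES):
    """Write pages through an O_SYNC descriptor: each write() waits for the disk."""
    # getattr() falls back to 0 where O_SYNC does not exist (assumption: POSIX-like OS).
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        page = b"\0" * PAGE_SIZE
        for _ in range(n_pages):
            os.write(fd, page)
    finally:
        os.close(fd)
    return os.path.getsize(path)


def write_pages_then_fsync(path, n_pages=N_PAGES):
    """Write pages into the kernel cache, then force them out with one fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        page = b"\0" * PAGE_SIZE
        for _ in range(n_pages):
            os.write(fd, page)
        os.fsync(fd)   # single wait for the whole batch
    finally:
        os.close(fd)
    return os.path.getsize(path)


def compare(n_pages=N_PAGES):
    """Return {strategy: (bytes_written, elapsed_seconds)} for both strategies."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for name, fn in (("o_sync", write_pages_o_sync),
                         ("fsync_at_end", write_pages_then_fsync)):
            start = time.perf_counter()
            size = fn(os.path.join(tmpdir, name), n_pages)
            results[name] = (size, time.perf_counter() - start)
    return results
```

Calling compare() and printing the elapsed times shows how the two
approaches behave on a given machine; the numbers depend heavily on the
disk, filesystem, and kernel, so no particular ratio is implied here.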

Having another process do the writing does allow some parallelism, but
people don't seem to care about buffers having to be read in from the
kernel buffer cache. So what big benefit do we get by having someone
else write into the kernel buffer cache, except allowing a central place
to fsync? And is it worth it, considering that it might be impossible to
configure a system where the writer process can keep up with all the
backends?
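
For reference, the "central place to fsync" idea from the quoted proposal
can be sketched roughly as follows. The BackgroundWriter class and its
methods are hypothetical names used only for illustration; this is not how
any PostgreSQL release implements it. The sketch also assumes a POSIX-like
kernel where fsync() on a freshly opened descriptor flushes the file's
pending writes.

```python
import os


class BackgroundWriter:
    """Hypothetical single writer that remembers which files it has touched."""

    def __init__(self):
        self.dirty_files = set()   # files written since the last checkpoint

    def write_page(self, path, offset, data):
        """Write one page into the kernel cache and record the file as dirty."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.pwrite(fd, data, offset)
        finally:
            os.close(fd)
        self.dirty_files.add(path)

    def checkpoint(self):
        """fsync only the files written since the last checkpoint."""
        synced = sorted(self.dirty_files)
        for path in synced:
            fd = os.open(path, os.O_WRONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        self.dirty_files.clear()
        return synced
```

Unlike sync(), checkpoint() here touches only files this process wrote and
returns once each fsync() has completed. It does nothing about the
throughput concern above: a single writer can still fall behind many busy
backends.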

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
