Re: Background writer process

From: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Background writer process
Date: 2003-11-17 08:33:38
Message-ID: 200311171403.38713.shridhar_daithankar@myrealbox.com
Lists: pgsql-hackers

On Friday 14 November 2003 22:10, Jan Wieck wrote:
> Shridhar Daithankar wrote:
> > On Friday 14 November 2003 03:05, Jan Wieck wrote:
> >> For sure the sync() needs to be replaced by the discussed fsync() of
> >> recently written files. And I think the algorithm how much and how often
> >> to flush can be significantly improved. But after all, this does not
> >> change the real checkpointing at all, and the general framework having a
> >> separate process is what we probably want.
> >
> > Is having fsync for regular data files and sync for WAL segments a
> > comfortable compromise? Or is this going to use fsync for all of them?
> >
> > IMO, with fsync, we tell the kernel that it can write this buffer. It may or
> > may not write it immediately, unless it is hard sync.
>
> I think it's more the other way around. On some systems sync() might
> return before all buffers are flushed to disk, while fsync() does not.

Oops, that's bad.

> > Since postgresql can afford lazy writes for data files, I think this
> > could work.
>
> The whole point of a checkpoint is to know for certain that a specific
> change is in the datafile, so that it is safe to throw away older WAL
> segments.

I just made another posting on the patches list, in a thread cross-posted with
win32-devel.

Essentially, I said:

1. Open WAL files with O_SYNC|O_DIRECT, or just O_SYNC. (I'm not sure whether
the current code does this; the hackery in xlog.c is not exactly trivial.)
2. Open data files normally and fsync them only in the background writer process.
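The two parts of this proposal can be illustrated with POSIX open flags. This is a minimal Python sketch, not PostgreSQL code: the file names are hypothetical, it assumes a Unix-like system where os.O_SYNC is available, and O_DIRECT is left out because it requires aligned buffers and is not supported on every filesystem.

```python
import os
import tempfile

def open_wal(path):
    # O_SYNC: write() should return only after the data has reached stable
    # storage, to the extent the OS and drive honor it.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)

def open_data_file(path):
    # Ordinary open: writes go to the OS page cache and become durable only
    # after an explicit fsync(), which this proposal leaves to the bgwriter.
    return os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

def demo():
    with tempfile.TemporaryDirectory() as d:
        wal_fd = open_wal(os.path.join(d, "wal_segment"))      # hypothetical name
        data_fd = open_data_file(os.path.join(d, "datafile"))  # hypothetical name
        try:
            os.write(wal_fd, b"commit record\n")  # synchronous because of O_SYNC
            os.write(data_fd, b"page image\n")    # only buffered so far
            os.fsync(data_fd)                      # the bgwriter's job in this scheme
        finally:
            os.close(wal_fd)
            os.close(data_fd)
        return sorted(os.listdir(d))

if __name__ == "__main__":
    print(demo())  # ['datafile', 'wal_segment']
```

The point of the split is that WAL durability is paid for on every write, while data-file durability is deferred and batched.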

The BGWriter process would then flush everything at checkpoint time. It does
not need to flush the WAL because of O_SYNC (ideally, though an additional
fsync won't hurt). So it just flushes all the file descriptors touched since
the last checkpoint, which should not be much of a load because it has been
flushing those files intermittently anyway.
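A rough sketch of that checkpoint bookkeeping, assuming the bgwriter keeps a set of descriptors written since the last checkpoint. CheckpointTracker and its methods are hypothetical names, and WAL is assumed to be opened with O_SYNC, so it is not tracked.

```python
import os
import tempfile

class CheckpointTracker:
    """Hypothetical bgwriter bookkeeping: remember which data-file
    descriptors were written since the last checkpoint and fsync exactly
    those at checkpoint time."""

    def __init__(self):
        self.dirty_fds = set()

    def record_write(self, fd, data):
        os.write(fd, data)
        self.dirty_fds.add(fd)

    def intermittent_flush(self):
        # Between checkpoints the bgwriter flushes touched files now and
        # then, so the checkpoint-time fsync finds little left to write.
        for fd in self.dirty_fds:
            os.fsync(fd)

    def checkpoint(self):
        # Every file touched since the last checkpoint is fsynced again;
        # only after this is it safe to discard older WAL segments.
        flushed = len(self.dirty_fds)
        for fd in self.dirty_fds:
            os.fsync(fd)
        self.dirty_fds.clear()
        return flushed

def demo():
    with tempfile.TemporaryDirectory() as d:
        fds = [os.open(os.path.join(d, name), os.O_RDWR | os.O_CREAT, 0o600)
               for name in ("rel_a", "rel_b", "rel_c")]
        tracker = CheckpointTracker()
        try:
            tracker.record_write(fds[0], b"x")
            tracker.record_write(fds[1], b"y")
            tracker.intermittent_flush()
            tracker.record_write(fds[0], b"z")
            first = tracker.checkpoint()   # rel_a and rel_b; rel_c untouched
            second = tracker.checkpoint()  # nothing written since
        finally:
            for fd in fds:
                os.close(fd)
        return first, second
```

Here demo() returns (2, 0): the first checkpoint syncs the two touched files, the second has nothing to do.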

It could also work nicely if only the background writer fsyncs the data files.
Backends can either wait or go on with other work while the disk is flushed.
Backends do need to wait when committing, but the added delay of having the
background process do the sync, rather than the backend itself, should be
fairly small.
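One way to picture backends handing fsync work to the background writer is a request queue with a completion flag per request. This is only a sketch: real PostgreSQL backends are separate processes, so the Python threads, queue.Queue and threading.Event used here stand in for whatever shared-memory signalling an actual implementation would need, and BackgroundWriter and backend_commit are hypothetical names.

```python
import os
import queue
import tempfile
import threading

class BackgroundWriter(threading.Thread):
    """Hypothetical: the only thread that calls fsync() on data files."""

    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()

    def run(self):
        while True:
            item = self.requests.get()
            if item is None:      # shutdown sentinel
                break
            fd, done = item
            os.fsync(fd)
            done.set()            # wake any backend waiting on this sync

    def request_sync(self, fd):
        done = threading.Event()
        self.requests.put((fd, done))
        return done

    def stop(self):
        self.requests.put(None)
        self.join()

def backend_commit(bgwriter, fd, data):
    os.write(fd, data)
    done = bgwriter.request_sync(fd)
    # A backend that is not committing could carry on with other work here.
    done.wait()                   # at commit the backend must wait
    return done.is_set()

def demo():
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "datafile"), os.O_RDWR | os.O_CREAT, 0o600)
        bg = BackgroundWriter()
        bg.start()
        try:
            return backend_commit(bg, fd, b"tuple\n")
        finally:
            bg.stop()
            os.close(fd)
```

The extra latency a committing backend sees is the queue hand-off plus the wake-up, on top of the fsync itself.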

On commit, the BGWriter could get away with syncing only the files touched in
the transaction plus WAL, as opposed to all files touched since the last
checkpoint plus WAL at checkpoint time. I don't know how difficult that would be.
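The difference in scope between a commit-time sync and a checkpoint-time sync can be shown with plain bookkeeping. SyncScope, the transaction ids and the "WAL" placeholder entry are hypothetical; the sketch only computes which files each operation would fsync and performs no I/O.

```python
class SyncScope:
    """Hypothetical bookkeeping of which files each kind of sync must cover."""

    def __init__(self):
        self.since_checkpoint = set()
        self.by_xact = {}

    def touch(self, xid, filename):
        self.by_xact.setdefault(xid, set()).add(filename)
        self.since_checkpoint.add(filename)

    def files_to_sync_at_commit(self, xid):
        # Commit: only this transaction's files, plus WAL.
        return sorted(self.by_xact.pop(xid, set()) | {"WAL"})

    def files_to_sync_at_checkpoint(self):
        # Checkpoint: everything touched since the last checkpoint, plus WAL.
        result = sorted(self.since_checkpoint | {"WAL"})
        self.since_checkpoint = set()
        return result

def demo():
    scope = SyncScope()
    scope.touch(1, "rel_a")
    scope.touch(2, "rel_b")
    scope.touch(1, "rel_c")
    at_commit = scope.files_to_sync_at_commit(1)
    at_checkpoint = scope.files_to_sync_at_checkpoint()
    return at_commit, at_checkpoint
```

Committing transaction 1 covers only rel_a and rel_c plus WAL, while the checkpoint must also include rel_b, which an uncommitted transaction touched.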

What is different in the current BGWriter implementation? The use of sync()?

Shridhar
