Re: Load distributed checkpoint

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Load distributed checkpoint
Date: 2006-12-29 20:28:32
Message-ID: 20061229202831.GA71246@nasby.net
Lists: pgsql-hackers pgsql-patches

On Thu, Dec 28, 2006 at 09:28:48PM +0000, Heikki Linnakangas wrote:
> Tom Lane wrote:
> >To my mind the problem with fsync is not that it gives us too little
> >control but that it gives too much: we have to specify a particular
> >order of writing out files. What we'd really like is a version of
> >sync(2) that tells us when it's done but doesn't constrain the I/O
> >scheduler's choices at all. Unfortunately there's no such API ...
>
> The problem I see with fsync is that it causes an immediate I/O storm as
> the OS tries to flush everything out as quickly as possible. But we're
> not in a hurry. What we'd need is a lazy fsync, that would tell the
> operating system "let me know when all these dirty buffers are written
> to disk, but I'm not in a hurry, take your time". It wouldn't change the
> scheduling of the writes, just inform the caller when they're done.
>
> If we wanted more precise control of the flushing, we could use
> sync_file_range on Linux, but that's not portable. Nevertheless, I think
> it would be OK to have an ifdef and use it on platforms that support
> it, if it gave a benefit.

I believe there's something similar for OS X as well. The question is:
would it be better to do that, or to just delay calling fsync until the
OS has had a chance to write things out?
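
To make the two options concrete, here is a rough C sketch; it is not taken from any patch in this thread. The helper names hint_start_writeback and delayed_fsync are hypothetical, and the sleep-based delay is only a stand-in for whatever pacing a real checkpointer would use. On Linux the first helper uses sync_file_range with SYNC_FILE_RANGE_WRITE, which starts writeback without waiting for it; elsewhere it is a no-op. Note that sync_file_range is not a durability guarantee on its own, so an fsync is still needed afterwards.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes sync_file_range() in glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writing back the dirty pages
 * of fd without waiting for completion. Returns 0 on success, or when the
 * platform offers no such hint; returns -1 on any other error.
 */
static int
hint_start_writeback(int fd)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
    /* offset 0 with nbytes 0 means "from the start through end of file" */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) == 0)
        return 0;
    return (errno == ENOSYS) ? 0 : -1;
#else
    (void) fd;
    return 0;                   /* no hint available; rely on OS writeback */
#endif
}

/*
 * Hypothetical helper: the "just delay the fsync" alternative. Wait
 * delay_secs so the OS can write things out on its own schedule, then
 * call fsync to get the actual durability guarantee.
 */
static int
delayed_fsync(int fd, unsigned int delay_secs)
{
    if (delay_secs > 0)
        sleep(delay_secs);
    return fsync(fd);
}
```

A checkpoint-like caller would write its buffers, optionally call hint_start_writeback(fd) on each file, and only later call delayed_fsync(fd, n). Whether the hint gives any benefit over simply waiting is exactly the open question above and would need measuring.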

> As a side note, with full_page_writes on, a checkpoint wouldn't actually
> need to fsync those pages that have been written to WAL after the
> checkpoint started. Doesn't make much difference in most cases, but we
> could take that into account if we start taking more control of the
> flushing.

Hrm, interesting point, but I suspect the window involved there is too
small to be worth worrying about.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
