Re: Load distributed checkpoint

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Load distributed checkpoint
Date: 2006-12-28 21:28:48
Message-ID: 45943710.1060305@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> To my mind the problem with fsync is not that it gives us too little
> control but that it gives too much: we have to specify a particular
> order of writing out files. What we'd really like is a version of
> sync(2) that tells us when it's done but doesn't constrain the I/O
> scheduler's choices at all. Unfortunately there's no such API ...

The problem I see with fsync is that it causes an immediate I/O storm as
the OS tries to flush everything out as quickly as possible. But we're
not in a hurry. What we'd need is a lazy fsync that would tell the
operating system "let me know when all these dirty buffers are written
to disk, but I'm not in a hurry, take your time". It wouldn't change the
scheduling of the writes, just inform the caller when they're done.

If we wanted more precise control of the flushing, we could use
sync_file_range on Linux, but that's not portable. Nevertheless, I think
it would be OK to have an ifdef and use it on platforms that support
it, if it gave a benefit.
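
To make the ifdef idea concrete, here is a minimal sketch, not actual PostgreSQL code. The helper names flush_range_hint and flush_file_durably are hypothetical, invented for illustration. It assumes Linux with glibc, where sync_file_range(2) is declared in <fcntl.h> under _GNU_SOURCE; on other platforms the hint compiles to a no-op. Since sync_file_range does not flush file metadata and is not documented as a durability guarantee, the sketch still ends with a plain fsync.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writing back the dirty
 * pages in [offset, offset + nbytes) without waiting for completion.
 * On platforms without sync_file_range() this does nothing.
 */
static int
flush_range_hint(int fd, off_t offset, off_t nbytes)
{
    assert(fd >= 0);
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    (void) offset;
    (void) nbytes;
    return 0;                   /* no portable equivalent: no-op */
#endif
}

/*
 * Hypothetical helper: the portable, durable flush. The hint above only
 * spreads the I/O out; this call is what the checkpoint would rely on.
 */
static int
flush_file_durably(int fd)
{
    assert(fd >= 0);
    return fsync(fd);
}
```

A caller would issue flush_range_hint over chunks of a file as it writes them, then call flush_file_durably once at the end, by which point most of the data may already be on disk. Whether that actually smooths the I/O is the "if it gave a benefit" question and would need measuring.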

As a side note, with full_page_writes on, a checkpoint wouldn't actually
need to fsync those pages that have been written to WAL after the
checkpoint started. Doesn't make much difference in most cases, but we
could take that into account if we start taking more control of the
flushing.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
