Re: [HACKERS] Sync vs. fsync during checkpoint

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "PostgreSQL Win32 port list" <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: [HACKERS] Sync vs. fsync during checkpoint
Date: 2004-02-05 14:54:49
Message-ID: 2297.1075992889@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-hackers-win32

"Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at> writes:
> So Imho the target should be to have not much IO open for the checkpoint,
> so the fsync is fast enough, even if serial.

The best we can do is push out dirty pages with write() via the bgwriter
and hope that the kernel will see fit to write them before checkpoint
time arrives. I am not sure if that hope has basis in fact or if it's
just wishful thinking. Most likely, if it does have basis in fact it's
because there is a standard syncer daemon forcing a sync() every thirty
seconds.

That means that instead of an I/O storm every checkpoint interval,
we get a smaller I/O storm every 30 seconds. Not sure this is a big
improvement. Jan already found out that issuing very frequent sync()s
isn't a win.

People keep saying that the bgwriter mustn't write pages synchronously
because it'd be bad for performance, but I think that analysis is
faulty. Performance of what --- the bgwriter? Nonsense, the *point*
of the bgwriter is to do the slow tasks. The only argument that has
any merit is that O_SYNC or immediate fsync will prevent us from having
multiple writes outstanding and thus reduce the efficiency of disk
write scheduling. This is a valid point but there is a limit to how
many writes we need to have in flight to keep things flowing smoothly.
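
As a rough illustration of the trade-off in the paragraph above (a sketch
only, not PostgreSQL code): the first function opens the file with O_SYNC, so
each write is meant to reach stable storage before the next one is issued;
the second lets a bounded number of plain writes accumulate and then forces
them out with a single fsync(). The function names, the 8 kB page size, and
the `batch` parameter are assumptions made up for this example.

```python
import os

PAGE = 8192  # page size assumed for illustration only


def write_o_sync(path, pages):
    """Write pages through an O_SYNC descriptor: one write outstanding at a time."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        for page in pages:
            os.write(fd, page)
    finally:
        os.close(fd)


def write_batched(path, pages, batch):
    """Issue plain writes and fsync() after every `batch` of them.

    Returns the number of fsync() calls made.
    """
    fsyncs = 0
    unsynced = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for page in pages:
            os.write(fd, page)  # lands in the kernel cache, not necessarily on disk
            unsynced += 1
            if unsynced >= batch:
                os.fsync(fd)    # the kernel may schedule the whole batch together
                fsyncs += 1
                unsynced = 0
        if unsynced:
            os.fsync(fd)        # flush the final partial batch
            fsyncs += 1
    finally:
        os.close(fd)
    return fsyncs
```

With `batch` set to 1 the second function behaves much like the O_SYNC
variant; larger values give the disk scheduler more writes to reorder, at the
cost of more data left unsynced at any moment.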

What I'm thinking now is that the bgwriter should issue frequent fsyncs
for its writes --- not immediate, but a lot more often than once per
checkpoint. Perhaps take one recently-written unsynced file to fsync
every time it is about to sleep. You could imagine various rules for
deciding which one to sync; perhaps the one with the most writes issued
against it since last sync. When we have tablespaces it'd make sense to
try to distribute the syncs across tablespaces, on the assumption that
the tablespaces are probably on different drives.
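
One way the "most writes since last sync" rule could look, as a minimal
sketch and not the actual bgwriter: a small tracker counts writes per file,
and just before the writer sleeps it fsyncs the single file with the highest
count. The class and method names are hypothetical, and spreading syncs
across tablespaces is left out.

```python
import os


class PendingSyncs:
    """Counts writes issued against each file since its last fsync (hypothetical)."""

    def __init__(self):
        self.pending = {}  # path -> number of writes since last fsync

    def write_page(self, path, offset, data):
        # Plain write: the data reaches the kernel cache, not necessarily the disk.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.pwrite(fd, data, offset)
        finally:
            os.close(fd)
        self.pending[path] = self.pending.get(path, 0) + 1

    def pick_victim(self):
        # Selection rule from the text: the file with the most unsynced writes.
        if not self.pending:
            return None
        return max(self.pending, key=self.pending.get)

    def sync_one(self):
        """Call just before sleeping; fsyncs at most one file and returns its path."""
        path = self.pick_victim()
        if path is None:
            return None
        fd = os.open(path, os.O_WRONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        del self.pending[path]  # this file has no unsynced writes left
        return path
```

A writer loop would call write_page() for each dirty page it pushes out and
sync_one() once per cycle before sleeping, so the fsync work is spread over
many cycles instead of piling up at checkpoint time.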

regards, tom lane
