Re: Load distributed checkpoint

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: Load distributed checkpoint
Date: 2006-12-22 21:13:49
Message-ID: 87zm9fzl9u.fsf@enterprisedb.com
Lists: pgsql-hackers, pgsql-patches

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> I have a new idea.  Rather than increasing write activity as we approach
> checkpoint, I think there is an easier solution.  I am very familiar
> with the BSD kernel, and it seems they have a similar issue in trying to
> smooth writes:

Just to give a bit of context for this: the traditional mechanism for syncing
buffers to disk on BSD, which this daemon replaced, was simply to call "sync"
every 30s. Compared to that, this daemon certainly smooths the I/O out over
the 30s window...

Linux has a more complex solution to this (of course) which has undergone a
few generations over time. Older kernels had a user space daemon called
bdflush which called an undocumented syscall every 5s. More recent ones have a
kernel thread called pdflush. I think both have various mostly undocumented
tuning knobs but neither makes any sort of guarantee about the amount of time
a dirty buffer might live before being synced.

Your thinking is correct, but that's already the whole point of bgwriter, isn't
it? To get the buffers out to the kernel early in the checkpoint interval so
that, come checkpoint time, they're hopefully already flushed to disk. As long
as your checkpoint interval is well over 30s, only the writes from the last 30s
(or so, it's a bit fuzzier on Linux) should still be at risk of being pending.
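
As rough arithmetic for the point above, here is a minimal sketch. It assumes
writes reach the kernel at a uniform rate and that the kernel flushes anything
older than about 30s; both are simplifications, and the function name and
parameters are made up for illustration rather than taken from PostgreSQL.

```python
def fraction_at_risk(checkpoint_interval_s, kernel_flush_age_s=30.0):
    """Estimate the share of one checkpoint interval's writes that may
    still be dirty in the kernel when the checkpoint starts.

    Assumes a uniform write rate and a fixed kernel flush age
    (hypothetical model, not a measured value).
    """
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint interval must be positive")
    # Only writes younger than the flush age can still be pending.
    window = min(kernel_flush_age_s, checkpoint_interval_s)
    return window / checkpoint_interval_s


if __name__ == "__main__":
    for interval in (20, 60, 300):
        print(interval, fraction_at_risk(interval))
```

Under these assumptions a 300s checkpoint interval leaves roughly a tenth of
the interval's writes pending, while an interval shorter than the flush age
leaves all of them pending.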

I think the main problem with an additional pause in the hope of getting more
buffers synced is that, during the 30s pause on a busy system, there would be a
continual stream of new dirty buffers being created as bgwriter works and
other backends need to reuse pages. So when fsync is eventually called
there will still be a large amount of I/O to do. Fundamentally the problem is
that fsync is too blunt an instrument. We only need to sync the buffers we
care about, not the entire file.
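
To make the pause argument concrete, here is a toy model. It assumes the
kernel flushes one second's worth of the oldest dirty buffers each second and
that backends dirty buffers at a constant rate; the names and numbers are
hypothetical and only illustrate the reasoning, not real kernel behaviour.

```python
from collections import deque


def pending_after_pause(dirty_per_s, pause_s, flush_age_s=30,
                        writes_continue=True):
    """Count dirty buffers still pending in the kernel after a pause.

    Toy model: starts in steady state with one batch of dirty buffers for
    each of the last flush_age_s seconds (oldest batch on the left).
    """
    batches = deque([dirty_per_s] * flush_age_s)
    for _ in range(pause_s):
        if batches:
            batches.popleft()            # kernel flushes the oldest batch
        if writes_continue:
            batches.append(dirty_per_s)  # busy system keeps dirtying pages
    return sum(batches)


if __name__ == "__main__":
    print(pending_after_pause(1000, 30))                         # busy system
    print(pending_after_pause(1000, 30, writes_continue=False))  # idle system
```

In this model the pause only helps if writes stop: on a busy system the
backlog at fsync time is the same with or without the pause.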


-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
