Re: Spread checkpoint sync

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spread checkpoint sync
Date: 2011-01-31 18:44:38
Message-ID: AANLkTim-rv-im_oYPSLCqLffX6XLA9guUKrMz7QE-Uv6@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 31, 2011 at 12:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I wonder whether it'd be useful to keep track of the total amount of
>>> data written-and-not-yet-synced, and to issue fsyncs often enough to
>>> keep that below some parameter; the idea being that the parameter would
>>> limit how much dirty kernel disk cache there is.  Of course, ideally the
>>> kernel would have a similar tunable and this would be a waste of effort
>>> on our part...
>
>> It's not clear to me how you'd maintain that information without it
>> turning into a contention bottleneck.
>
> What contention bottleneck?  I was just visualizing the bgwriter process
> locally tracking how many writes it'd issued.  Backend-issued writes
> should happen seldom enough to be ignorable for this purpose.

Ah. Well, if you ignore backend writes, then yes, there's no
contention bottleneck. However, I seem to recall Greg Smith showing a
system at PGCon last year with a pretty respectable volume of backend
writes (30%?) and saying "OK, so here's a healthy system". Perhaps
I'm misremembering. But at any rate any backend that is using a
BufferAccessStrategy figures to do a lot of its own writes. This is
probably an area for improvement in future releases, if we can figure
out how to do it: if we're doing a bulk load into a system with 4GB of
shared_buffers using a 16MB ring buffer, we'd ideally like the
background writer - or somebody other than the foreground process - to
go nuts on those buffers, writing them out as fast as it possibly can
- rather than letting the backend do it when the ring wraps around.

Back to the idea at hand - I proposed something a bit along these
lines upthread, but my idea was to proactively perform the fsyncs on
the relations that had gone the longest without a write, rather than
the ones with the most dirty data. I'm not sure which is better.
Obviously, doing the ones that have "gone idle" gives the OS more time
to write out the data, but OTOH it might not succeed in purging much
dirty data. Doing the ones with the most dirty data will definitely
reduce the size of the final checkpoint, but might also cause a
latency spike if it's triggered immediately after heavy write activity
on that file.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
