Re: Just-in-time Background Writer Patch+Test Results

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Just-in-time Background Writer Patch+Test Results
Date: 2007-09-07 00:20:13
Message-ID: 46E052ED.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers

>>> On Thu, Sep 6, 2007 at 11:27 AM, in message
<Pine(dot)GSO(dot)4(dot)64(dot)0709061121020(dot)14491(at)westnet(dot)com>, Greg Smith
<gsmith(at)gregsmith(dot)com> wrote:
> On Thu, 6 Sep 2007, Kevin Grittner wrote:
>
> I have been staring carefully at your configuration recently, and I would
> wager that you could turn off the LRU writer altogether and still meet
> your requirements in 8.2.

I totally agree that the LRU writer is of minor benefit compared to the
all-writer, if it even matters at all. I knew that when I chose the settings.

> Here's what you've got right now:
>
>> shared_buffers = 160MB (=20000 buffers)
>> bgwriter_lru_percent = 20.0
>> bgwriter_lru_maxpages = 200
>> bgwriter_all_percent = 10.0
>> bgwriter_all_maxpages = 600
>
> With the default delay of 200ms, this has the LRU-writer scanning the
> whole pool every 1 second,

Whoa! Apparently I've totally misread the documentation. I thought that
the bgwriter_lru_percent was scanned from the lru end each time; I would
not expect that it would ever get beyond the oldest 10%. I put that in
just as a guard to keep the backends from having to wait for the OS write.
I've always doubted whether it was helping, but "it wasn't broke"....
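
(To spell out the arithmetic behind Greg's numbers: with bgwriter_delay at
its default 200 ms and bgwriter_lru_percent = 20, each round covers 20% of
the 20000 buffers, so five rounds -- one second -- cover the entire pool;
bgwriter_lru_maxpages = 200 further caps actual writes at 200 per round,
or 1000 pages per second.)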

> while the all-writer scans every two
> seconds--assuming they don't hit the write limits. If some event were to
> dirty the whole pool in 200ms, it might take as much as 6.7 seconds to
> write everything out (20000 / 600 * 200 ms) via the all-scan.

Right. The file system didn't seem to be able to accept writes faster
than 800 PostgreSQL pages per second, and I wanted to leave a LITTLE
slack, so I set that limit. We don't seem to hit it, as far as I can
tell. In fact, the output rate would be naturally fairly smooth, if not
for the "hold all dirty pages until the last possible moment, then write
them all to the OS and fsync" approach.
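
(For scale, assuming the default 8 KB block size: 800 pages per second is
roughly 6 MB/s of sustained writes, and pushing the entire 20000-buffer
pool out at that rate would take about 25 seconds.)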

> There's a second low-level issue involved here. When a page becomes
> dirty, that implies it was also recently used, which means the LRU writer
> won't touch it. That page can't be written out by the LRU writer until an
> entire pass has been made over the shared_buffer pool while looking for
> buffers to allocate for new activity. When the allocation clock-sweep
> passes over the newly dirtied buffer again, its usage count will drop by
> one and it will no longer be considered recently used. At that point the
> LRU writer can write it out.

How low does the count have to go, or does it track the count when it
becomes dirty and look for a decrease?
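
To make sure I'm reading that right, here is a rough C sketch of how I
understand the clock sweep and the LRU writer to interact (purely
illustrative -- simplified names and logic, not the actual bufmgr code):

    #include <stdbool.h>

    typedef struct
    {
        bool dirty;        /* page differs from what is on disk */
        int  usage_count;  /* bumped whenever the buffer is pinned/used */
    } BufferDesc;

    /*
     * Allocation clock sweep: each buffer it passes over that still has a
     * nonzero usage_count gets that count decremented by one and is skipped;
     * only a buffer whose count has reached zero is handed out for reuse.
     */
    static int
    clock_sweep_next(BufferDesc *buf, int nbuffers, int *hand)
    {
        for (;;)
        {
            int cur = *hand;

            *hand = (*hand + 1) % nbuffers;
            if (buf[cur].usage_count > 0)
                buf[cur].usage_count--;   /* recently used: age it, move on */
            else
                return cur;               /* candidate for reuse */
        }
    }

    /*
     * LRU-writer pass: only dirty buffers that are no longer "recently used"
     * (usage_count already swept down to zero) are written out; a freshly
     * dirtied buffer was just used, so it is skipped until a later pass.
     */
    static void
    lru_writer_pass(BufferDesc *buf, int nbuffers, int start, int n_to_scan)
    {
        int i;

        for (i = 0; i < n_to_scan; i++)
        {
            BufferDesc *b = &buf[(start + i) % nbuffers];

            if (b->dirty && b->usage_count == 0)
                b->dirty = false;         /* "write" the page out to the OS */
        }
    }

If that sketch is right, the count has to get all the way to zero -- one
decrement per sweep pass -- before the LRU writer will touch the buffer.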

> So unless there is other allocation activity
> going on, the scan_whole_pool_seconds mechanism will never provide the
> bound on time to scan and write everything you hope it will.

That may not be an issue for the environment where this has been a problem
for us -- the web hits are coming in at a pretty good rate 24/7. (We have
a couple dozen large companies scanning data through HTTP SOAP requests
all the time.) This should keep us constantly reading in new pages, which
provides the allocation activity you describe -- yes?

> where the buffer cache was
> filled with mostly dirty buffers that couldn't be re-used

That would be the killer condition for a synchronous checkpoint if the OS
cache has already had some dirty pages trickled out.
If we can hit this condition in our web database, either the load
distributed checkpoint will save us, or we can't use 8.3. Period.

> The completely understandable line of thinking that led to your request
> here is one of my concerns with exposing scan_whole_pool_seconds as a
> tunable. It may suggest to people that if they set the number very low,
> it will assure all dirty buffers will be scanned and written within that
> time bound. That's certainly not the case; both the maxpages and the
> usage count information will actually drive the speed that mechanism plods
> through the buffer cache. It really isn't useful for scanning fast.

I'm not clear on the benefit of not writing recently accessed dirty pages
when there are no less recently used dirty pages left to write. I do trust
the OS not to write them before they age out of its cache, and the OS
doesn't start writing dirty pages until they reach a certain percentage of
its cache space, so I'd just as soon let the OS know that the MRU dirty
pages are there; that tells it that it's time to start working on the LRU
pages in its cache.
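
For concreteness, the OS thresholds I have in mind are the usual writeback
knobs; on a Linux box they would be something like the following
(illustrative values only -- defaults vary by kernel, and I'm not
suggesting these particular numbers):

    # Linux VM writeback tunables (illustrative)
    vm.dirty_background_ratio = 10     # background writeback starts once
                                       # dirty pages exceed 10% of memory
    vm.dirty_ratio = 40                # writers are forced to flush
                                       # once dirty pages exceed 40% of memory
    vm.dirty_expire_centisecs = 3000   # dirty data older than ~30 seconds
                                       # is written out regardless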

-Kevin
