Re: Automatic adjustment of bgwriter_lru_maxpages

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Automatic adjustment of bgwriter_lru_maxpages
Date: 2007-05-13 16:27:20
Message-ID: 46473C68.2090008@enterprisedb.com

Greg Smith wrote:
> The original code came from before there was a pg_stat_bgwriter. The
> first patch (buf-alloc-stats) takes the two most interesting pieces of
> data the original patch collected, the number of buffers allocated
> recently and the number that the clients wrote out, and ties all that
> into the new stats structure. With this patch applied, you can get a
> feel for things like churn/turnover in the buffer pool that were very
> hard to quantify before. Also, it makes it easy to measure how well
> your background writer is doing at writing buffers so the clients don't
> have to. Applying this would complete one of my personal goals for the
> 8.3 release, which was having stats to track every type of buffer write.
>
> I split this out because I think it's very useful to have regardless of
> whether the automatic tuning portion is accepted, and I think these
> smaller patches make the review easier. The main thing I would
> recommend someone check is how am_bg_writer is (mis?)used here. I
> spliced some of the debugging-only code from the original patch, and I
> can't tell if the result is a robust enough approach to solving the
> problem of having every client indirectly report their activity to the
> background writer. Other than that, I think this code is ready for
> review and potentially committing.

This looks good to me in principle. StrategyReportWrite increments
numClientWrites without holding the BufFreeListLock, which is a race
condition. The terminology also needs some adjustment: clients don't
write buffers, backends do.
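
I was thinking of something along these lines (just a sketch; whether
the counter actually lives in the shared StrategyControl struct is my
assumption, not something taken from your patch):

	void
	StrategyReportWrite(void)
	{
		/* Sketch only: hold the lock across the increment. */
		LWLockAcquire(BufFreeListLock, LW_EXCLUSIVE);
		StrategyControl->numClientWrites++;
		LWLockRelease(BufFreeListLock);
	}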

Splitting the patch in two is a good idea.

> The second patch (limit-lru) adds on top of that a constraint of the LRU
> writer so that it doesn't do any more work than it has to. Note that I
> left verbose debugging code in here because I'm much less confident this
> patch is complete.
>
> It predicts upcoming buffer allocations using a 16-period weighted
> moving average of recent activity, which you can think of as the last
> 3.2 seconds at the default interval. After testing a few systems, that
> seemed like a decent compromise of smoothing in both directions. I found
> the 2X overallocation fudge factor of the original patch way too aggressive,
> and instead just pick the larger of the most recent allocation amount or the
> smoothed value. The main thing that throws off the allocation
> estimation is when you hit a checkpoint, which can give a big spike
> after the background writer returns to BgBufferSync and notices all the
> buffers that were allocated during the checkpoint write; the code then
> tries to find more buffers it can recycle than it needs to. Since the
> checkpoint itself normally leaves a large wake of reusable buffers
> behind it, I didn't find this to be a serious problem.
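
Just to check that I'm reading the description right, I take the
smoothing to be roughly this (my own sketch and variable names, not
code from your patch):

	/*
	 * 16-period weighted moving average of recent allocations, with
	 * the estimate clamped below by the latest observation.
	 */
	static float	smoothed_alloc = 0;		/* carried across rounds */

	smoothed_alloc += ((float) recent_alloc - smoothed_alloc) / 16;
	upcoming_alloc_est = Max(recent_alloc, (int) smoothed_alloc);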

Can you tell more about the tests you performed? That algorithm seems
decent, but I wonder why the simple fudge factor wasn't good enough? I
would've thought that a 2x or even bigger fudge factor would still be
only a tiny fraction of shared_buffers, and wouldn't really affect
performance.

The load distributed checkpoint patch should mitigate the checkpoint
spike problem by continuing the LRU scan throughout the checkpoint.

> There's another communication issue here, which is that SyncOneBuffer
> needs to return more information about the buffer than it currently does
> once it gets it locked. To tune itself, the background writer needs to
> know more than just whether the buffer was written. The original patch
> used a clever trick for this which worked, but I found it confusing. I
> happen to have a
> bunch of other background writer tuning code I'm working on, and I had
> to come up with a more robust way to communicate buffer internals back
> via this channel. I used that code here, it's a bitmask setup similar
> to how flags like BM_DIRTY are used. It's overkill for solving this
> particular problem, but I think the interface is clean and it helps
> support future enhancements in intelligent background writing.

Uh, that looks pretty ugly to me. The normal way to return multiple
values is to pass a pointer as an argument, though that can get ugly as
well if there are a lot of return values. What combinations of the flags
are valid? Would an enum be better? Or how about moving the checks for
dirty and pinned buffers from SyncOneBuffer to the callers?
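
For example, something like this hypothetical out-parameter variant
(the struct and field names are made up for illustration, not taken
from either patch):

	typedef struct SyncOneBufferResult
	{
		bool	was_dirty;		/* buffer needed writing when examined */
		bool	was_pinned;		/* skipped because someone held a pin */
		bool	written;		/* we actually flushed it */
	} SyncOneBufferResult;

	static void SyncOneBuffer(int buf_id, bool skip_recently_used,
							  SyncOneBufferResult *result);

That keeps the information the background writer needs explicit in the
interface, without inventing a new flags namespace just for this.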

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
