Re: [Testperf-general] BufferSync and bgwriter

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Mark Wong <markw(at)osdl(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, testperf-general(at)pgfoundry(dot)org
Subject: Re: [Testperf-general] BufferSync and bgwriter
Date: 2004-12-15 16:16:26
Message-ID: 41C0635A.50801@Yahoo.com
Lists: pgsql-hackers

On 12/12/2004 9:43 PM, Neil Conway wrote:

> On Sun, 2004-12-12 at 22:08 +0000, Simon Riggs wrote:
>> > On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
>> > Is the plan to make bgwriter_percent = 100 the default setting?
>>
>> Hmm...must confess that my only plan is:
>> i) discover dynamic behaviour of bgwriter
>> ii) fix any bugs or weirdness as quickly as possible
>> iii) try to find a way to set the bgwriter defaults
>
> I was just curious why you were bothering to special-case
> bgwriter_percent = 100 if it's not going to be the default setting (in
> which case I would be surprised if more than 1 in 10 users would take
> advantage of the patch).
>
>> Right now, bgwriter_delay
>> is useless because the O(N) behaviour makes it impossible to set any
>> lower when you have a large shared_buffers.
>
> BTW, I wouldn't be _too_ worried about O(N) behavior, except that we do
> this scan while holding the BufMgrLock, which is a well known source of
> contention. So reducing the time we hold that lock would be good.
>
>> Your question has made me rethink the exact objective of the bgwriter's
>> actions: The way it is coded now the bgwriter looks for dirty blocks, no
>> matter where they are in the list.
>
> Not sure what you mean. StrategyDirtyBufferList() returns the specified
> number of dirty buffers in order, starting with the T1/T2 LRUs and going
> back to the MRUs of both lists. bgwriter_percent effectively ignores
> some portion of the tail of that list, so we end up just flushing the
> buffers closest to the T1/T2 LRUs. How is this different from what
> you're describing?
>
>> bgwriter_percent would be the % of shared_buffers that are searched
>> (from the LRU end) to see if they contain dirty buffers, which are
>> then written to disk.
>
> By definition, buffers closest to the LRU end of the lists are not
> frequently accessed. If we only search the N% of the lists closest to
> LRU, we will probably end up flushing just those pages to disk -- and
> then not flushing anything else to disk in the subsequent bgwriter calls
> because all the buffers close to the LRU will be non-dirty. That's okay
> if all we're concerned about is avoiding write() by a real backend, but
> we also want to smooth out checkpoint load, which I don't think this
> approach would do well.
>
> I suggest just getting rid of bgwriter_percent: AFAICS bgwriter_maxpages
> is all the tuning we need, and I think "max # of pages to write" is a
> simpler and more logical tuning knob than "% of the buffer pool to scan
> looking for dirty buffers." So at each bufmgr invocation, we pick at
> most bgwriter_maxpages dirty pages from the pool, using the pages
> closest to the LRUs of T1 and T2. I'd be happy to supply a patch to
> implement that if you think it sounds okay.

I don't think this approach will retain the checkpoint smoothing
effect that the current implementation has, either.

The real problem is that the "cleaner" the buffer pool is, the longer
the scan for dirty buffers will take because the dirty blocks tend to be
at the very end of the scan order. The real solution for this would be
not to scan the whole pool, but to maintain a separate chain of only
dirty buffers in LRU order.
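
A minimal sketch in C of what such a chain could look like. Every name
in it (DirtyChain, chain_mark_dirty, chain_take_lru, NBUFFERS) is made
up for illustration and none of it is taken from the PostgreSQL source.
It also makes two simplifying assumptions: buffers are ordered by when
they were last dirtied, which only approximates the T1/T2 LRU order,
and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 8      /* hypothetical pool size, tiny for illustration */
#define NO_BUF   (-1)   /* list terminator */

/* Per-buffer links; only dirty buffers are ever on the chain. */
typedef struct {
    bool dirty;
    int  prev;          /* neighbour toward the LRU end, or NO_BUF */
    int  next;          /* neighbour toward the MRU end, or NO_BUF */
} DirtyLink;

typedef struct {
    DirtyLink link[NBUFFERS];
    int lru;            /* least recently dirtied buffer, or NO_BUF */
    int mru;            /* most recently dirtied buffer, or NO_BUF */
} DirtyChain;

void chain_init(DirtyChain *c)
{
    for (int i = 0; i < NBUFFERS; i++) {
        c->link[i].dirty = false;
        c->link[i].prev = c->link[i].next = NO_BUF;
    }
    c->lru = c->mru = NO_BUF;
}

/* Remove one buffer from the chain in O(1) and mark it clean. */
static void chain_unlink(DirtyChain *c, int buf)
{
    DirtyLink *l = &c->link[buf];

    if (l->prev != NO_BUF) c->link[l->prev].next = l->next;
    else                   c->lru = l->next;
    if (l->next != NO_BUF) c->link[l->next].prev = l->prev;
    else                   c->mru = l->prev;
    l->prev = l->next = NO_BUF;
    l->dirty = false;
}

/* Called when a buffer is dirtied: put it at the MRU end of the chain. */
void chain_mark_dirty(DirtyChain *c, int buf)
{
    assert(buf >= 0 && buf < NBUFFERS);
    if (c->link[buf].dirty)
        chain_unlink(c, buf);           /* re-dirtied: move to MRU end */
    c->link[buf].dirty = true;
    c->link[buf].prev = c->mru;
    c->link[buf].next = NO_BUF;
    if (c->mru != NO_BUF) c->link[c->mru].next = buf;
    else                  c->lru = buf;
    c->mru = buf;
}

/*
 * What the bgwriter would do each round: take at most maxpages buffers
 * from the LRU end. The cost is O(maxpages), independent of the pool
 * size and of how clean the pool is.
 */
int chain_take_lru(DirtyChain *c, int maxpages, int *out)
{
    int n = 0;

    while (n < maxpages && c->lru != NO_BUF) {
        int buf = c->lru;
        chain_unlink(c, buf);
        out[n++] = buf;                 /* caller would write this page */
    }
    return n;
}
```

With this layout a clean pool costs nothing to "scan": the chain is
simply empty and chain_take_lru() returns 0 right away.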

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
