Re: Page replacement algorithm in buffer cache

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page replacement algorithm in buffer cache
Date: 2013-04-03 14:00:01
Message-ID: CA+TgmoaP1PtRSkU0=ioi4hRxqCBzrNP9JV1L0YdkBp42PESSzw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 2, 2013 at 1:20 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-04-02 12:56:56 -0400, Tom Lane wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> > On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
>> >> I agree in general, though I'm not sure the bgwriter process can
>> >> reasonably handle this need along with what it's already supposed to be
>> >> doing. We may need another background process that is just responsible
>> >> for keeping the freelist populated.
>>
>> > What else is the bgwriter actually doing otherwise? Sure, it doesn't put the
>> > pages on the freelist, but otherwise it's trying to do the above isn't it?
>>
>> I think it will be problematic to tie buffer-undirtying to putting both
>> clean and dirty buffers into the freelist. It might chance to work all
>> right to use one scan process for both, but I'm afraid it's more likely
>> that we'd end up either overserving one goal or underserving the other.
>
> Hm. I had imagined that we would only ever put clean buffers into the
> freelist and that we would never write out a buffer that we don't need
> for a new page. I don't see much point in randomly writing out buffers
> that won't be needed for something else soon. Currently we can't do much
> better than basically undirtying random buffers since we don't really know
> which page will reach a usagecount of zero since bgwriter doesn't
> manipulate usagecounts.
>
> One other scenario I can see is the problem of strategy buffer reusage
> of dirtied pages (hint bits, pruning) during seqscans where we would
> benefit from pages being written out fast, but I can't imagine that that
> could be handled very well by something like the bgwriter?
>
> Am I missing something?

I've had the same thought. I think we should consider having a
background process that listens on a queue, sort of like the fsync
absorption queue. When a process using a buffer access strategy
dirties a buffer, it adds it to that queue and sets the latch for the
background process, which then wakes up and starts cleaning the
buffers that have been added to its queue. The hope is that, by the
time the ring buffer wraps around, the background process will have
cleaned the buffer, preventing the foreground process from having to
wait for the buffer write (and, perhaps, xlog flush).
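A minimal single-process sketch of that queue, for illustration only. WritebackQueue, wbq_push() and wbq_drain() are hypothetical names, not PostgreSQL code. The latch is modeled as a plain flag standing in for SetLatch()/ResetLatch(), and "cleaning" a buffer just clears a dirty flag instead of writing the page and flushing WAL. A real version would live in shared memory behind a lock, much like the fsync request queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WBQ_CAPACITY 64   /* arbitrary size for the sketch */

/* Hypothetical queue of buffer IDs dirtied by strategy backends.
 * Single-threaded: real code would need shared memory and a lock. */
typedef struct {
    int    buf_ids[WBQ_CAPACITY];
    size_t head;       /* index of the oldest queued entry */
    size_t count;      /* number of queued entries */
    bool   latch_set;  /* stand-in for the cleaner's latch */
} WritebackQueue;

static void wbq_init(WritebackQueue *q) {
    q->head = 0;
    q->count = 0;
    q->latch_set = false;
}

/* Backend side: called after dirtying a buffer in its ring.
 * Returns false if the queue is full; the backend then has to
 * write the buffer itself when the ring wraps around. */
static bool wbq_push(WritebackQueue *q, int buf_id) {
    if (q->count == WBQ_CAPACITY)
        return false;
    q->buf_ids[(q->head + q->count) % WBQ_CAPACITY] = buf_id;
    q->count++;
    q->latch_set = true;   /* wake the cleaner */
    return true;
}

/* Cleaner side: reset the latch, then drain every queued entry,
 * "writing" each buffer that is still dirty. Returns how many
 * buffers were cleaned. */
static size_t wbq_drain(WritebackQueue *q, bool dirty[]) {
    size_t cleaned = 0;
    q->latch_set = false;
    while (q->count > 0) {
        int buf_id = q->buf_ids[q->head];
        q->head = (q->head + 1) % WBQ_CAPACITY;
        q->count--;
        if (dirty[buf_id]) {
            dirty[buf_id] = false;   /* stand-in for the actual write */
            cleaned++;
        }
    }
    return cleaned;
}
```

The full-queue case matters: when wbq_push() fails, the backend is no worse off than now, since it would just write the buffer itself at reuse time.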

The main hesitation I've had about actually implementing such a scheme
is that I find it a bit unappealing to have a background process
dedicated to just this. But maybe it could be combined with some of
the other ideas presented here. Perhaps we should have one process
that scans the buffer arena and populates the freelists; as a side
effect, if it runs across a dirty buffer, it kicks it over to the
process described in the previous paragraph (which could still, also,
absorb requests from other backends using buffer access strategies).
Then we'd end up with nothing that looks exactly like the background
writer we have now, but maybe no one would miss it.
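A rough sketch of what the combined scanner might do on each pass, assuming a clock sweep like the existing one. Buffers with a nonzero usage count are aged. Clean, unpinned buffers whose count has reached zero go on the freelist. Dirty ones are handed off to the cleaning process instead of being written by the scanner. BufDesc, BufferArena and scan_arena() are hypothetical, and pinning and locking are reduced to flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUFFERS 16   /* tiny arena for illustration */

/* Hypothetical, heavily simplified buffer header. */
typedef struct {
    int  usage_count;
    bool pinned;
    bool dirty;
    bool on_freelist;
    bool write_queued;   /* already handed to the cleaner */
} BufDesc;

typedef struct {
    BufDesc bufs[NBUFFERS];
    size_t  clock_hand;
    int     freelist[NBUFFERS];
    size_t  freelist_len;
    int     handoff[NBUFFERS];   /* dirty buffers passed to the cleaner */
    size_t  handoff_len;
} BufferArena;

/* Advance the clock hand at most max_visits times, stopping early once
 * the freelist holds `target` buffers. Returns how many buffers were
 * added to the freelist on this call. */
static size_t scan_arena(BufferArena *a, size_t target, size_t max_visits) {
    size_t added = 0;
    assert(target <= NBUFFERS);

    for (size_t v = 0; v < max_visits && a->freelist_len < target; v++) {
        int id = (int) a->clock_hand;
        BufDesc *b = &a->bufs[id];
        a->clock_hand = (a->clock_hand + 1) % NBUFFERS;

        if (b->pinned || b->on_freelist)
            continue;                     /* in use, or already free */
        if (b->usage_count > 0) {
            b->usage_count--;             /* recently used: age it */
        } else if (b->dirty) {
            if (!b->write_queued) {       /* delegate the write */
                b->write_queued = true;
                a->handoff[a->handoff_len++] = id;
            }
        } else {
            b->on_freelist = true;        /* clean and cold: reusable */
            a->freelist[a->freelist_len++] = id;
            added++;
        }
    }
    return added;
}
```

Skipping dirty buffers keeps the scanner from stalling on I/O. A handed-off buffer becomes a freelist candidate on a later pass, once the cleaner has cleared its dirty flag (and, in this sketch, its write_queued flag).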

I think that as we go through the process of trying to improve this,
we should also look hard at trying to make the algorithms more
self-tuning. For example, instead of having a fixed delay between
rounds for the buffer-arena-scanning process, I think we should try to
make it adaptive. If it sticks a bunch of buffers on the freelist and
the freelist then runs dry before it wakes up again, the backend that
notices that the list is empty (or below some low watermark) should
set a latch to wake up the buffer-arena-scanning process; and
the next time that process goes back to sleep, it should sleep for a
shorter period of time. As things are today, what the background
writer actually does is unhelpful enough that there might not be much
point in fiddling with this, but as we get to having a more sensible
scheme overall, I think it will pay dividends.
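One possible shape for the adaptive delay, under assumptions that go beyond the proposal. The sleep is halved whenever a backend had to wake the scanner early. Lengthening the sleep otherwise, by a fixed 10 ms step, is an added assumption (additive increase, multiplicative decrease), and so is clamping to an arbitrary 10 ms to 1 s range. The freelist is reduced to a counter, the latch to a flag, and the refill step stands in for the arena scan. All names and constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SCAN_SLEEP_MIN_MS  10     /* arbitrary bounds for the sketch */
#define SCAN_SLEEP_MAX_MS  1000
#define SCAN_SLEEP_STEP_MS 10

typedef struct {
    int  length;          /* buffers currently on the freelist */
    int  low_watermark;   /* backends wake the scanner below this */
    bool scanner_latch;   /* stand-in for the scanner's latch */
} FreelistState;

/* Halve the delay after an early wakeup; otherwise let it creep back up. */
static int next_sleep_ms(int current_ms, bool woken_early) {
    int next = woken_early ? current_ms / 2 : current_ms + SCAN_SLEEP_STEP_MS;
    if (next < SCAN_SLEEP_MIN_MS)
        next = SCAN_SLEEP_MIN_MS;
    if (next > SCAN_SLEEP_MAX_MS)
        next = SCAN_SLEEP_MAX_MS;
    return next;
}

/* Backend side: take one buffer if available. If the list is now below
 * the low watermark, set the scanner's latch. Returns false if empty. */
static bool freelist_get(FreelistState *fl) {
    bool got = false;
    if (fl->length > 0) {
        fl->length--;
        got = true;
    }
    if (fl->length < fl->low_watermark)
        fl->scanner_latch = true;
    return got;
}

/* Scanner side, after each wakeup: note whether a backend woke it,
 * reset the latch, refill the list (stand-in for the arena scan), and
 * return how long to sleep before the next round. */
static int scanner_round(FreelistState *fl, int refill_to, int current_sleep_ms) {
    bool woken_early = fl->scanner_latch;
    fl->scanner_latch = false;
    if (fl->length < refill_to)
        fl->length = refill_to;
    return next_sleep_ms(current_sleep_ms, woken_early);
}
```

Halving quickly but growing slowly means a burst of demand shortens the delay at once, while a quiet period only gradually returns the scanner to long sleeps.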

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
