Re: Page replacement algorithm in buffer cache

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Ants Aasma <ants(at)cybertec(dot)at>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page replacement algorithm in buffer cache
Date: 2013-04-02 14:52:40
Message-ID: CA+TgmoZRfNsPBpEJUdy5jjvwX6XWcc4_ZRSZmTGQUCXvy9_AcQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 2, 2013 at 6:32 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
>> On 3/23/13 7:41 AM, Ants Aasma wrote:
>> >Yes, having bgwriter do the actual cleaning up seems like a good idea.
>> >The whole bgwriter infrastructure will need some serious tuning. There
>> >are many things that could be shifted to background if we knew it
>> >could keep up, like hint bit setting on dirty buffers being flushed
>> >out. But again, we have the issue of having good tests to see where
>> >the changes hurt.
>>
>> I think at some point we need to stop depending on just bgwriter for all this stuff. I believe it would be much cleaner if we had separate procs for everything we needed (although some synergies might exist; if we wanted to set hint bits during write then bgwriter *is* the logical place to put that).
>>
>> In this case, I don't think keeping stuff on the free list is close enough to checkpoints that we'd want bgwriter to handle both. At most we might want them to pass some metrics back and forth.
>
> bgwriter isn't doing checkpoints anymore; there has been a separate checkpointer process since 9.2.
>
> In my personal experience and measurement, bgwriter is pretty close to
> useless right now. I think that, much like what Amit has done, it
> should perform part of a real clock sweep instead of just looking ahead
> of the current position without changing usage counts or the sweep
> position, and it should put enough buffers on the freelist to sustain
> demand until its next activity phase. I hacked on that one night in a
> hotel and got impressive speedups (and quite a bit of breakage) for
> workloads bigger than s_b (shared_buffers).
>
> That would relieve quite a few pain points:
> - fewer different processes/CPUs looking at buffer headers ahead in the cycle
> - fewer CPUs changing usage counts
> - dirty pages are far more likely to have been flushed out already when a new
> page is needed
> - operations like relation extension, which right now frequently have
> to search far and wide for new pages while holding the extension lock
> exclusively, should finish quite a bit faster
>
> If the freelist lock is separated from the lock protecting the clock
> sweep, this should give quite a scalability boost without the
> potential unfairness you can get from partitioning the lock or similar.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
