Re: Page replacement algorithm in buffer cache

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page replacement algorithm in buffer cache
Date: 2013-04-02 10:32:39
Message-ID: 20130402103239.GC2415@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
> On 3/23/13 7:41 AM, Ants Aasma wrote:
> >Yes, having bgwriter do the actual cleaning up seems like a good idea.
> >The whole bgwriter infrastructure will need some serious tuning. There
> >are many things that could be shifted to background if we knew it
> >could keep up, like hint bit setting on dirty buffers being flushed
> >out. But again, we have the issue of having good tests to see where
> >the changes hurt.
>
> I think at some point we need to stop depending on just bgwriter for all this stuff. I believe it would be much cleaner if we had separate procs for everything we needed (although some synergies might exist; if we wanted to set hint bits during write then bgwriter *is* the logical place to put that).
>
> In this case, I don't think keeping stuff on the free list is close enough to checkpoints that we'd want bgwriter to handle both. At most we might want them to pass some metrics back and forth.

bgwriter isn't doing checkpoints anymore; since 9.2 there's a separate
checkpointer process for that.

In my personal experience and measurements, bgwriter is pretty close to
useless right now. I think it should, pretty similar to what Amit has
done, perform part of a real clock sweep instead of just looking ahead
of the current position without changing usagecounts or the sweep
position, and put enough buffers on the freelist to sustain demand until
its next activity phase. I hacked that up one night in a hotel and got
impressive speedups (and quite some breakage) for workloads bigger than
s_b (shared_buffers).

That would reduce quite a few pain points:
- fewer different processes/CPUs looking at buffer headers ahead in the cycle
- fewer CPUs changing usagecounts
- dirty pages are far more likely to be flushed out already when a new
page is needed
- operations like relation extension, which right now frequently have
to search far and wide for new pages while holding the extension lock
exclusively, should finish quite a bit faster
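
To make the proposal concrete, here is a toy sketch in Python. It is not
PostgreSQL code: ToyBufferPool, background_sweep, get_victim, touch and
the MAX_USAGE cap are all hypothetical names for illustration, and
"flushing" is modelled by clearing a flag. It only shows the shape of
the idea: the background process runs a real sweep (moving the hand and
decrementing usage counts) and parks victims on a freelist, so a backend
normally just pops one.

```python
from collections import deque
from dataclasses import dataclass

MAX_USAGE = 5  # arbitrary cap for this toy model


@dataclass
class Buffer:
    usage_count: int = 0
    dirty: bool = False
    pinned: bool = False
    on_freelist: bool = False


class ToyBufferPool:
    """Toy model only; all names here are hypothetical."""

    def __init__(self, nbuffers):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.hand = 0            # clock sweep position
        self.freelist = deque()  # indexes of buffers ready for reuse
        self.flushed = 0         # dirty buffers "written" by the sweep

    def touch(self, idx):
        """A backend used this buffer: bump its usage count."""
        buf = self.buffers[idx]
        buf.usage_count = min(buf.usage_count + 1, MAX_USAGE)

    def background_sweep(self, target_free):
        """Real clock sweep: move the hand, decrement usage counts, flush
        dirty victims, stop once the freelist holds target_free entries."""
        # Bound the work: a buffer needs at most MAX_USAGE + 1 visits.
        max_steps = len(self.buffers) * (MAX_USAGE + 1)
        steps = 0
        while len(self.freelist) < target_free and steps < max_steps:
            steps += 1
            idx = self.hand
            self.hand = (self.hand + 1) % len(self.buffers)
            buf = self.buffers[idx]
            if buf.pinned or buf.on_freelist:
                continue
            if buf.usage_count > 0:
                buf.usage_count -= 1
                continue
            if buf.dirty:
                buf.dirty = False  # stands in for writing the page out
                self.flushed += 1
            buf.on_freelist = True
            self.freelist.append(idx)
        return len(self.freelist)

    def get_victim(self):
        """Backend path: pop from the freelist; sweep only as a fallback."""
        while True:
            if not self.freelist and self.background_sweep(1) == 0:
                raise RuntimeError("no evictable buffer found")
            idx = self.freelist.popleft()
            buf = self.buffers[idx]
            buf.on_freelist = False
            # Skip entries that were used again after being listed.
            if not buf.pinned and buf.usage_count == 0:
                return idx
```

In this model the backend only falls back to sweeping itself when the
freelist has run dry, which is the case the background sweep is meant
to make rare.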

If the freelist lock is separated from the lock protecting the clock
sweep, this should give quite a bit of a scalability boost without the
potential unfairness you can get from partitioning the lock or the like.
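
A minimal sketch of that lock split, again as a Python toy rather than
PostgreSQL code. SplitLockPool, refill and take are hypothetical names,
and the sweep is reduced to handing out consecutive buffer indexes. The
point is only that the consumer path takes the freelist lock alone,
while the sweep position sits under its own lock.

```python
import threading
from collections import deque


class SplitLockPool:
    """Toy model only; all names here are hypothetical."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.hand = 0
        self.sweep_lock = threading.Lock()     # protects self.hand
        self.freelist = deque()
        self.freelist_lock = threading.Lock()  # protects self.freelist

    def refill(self, count):
        """Sweeper: advance the hand under sweep_lock, then publish the
        batch under freelist_lock. The two locks are never held together."""
        with self.sweep_lock:
            batch = [(self.hand + i) % self.nbuffers for i in range(count)]
            self.hand = (self.hand + count) % self.nbuffers
        with self.freelist_lock:
            self.freelist.extend(batch)

    def take(self):
        """Backend fast path: only the short freelist lock is needed."""
        with self.freelist_lock:
            return self.freelist.popleft() if self.freelist else None
```

Backends calling take() never contend with a sweep in progress; they
only wait on each other and on the brief moment when a batch is
published.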

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
