Re: Readme of Buffer Management seems to have wrong sentence

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Robert Haas' <robertmhaas(at)gmail(dot)com>
Cc: 'PostgreSQL-development' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Readme of Buffer Management seems to have wrong sentence
Date: 2012-05-23 15:36:35
Message-ID: 005701cd38f9$d6bcc500$84364f00$%kapila@huawei.com
Lists: pgsql-hackers

>>And besides
>>if the decrements are decoupled from the allocation requests it's no
>>longer obvious that the algorithm is even an approximation of LRU.

I was trying to point out that we can run the clock sweep in bgwriter and keep
the backends' logic as it is today.
The core idea is that this reduces the work done by backends and improves
their chances of getting a free buffer sooner than they do now.

Some of the other ideas I have discussed are:
1. Moving clean buffers to the freelist whenever bgwriter or checkpoint finds
one. This lets backends get a buffer early, without running the clock sweep
algorithm.

2. Having multiple lists of hot and cold buffers; if the required buffer is
not already present, backends look for a buffer in the following order:
a. freelist
b. cold list
c. hot list

3. Experimenting with different values of usage count under heavy load
scenarios to check whether it makes any difference.
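
The lookup order in idea 2 could be sketched roughly as below. This is only an
illustration, not PostgreSQL code: BufList, get_victim, and the fixed-size
arrays are hypothetical names, and real shared lists would need locking that
is omitted here.

```c
#include <assert.h>

#define NBUFFERS  8
#define NO_BUFFER (-1)

/* Hypothetical: a simple stack of buffer ids standing in for one list. */
typedef struct BufList {
    int ids[NBUFFERS];
    int count;
} BufList;

/* Which list the victim came from (names are assumptions). */
typedef enum { SRC_NONE, SRC_FREE, SRC_COLD, SRC_HOT } VictimSource;

static void buflist_push(BufList *l, int id)
{
    assert(l->count < NBUFFERS);
    l->ids[l->count++] = id;
}

static int buflist_pop(BufList *l)
{
    return l->count > 0 ? l->ids[--l->count] : NO_BUFFER;
}

/*
 * Try the freelist first, then the cold list, then the hot list.
 * Returns a buffer id, or NO_BUFFER if all three lists are empty.
 */
static int get_victim(BufList *freelist, BufList *cold, BufList *hot,
                      VictimSource *src)
{
    BufList     *lists[] = { freelist, cold, hot };
    VictimSource tags[]  = { SRC_FREE, SRC_COLD, SRC_HOT };

    for (int i = 0; i < 3; i++) {
        int id = buflist_pop(lists[i]);
        if (id != NO_BUFFER) {
            *src = tags[i];
            return id;
        }
    }
    *src = SRC_NONE;
    return NO_BUFFER;
}
```

A backend would call get_victim() only after failing to find the page in
shared buffers; how buffers migrate between the hot and cold lists is left
open here, as it is in the proposal.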

I will try to prototype these ideas and publish the results here, so that we
can check whether they give any benefit.
There are some other ideas in this mail chain that I will also look into, such as:
1. Clock sweeps from different backends ending up too closely synchronized. If
this really causes problems, I will try to address it.
2. Not using the same lock for the whole algorithm that finds a free buffer.

I would like to build a prototype as follows:
1. Prepare or identify an existing test case where these problems are
more evident.
2. Change the code to implement the ideas from this mail chain, along with
what I think can improve things.

Do you feel I can attempt to address this problem with some prototypes and
discuss them here after a few days, when I have some results ready?

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Tuesday, May 22, 2012 7:55 PM
To: Robert Haas
Cc: Amit Kapila; PostgreSQL-development
Subject: Re: [HACKERS] Readme of Buffer Management seems to have wrong
sentence

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, May 22, 2012 at 10:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, keep in mind that that action is not merely there to obtain a
>> victim buffer; it is also maintaining the global LRU state (by
>> decrementing the usage counts of buffers it passes over).  I don't think
>> you can change it to simply look only at a predetermined freelist
>> without seriously compromising the overall quality of our buffer
>> replacement decisions.

> The idea would be to have a background process (like bgwriter)
> maintain the global LRU state and push candidate buffers onto the
> freelist.

Amit was trying to convince me of the same idea at PGCon, but I don't
buy it. bgwriter doesn't scan the buffer array nearly fast enough to
provide useful adjustment of the usage counts under load. And besides
if the decrements are decoupled from the allocation requests it's no
longer obvious that the algorithm is even an approximation of LRU.
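
For reference, the coupling described here (the same pass both picks a victim
and ages the buffers it skips) might look roughly like the sketch below. It is
a simplified illustration, not the actual bufmgr code: BufferDesc is reduced
to two fields, ClockSweep and clock_sweep are hypothetical names, and all
locking is left out.

```c
#include <assert.h>

#define NBUFFERS 4

/* Simplified stand-in for a buffer header (assumption). */
typedef struct BufferDesc {
    int usage_count;   /* bumped on access, decremented by the sweep */
    int pinned;        /* nonzero while some backend is using the buffer */
} BufferDesc;

typedef struct ClockSweep {
    BufferDesc bufs[NBUFFERS];
    int        next_victim;   /* the clock hand */
} ClockSweep;

/*
 * Advance the hand until an unpinned buffer with usage_count == 0 is found.
 * Every unpinned buffer passed over has its usage_count decremented, which
 * is what keeps the usage counts an approximation of LRU.
 * Returns the victim index, or -1 if every buffer is pinned.
 */
static int clock_sweep(ClockSweep *cs)
{
    int pinned_seen = 0;   /* consecutive pinned buffers seen */

    for (;;) {
        int         i = cs->next_victim;
        BufferDesc *b = &cs->bufs[i];

        cs->next_victim = (i + 1) % NBUFFERS;

        if (b->pinned) {
            if (++pinned_seen >= NBUFFERS)
                return -1;          /* a full lap of pinned buffers */
            continue;
        }
        pinned_seen = 0;

        if (b->usage_count == 0)
            return i;               /* victim found */
        b->usage_count--;           /* age the buffer and move on */
    }
}
```

Because the decrement happens only while some allocation request is sweeping,
the aging rate follows allocation pressure; moving the sweep to a separate
process would break that link.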

But the larger issue here is that if that processing is a bottleneck
(which I agree it is), how does it help to force a single process to
be responsible for it? Any real improvement in scalability here will
need to decentralize the operation more, not less.

My own thoughts about this had pointed in the direction of getting rid
of the central freelist entirely, instead letting each backend run its
own independent clock sweep as needed. The main problem with that is
that if there's no longer any globally-visible clock sweep state, it's
pretty hard to figure out what the control logic for the bgwriter should
look like. Maybe it would be all right to have global variables that
are just statistics counters for allocations and buffers swept over,
which backends would need to spinlock for just long enough to increment
the counters at the end of each buffer allocation.
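
A minimal sketch of such counters, assuming a C11 atomic_flag as the spinlock
(PostgreSQL has its own spinlock primitives; this only illustrates the shape).
SweepStats, stats_init, and stats_report are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global statistics that a bgwriter could read for pacing. */
typedef struct SweepStats {
    atomic_flag lock;           /* spinlock guarding the two counters */
    uint64_t    allocations;    /* buffer allocations completed */
    uint64_t    buffers_swept;  /* buffers passed over by all sweeps */
} SweepStats;

static void stats_init(SweepStats *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->lock);
    s->allocations = 0;
    s->buffers_swept = 0;
}

/*
 * Called once by a backend at the end of a buffer allocation, after its
 * private clock sweep has finished. The lock is held only for the two
 * increments.
 */
static void stats_report(SweepStats *s, uint64_t swept)
{
    assert(s != NULL);
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;   /* spin until the lock is free */
    s->allocations += 1;
    s->buffers_swept += swept;
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}
```

Each backend would keep its own clock hand and call stats_report() with the
number of buffers it swept over; what the bgwriter then does with the totals
is the open control-logic question raised above.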

regards, tom lane
