Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Date: 2023-11-09 16:08:51
Message-ID: CA+TgmoaYPeo+NB__teHfAuJ2K4zs8izTggXDATe35yn6fq_LmA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 8, 2023 at 6:41 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> Here is the updated version of the patch. I have taken the approach
> suggested by Andrey, and I discussed it with Alvaro offlist and he
> also agrees with it. So the idea is that we keep the bank size fixed
> at 16 buffers per bank, and the GUC value for each SLRU buffer pool
> must be a multiple of the bank size. We have removed the centralized
> lock; instead there is one lock per bank, up to a maximum of 128 bank
> locks. We chose 128 because one operation (ActivateCommitTs) needs to
> acquire all the bank locks at once (though that is not a performance
> path at all), and a backend can acquire at most 200 LWLocks at a
> time, so 128 leaves comfortable headroom. So if the number of banks
> is <= 128 we use one lock per bank; otherwise one lock may protect
> buffers in multiple banks.
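
If I'm reading the design correctly, the mapping amounts to something
like this -- the names (SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS,
SlruPageToBank, SlruBankToLock) are mine, not necessarily the patch's:

#define SLRU_BANK_SIZE     16   /* buffers per bank, fixed */
#define SLRU_MAX_BANKLOCKS 128  /* cap on the number of bank locks */

/* A page maps to a bank by simple modular arithmetic. */
static inline int
SlruPageToBank(int64 page_no, int nbanks)
{
    return (int) (page_no % nbanks);
}

/* With <= 128 banks this is one lock per bank; with more banks,
 * several banks share one lock. */
static inline int
SlruBankToLock(int bank, int nbanks)
{
    int nlocks = Min(nbanks, SLRU_MAX_BANKLOCKS);

    return bank % nlocks;
}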

Just so I understand, I guess this means that an SLRU is limited to
16*128 = 2k buffers = 16MB?

When we were talking about this earlier, I suggested fixing the number
of banks and allowing the number of buffers per bank to scale
depending on the setting. That seemed simpler than allowing both the
number of banks and the number of buffers to vary, and it might allow
the compiler to optimize some code better, by converting a calculation
like page_no%number_of_banks into a masking operation like page_no&0xf
or whatever. However, because it allows an individual bank to become
arbitrarily large, it more or less requires us to use a buffer mapping
table. Some of the performance problems mentioned could be alleviated
by omitting the hash table when the number of buffers per bank is
small, and we could also create the dynahash with a custom hash
function that just does modular arithmetic on the page number rather
than a real hashing operation. However, maybe we don't really need to
do any of that. I agree that dynahash is clunky on a good day. I
hadn't realized the impact would be so noticeable.
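
To spell out the strength-reduction point, as a toy example (the types
and names here are illustrative):

#define NBANKS 16   /* compile-time, power-of-two bank count */

uint64 bank1 = page_no % NBANKS;           /* can compile to: page_no & 0xf */
uint64 bank2 = page_no % number_of_banks;  /* runtime value: needs a real divide */

The first form is what fixing the bank count would buy; the second is
what a variable bank count costs on every lookup.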

This proposal takes the opposite approach: it fixes the number of
buffers per bank and lets the number of banks vary. I think that's
probably fine, although it does reduce the effective associativity of
the cache. If there are more hot buffers in a bank than the bank size,
the bank will be contended, even if other banks are cold. However,
given the way SLRUs are accessed, it seems hard to imagine this being
a real problem in practice. There aren't likely to be, say, 20 hot
buffers that just so happen to all be separated from one another by a
number of pages that is a multiple of the configured number of banks.
And in the seemingly very unlikely event that you have a workload that
behaves like that, you could always adjust the number of banks up or
down by one, and the problem would go away. So this seems OK to me.
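
Concretely: with 8 banks, pages 3, 11, 19, 27, ... all land in bank 3,
so you'd need more than 16 hot pages at exactly that stride to thrash
one bank while the other seven stay cold -- and with 9 banks the same
pages scatter to banks 3, 2, 1, 0, 8, ..., dissolving the hotspot.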

I also agree with a couple of points that Alvaro made, specifically
that (1) this doesn't have to be perfect, just better than now and (2)
having separate GUCs for each SLRU is fine. On the latter point, it's worth
keeping in mind that the cost of a GUC that most people don't need to
tune is fairly low. GUCs like work_mem and shared_buffers are
"expensive" because everybody more or less needs to understand what
they are and how to set them, and getting the right value can be tricky --
but a GUC like autovacuum_naptime is a lot cheaper, because almost
nobody needs to change it. It seems to me that these GUCs will fall
into the latter category. Users can hopefully just ignore them except
if they see contention on the SLRU bank locks -- and then they can
consider increasing the number of banks for that particular SLRU. That
seems simple enough. As with autovacuum_naptime, there is a danger
that people will configure a ridiculous value of the parameter for no
good reason and get bad results, so it would be nice if someday we had
a magical system that just got all of this right without the user
needing to configure anything. But in the meantime, it's better to
have a somewhat manual system to relieve pressure on these locks than
no system at all.
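
For what it's worth, I'd expect the manual knob-turning to look
something like this in postgresql.conf (the GUC names here are
placeholders -- I'm not assuming the patch spells them this way):

# Each value must be a multiple of the 16-buffer bank size.
xact_buffers     = 512   # 32 banks, 32 bank locks
subtrans_buffers = 256   # 16 banks, 16 bank locks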

--
Robert Haas
EDB: http://www.enterprisedb.com
