Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Date: 2023-10-25 05:04:15
Message-ID: CAFiTN-sC+FP3qbtAFZ+NxF=fSw+EJKQZHe5hUCG8S+7FMBUOVA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 24, 2023 at 9:34 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2023-Oct-11, Dilip Kumar wrote:
>
> > In my last email I forgot to give the link to the patch I used as
> > the base for dividing the buffer pool into banks, so here it is[1].
> > Looking at it again, it seems the idea for that patch came from
> > Andrey M. Borodin, while the SLRU scale factor was introduced by
> > Yura Sokolov and Ivan Lazarev. Apologies for missing that in the
> > first email.
>
> You mean [1].
> [1] https://postgr.es/m/452d01f7e331458f56ad79bef537c31b%40postgrespro.ru
> I don't like this idea very much, because of the magic numbers that act
> as ratios for numbers of buffers on each SLRU compared to other SLRUs.
> These values, which I took from the documentation part of the patch,
> appear to have been selected by throwing darts at the wall:
>
> NUM_CLOG_BUFFERS = Min(128 << slru_buffers_size_scale, shared_buffers/256)
> NUM_COMMIT_TS_BUFFERS = Min(128 << slru_buffers_size_scale, shared_buffers/256)
> NUM_SUBTRANS_BUFFERS = Min(64 << slru_buffers_size_scale, shared_buffers/256)
> NUM_NOTIFY_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
> NUM_SERIAL_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
> NUM_MULTIXACTOFFSET_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
> NUM_MULTIXACTMEMBER_BUFFERS = Min(64 << slru_buffers_size_scale, shared_buffers/256)
>
> ... which look pretty random already, if similar enough to the current
> hardcoded values. In reality, the code implements different values than
> what the documentation says.
>
> I don't see why CLOG would have the same number as COMMIT_TS, when the
> size for elements of the latter is like 32 times bigger -- however, the
> frequency of reads for COMMIT_TS is like 1000x smaller than for CLOG.
> SUBTRANS is half of CLOG, yet it is 16 times larger, and it covers the
> same range. The MULTIXACT ones appear to keep the current ratio among
> them (8/16 gets changed to 32/64).
>
> ... and this whole mess is scaled exponentially without regard to the
> size that each SLRU requires. This is just betting that enough memory
> can be wasted across all SLRUs up to the point where the one that is
> actually contended has sufficient memory. This doesn't sound sensible
> to me.
>
> Like everybody else, I like having fewer GUCs to configure, but going
> this far to avoid them looks rather disastrous to me. IMO we should
> just use Munro's older patches that gave one GUC per SLRU, and users
> only need to increase the one that shows up in pg_wait_event sampling.
> Someday we will get the (much more complicated) patches to move these
> buffers to steal memory from shared buffers, and that'll hopefully let
> us get rid of all this complexity.

Overall I agree with your comments. Honestly, I hadn't put much
thought into the GUC part and how it scales the SLRU buffers with
respect to that single configurable parameter. So yeah, I think it is
better to take the older patch version, with a separate GUC per SLRU,
as our base patch.
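
Just to spell out what I have in mind for the base patch, here is a
rough sketch of how a per-SLRU GUC could plug into the existing sizing
logic (the GUC names and the -1 "use the default" convention are only
illustrative, not taken from the actual patch):

/*
 * Illustrative sketch only: one GUC per SLRU, where -1 means "keep
 * the current NBuffers-scaled default".  Names are hypothetical.
 */
int         clog_buffers = -1;      /* hypothetical GUC */
int         subtrans_buffers = -1;  /* hypothetical GUC */

static int
slru_nslots(int guc_value, int builtin_cap)
{
    if (guc_value > 0)
        return guc_value;   /* explicit user setting wins */

    /* otherwise behave as today: scale with shared_buffers, clamped */
    return Min(builtin_cap, Max(4, NBuffers / 512));
}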

> I'm inclined to use Borodin's patch last posted here [2] instead of your
> proposed 0001.
> [2] https://postgr.es/m/93236D36-B91C-4DFA-AF03-99C083840378@yandex-team.ru

I will rebase my patches on top of this.

> I did skim patches 0002 and 0003 without going into too much detail;
> they look reasonable ideas. I have not tried to reproduce the claimed
> performance benefits. I think measuring this patch set with the tests
> posted by Shawn Debnath in [3] is important, too.
> [3] https://postgr.es/m/YemDdpMrsoJFQJnU@f01898859afd.ant.amazon.com

Thanks for taking a look.

>
> On the other hand, here's a somewhat crazy idea. What if, instead of
> stealing buffers from shared_buffers (which causes a lot of complexity),

Currently we do not steal buffers from shared_buffers; the computation
does depend on NBuffers, though. I mean, for each SLRU we compute a
separate amount of memory, which is in addition to shared_buffers, no?

> we allocate a common pool for all SLRUs to use? We provide a single
> knob -- say, non_relational_buffers=32MB as default -- and we use an LRU
> algorithm (or something) to distribute that memory across all the SLRUs.
> So the ratio to use for this SLRU or that one would depend on the nature
> of the workload: maybe more for multixact in this server here, but more
> for subtrans in that server there; it's just the total amount that the
> user would have to configure, side by side with shared_buffers (and
> perhaps scale with it like wal_buffers), and the LRU would handle the
> rest. The "only" problem here is finding a distribution algorithm that
> doesn't further degrade performance, of course ...

Yeah, this could be an idea. Are you saying that all the SLRUs would
share a single buffer pool, with an LRU algorithm deciding which pages
stay in the pool and which get evicted? But wouldn't that create
another problem, with the different SLRUs starting to contend on the
same lock because they share one buffer pool? Or am I missing
something? Or do you mean that although there is a common buffer
pool, each SLRU would have its own boundaries within it, protected by
a separate lock, and those boundaries could move dynamically based on
the workload? I haven't put much thought into how practical the idea
is; I'm just trying to understand what you have in mind.
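
To make the second interpretation concrete, this is the shape of
thing I am picturing -- purely a hypothetical sketch of the data
structure, not based on the actual slru.c interfaces:

/*
 * Hypothetical: one shared pool, carved into per-SLRU partitions
 * whose sizes can be rebalanced based on the workload.  Each
 * partition keeps its own lock, so SLRUs do not contend with each
 * other.
 */
typedef struct SlruPoolPartition
{
    LWLock     *lock;           /* protects only this partition */
    int         first_slot;     /* start of this SLRU's slice */
    int         num_slots;      /* current size; moved by rebalancing */
} SlruPoolPartition;

typedef struct SlruSharedPool
{
    int         total_slots;    /* from the single knob, e.g. 32MB */
    SlruPoolPartition parts[7]; /* clog, subtrans, multixact, ... */
} SlruSharedPool;
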
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
