Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, tender wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Date: 2024-02-22 18:43:00
Message-ID: 202402221843.ibzvpndbacbi@alvherre.pgsql
Lists: pgsql-hackers

On 2024-Feb-07, Dilip Kumar wrote:

> On Wed, Feb 7, 2024 at 3:49 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> > Sure, but is that really what we want?
>
> So your question is do we want these buffers to be in multiple of
> SLRU_BANK_SIZE? Maybe we can have the last bank to be partial, I
> don't think it should create any problem logically. I mean we can
> look again in the patch to see if we have made any such assumptions
> but that should be fairly easy to fix, then maybe if we are going in
> this way we should get rid of the check_slru_buffers() function as
> well.

Not really, I just don't think the macro should be in slru.h.

Another thing I've been thinking about is that perhaps it would be useful to
make the banks smaller when the total number of buffers is small. For
example, if you have 16 or 32 buffers, it's not really clear to me that
it makes sense to have just 1 or 2 banks; it might be more sensible to
have 4 banks of 4 or 8 buffers instead. That should make the algorithm
scale down as well as up ...
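
To illustrate what I mean (this is only a sketch I'm typing here, not part
of v19; ChooseBankSize and the minimum of 4 banks are invented for the
example):

/*
 * Hypothetical helper, not in the patch: halve the bank size for small
 * SLRUs so that we still end up with a few banks, capping at the usual
 * bank size of 16 buffers.
 */
static int
ChooseBankSize(int nbuffers)
{
	int		banksize = 16;		/* today's fixed SLRU_BANK_SIZE */

	/* keep halving until we get at least 4 banks (or hit 4 buffers) */
	while (banksize > 4 && nbuffers / banksize < 4)
		banksize /= 2;

	return banksize;
}

With 16 buffers that would give 4 banks of 4, with 32 buffers 4 banks of 8,
and with 64 or more it's the same fixed 16-buffer banks we have now.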

I haven't done either of those things in the attached v19 version. I
did go over the comments once again and rewrote the parts I was unhappy
with, including some existing ones. I think it's OK now from that point
of view ... at some point I thought about creating a separate README,
but in the end I thought it not necessary.

I did add a bunch of Assert()s to make sure the locks that are supposed
to be held are actually held. This led me to add a check in
SimpleLruWriteAll() that a page's status is not EMPTY before calling
SlruInternalWritePage(), because the assert was firing otherwise. The
previous code is not really *buggy*, but to me it's weird to call
WritePage() on a slot with no contents.
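
For reference, the loop now looks roughly like this (simplified sketch from
memory; the bank-lock handling and the fsync bookkeeping are omitted, so
don't read it as the literal patch):

	for (slotno = 0; slotno < shared->num_slots; slotno++)
	{
		/* calling WritePage() on an empty slot is pointless, so skip it */
		if (shared->page_status[slotno] != SLRU_PAGE_EMPTY)
			SlruInternalWritePage(ctl, slotno, &fdata);
	}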

Another change was in TransactionGroupUpdateXidStatus: the original code
had the leader doing pg_atomic_read_u32(&procglobal->clogGroupFirst) to
know which bank to lock. I changed it to simply use the page of the
leader process; this doesn't need an atomic read, and should be the same
page anyway (if it isn't, it's no big deal). What's more, even if we do
read ->clogGroupFirst at that point, there's no guarantee that it points
to the process that ends up being first in the list: we have not set it
to INVALID by the time we grab the bank lock, so it is quite possible for
more processes to add themselves to the list in the meantime.
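
In code, the difference is roughly this (sketch only, identifiers from
memory, so don't take it as the literal diff):

	/* Before: peek at the list head to decide which bank lock to take */
	first = pg_atomic_read_u32(&procglobal->clogGroupFirst);
	lock = SimpleLruGetBankLock(XactCtl,
								GetPGProcByNumber(first)->clogGroupMemberPage);

	/* After: just use the page of the leader process itself */
	lock = SimpleLruGetBankLock(XactCtl, proc->clogGroupMemberPage);

	LWLockAcquire(lock, LW_EXCLUSIVE);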

I realized all this while rewriting the comments in a way that would let
me understand what was going on ... so IMO the effort was worthwhile.

Anyway, what I send now should be pretty much final, modulo the
change to the check_slru_buffers routines and documentation additions to
match pg_stat_slru to the new GUC names.

> > Another painful point is that pg_stat_slru uses internal names in the
> > data it outputs, which obviously do not match the new GUCs.
>
> Yeah, that's true, but I think this could be explained somewhere not
> sure what is the right place for this.

Or we can change those names in the view.

> FYI, I have also repeated all the performance tests I performed in my
> first email[1], and I am seeing a similar gain.

Excellent, thanks for doing that.

> Some comments on v18 in my first pass of the review.
>
> 1.
> @@ -665,7 +765,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
> lsnindex = GetLSNIndex(slotno, xid);
> *lsn = XactCtl->shared->group_lsn[lsnindex];
>
> - LWLockRelease(XactSLRULock);
> + LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
>
> Maybe here we can add an assert before releasing the lock for a safety check
>
> Assert(LWLockHeldByMe(SimpleLruGetBankLock(XactCtl, pageno)));

Hmm, I think this would just throw a warning or error "you don't hold
such lwlock", so it doesn't seem necessary.
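
(For context, LWLockRelease() already ends up doing essentially this --
quoting from memory, so details may be off:

	/* find the lock in this backend's held-locks array */
	for (i = num_held_lwlocks; --i >= 0;)
		if (lock == held_lwlocks[i].lock)
			break;
	if (i < 0)
		elog(ERROR, "lock %s is not held", T_NAME(lock));

so an Assert in the caller wouldn't buy us much.)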

> 2.
> + *
> + * XXX could we make the LSNs to be bank-based?
> */
> XLogRecPtr *group_lsn;
>
> IMHO, the flush still happens at the page level so up to which LSN
> should be flush before flushing the particular clog page should also
> be maintained at the page level.

Yeah, this was a misguided thought, nevermind.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.

Attachment Content-Type Size
v19-0001-Make-SLRU-buffer-sizes-configurable.patch text/x-diff 117.9 KB
