Re: SLRUs in the main buffer pool - Page Header definitions

From: "Bagga, Rishu" <bagrishu(at)amazon(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Debnath, Shawn" <sdn(at)amazon(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: SLRUs in the main buffer pool - Page Header definitions
Date: 2023-07-05 22:05:13
Message-ID: 39D53352-9828-4479-A64D-EED4535AB642@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>I think you'd need to fix those tests, rather than just removing them.
>I suspect we're going to continue to want SLRU specific stats, but
>your patch also seems to remove those.

>This is still messy, but now I can see what the point is: make the
>SLRUs, which are tracked in the main buffer pool thanks to the other
>patches, use the standard page header.

>I'm not sure if I like that or not. I think we should clean up and
>finish the other patches that this builds on first, and then decide if
>we want to use the standard page header for the SLRUs or not. And if
>we decide that we want the SLRU pages to have a page header, clean
>this up or rewrite it from scratch.

>I'm not entirely sure either, but I think the idea has some potential.
>If SLRU pages have headers, that means that they have LSNs, and
>perhaps then we could use those LSNs to figure out when they're safe
>to write to disk, instead of ad-hoc mechanisms. See SlruSharedData's
>group_lsn field.

>I agree that it's got potential and seems like the right direction to
>go in. That would also allow for checksums for SLRUs and possibly
>support for encryption which leverages the LSN and for a dynamic page
>feature area which could allow for an extended checksum or perhaps
>authenticated encryption with additonal authenticated data.

Hi all,

Thank you for the feedback on the last patch. I have prepared a new set
of patches here that aim to address these concerns; cleaning the patch
up, and breaking them into smaller parts. Each functional change is
broken into its own patch. To keep the change as minimal as possible, I
have removed the tangential changes from Heikki Linnakangas’ patch
modifying the storage manager, and kept the changes limited to moving
SLRU components to buffer cache, and page header changes.

The first patch is the original patch that moves some of the SLRUs to
the buffercache, which Thomas Munro posted in [1], rebased to the latest
commit.

The second patch is a patch which fixes problems with trying to allocate
memory in critical sections in the commit log, and multixacts in Munro’s
patch. In Munro’s patch - there are three places where we may need to
allocate memory in a Critical Section, that I have addressed.

1. When recording a transaction status, we may need to allocate memory
for a buffer pin to bring the clog page into the buffer cache. I added a
call to ResourceOwnerEnlargeBuffers before entering the critical section
to resolve this issue.

2. When recording a transaction status, we may need to allocate memory
for an entry in the storage manager hash table for the commit log in
smgropen. I added a function “VerifyClogLocatorInHashTable” which forces
an smgropen call that does this if needed. This function is called
before entering the Critical Section.

3. When recording a multixact, we enter a critical section while writing
the multixact members. Now that the multlixact pages are stored in the
buffer cache, we may need to allocate memory here to retrieve a buffer
page. I modified GetNewMultiXactId to also prefetch all the buffers we
will need before we enter critical section, so that we do not need to
make ReadBuffer calls while in the multixact critical section.

The third patch brings back the the SLRU control structure, to keep it
as an extensible feature for now, and renames the handler for the
components we are moving into the buffercache to NREL (short for non
relational). nrel.c is essentially a copy of Munro’s modified slru.c,
and I have restored the original slru.c. This allows for existing
extensions utilizing SLRUs to keep working, and the test_slru unit tests
to pass, as well as introducing a more accurate name for the handling of
components (CLOG, Multixact Offsets/Members, Async, Serial, Subtrans)
that are no longer handled by an SLRU, but are still non relational
components. To address Andres’s concern - I modified the slru stats test
code to still track all these current components and maintain the
behavior, and confirmed as those tests pass as well.

The fourth patch adds the page headers to these Non Relational (NREL)
components, and provides the upgrade story to rewrite the clog and
multixact files with page headers across upgrades.

With the changes from all four patches, they pass all tests with make
installcheck-world, as well as test_slru.

I hope these patches are easier to read and review, and would appreciate
any feedback.

[1] https://www.postgresql.org/message-id/flat/CA+hUKGKAYze99B-jk9NoMp-2BDqAgiRC4oJv+bFxghNgdieq8Q(at)mail(dot)gmail(dot)com

Rishu Bagga,

Amazon Web Services (AWS)

Attachment Content-Type Size
0001-slru_to_buf_munro_rebase.patch application/octet-stream 169.3 KB
0002-slru_to_buf_crit_section_fix.patch application/octet-stream 16.6 KB
0003-slru_to_buf_keep_slru_use_nrels.patch application/octet-stream 132.5 KB
0004-slru_to_buf_page_headers_with_upgrade.patch application/octet-stream 46.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-07-05 22:07:17 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message Daniel Gustafsson 2023-07-05 22:00:41 Re: pgsql: Clean up role created in new subscription test.