Re: Proposal to add page headers to SLRU pages

From: "Bagga, Rishu" <bagrishu(at)amazon(dot)com>
To: "Li, Yong" <yoli(at)ebay(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Debnath, Shawn" <sdn(at)ebay(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, "Shyrabokau, Anton" <antons(at)ebay(dot)com>
Subject: Re: Proposal to add page headers to SLRU pages
Date: 2023-12-19 02:23:24
Message-ID: 1404E248-F2B2-48BC-9B5E-BB318F3BE583@amazon.com
Lists: pgsql-hackers

On Thu, Dec 8, 2023 at 1:36 AM Li, Yong <yoli(at)ebay(dot)com> wrote:

>Given so many different approaches were discussed, I have started a
>wiki to record and collaborate all efforts towards SLRU
>improvements. The wiki provides a concise overview of all the ideas
>discussed and can serve as a portal for all historical
>discussions. Currently, the wiki summarizes four recent threads
>ranging from identifier format change to page header change, to moving
>SLRU into the main buffer pool, to reduce lock contention on SLRU
>latches. We can keep the patch related discussions in this thread and
>use the wiki as a live document for larger scale collaborations.

>The wiki page is
>here: https://wiki.postgresql.org/wiki/SLRU_improvements

>Regarding the benefits of this patch, here is a detailed explanation:

>1. Checksum is added to each page, allowing us to verify if a page has
>been corrupted when read from the disk.
>2. The ad-hoc LSN group structure is removed from the SLRU cache
>control data and is replaced by the page LSN in the page header.
>This allows us to use the same WAL protocol as used by pages in the
>main buffer pool: flush all redo logs up to the page LSN before
>flushing the page itself. If we move SLRU caches into the main
>buffer pool, this change fits naturally.
>3. It leaves further optimizations open. We can continue to pursue the
>goal of moving SLRU into the main buffer pool, or we can follow the
>lock partition idea. This change by itself does not conflict with
>either proposal.
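
For readers following along, here is a minimal sketch of points 1 and 2: a per-page checksum plus the "flush WAL up to the page LSN before writing the page" rule. It is not code from the patch. Every name (SketchPage, sketch_write_page, and so on) is hypothetical, the WAL is simulated by a single counter, and the checksum is a plain FNV-1a hash over the payload rather than PostgreSQL's page checksum algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 64     /* hypothetical payload size, not BLCKSZ */

/* Hypothetical page layout: a small header followed by the payload. */
typedef struct SketchPage
{
    uint64_t lsn;               /* LSN of the last WAL record touching the page */
    uint32_t checksum;          /* checksum of the payload, set at write time */
    uint8_t  payload[SKETCH_PAGE_SIZE];
} SketchPage;

/* Simulated WAL state: everything up to this LSN is considered durable. */
static uint64_t wal_flushed_upto = 0;

uint64_t
sketch_wal_flushed_upto(void)
{
    return wal_flushed_upto;
}

/* Stand-in for a WAL flush: advance the durable point if needed. */
void
sketch_flush_wal(uint64_t upto)
{
    if (upto > wal_flushed_upto)
        wal_flushed_upto = upto;
}

/* FNV-1a over the payload; an assumption, not PostgreSQL's algorithm. */
uint32_t
sketch_checksum(const SketchPage *page)
{
    uint32_t hash = 2166136261u;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
    {
        hash ^= page->payload[i];
        hash *= 16777619u;
    }
    return hash;
}

/* WAL first, then the page: the same ordering used for buffer pool pages. */
void
sketch_write_page(SketchPage *page, SketchPage *disk_slot)
{
    assert(page != NULL && disk_slot != NULL);

    sketch_flush_wal(page->lsn);            /* 1. make the WAL durable */
    page->checksum = sketch_checksum(page); /* 2. stamp the checksum */
    *disk_slot = *page;                     /* 3. "write" the page */
}

/* On read, a mismatch means the page was corrupted after it was written. */
bool
sketch_page_is_valid(const SketchPage *page)
{
    return page->checksum == sketch_checksum(page);
}
```

With the LSN stored in the page header, the write path needs no per-SLRU side structure to decide how far the WAL must be flushed, which is the simplification point 2 describes.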

>Also, the patch is now complete and is ready for review. All check-
>world tests including tap tests passed with this patch.

Hi Yong,

I agree that we should break the SLRU optimization effort into smaller
chunks, having worked on some of the bigger patches and found it
difficult to make progress that way.

The patch looks mostly good to me. The one thing I would approach
differently in the upgrade portion is where we keep the logic for
rewriting the CLOG files.

There is a precedent, introduced back in Postgres v9.6, for handling an
on-disk page format change across versions, in the visibility map: [1]

code comment:
* In versions of PostgreSQL prior to catversion 201603011, PostgreSQL's
* visibility map included one bit per heap page; it now includes two.
* When upgrading a cluster from before that time to a current PostgreSQL
* version, we could refuse to copy visibility maps from the old cluster
* to the new cluster; the next VACUUM would recreate them, but at the
* price of scanning the entire table. So, instead, we rewrite the old
* visibility maps in the new format.
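
To make the comment concrete, here is a minimal sketch of the bit expansion it describes: every old one-bit entry becomes a two-bit entry. This is not the code in file.c. The function name is hypothetical, page headers and checksums are ignored, and the layout is an assumption: the low bit of each new pair carries the old "all visible" flag and the high bit is left clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Expand a one-bit-per-heap-page map into a two-bits-per-heap-page map.
 * new_map must have room for 2 * old_len bytes.  Hypothetical helper for
 * illustration only.
 */
void
sketch_expand_vm_bits(const uint8_t *old_map, size_t old_len, uint8_t *new_map)
{
    assert(old_map != NULL && new_map != NULL);

    for (size_t i = 0; i < old_len; i++)
    {
        uint16_t wide = 0;

        /* Old bit n moves to the low bit of new pair n (bit 2n). */
        for (int bit = 0; bit < 8; bit++)
        {
            if (old_map[i] & (1u << bit))
                wide |= (uint16_t) (1u << (2 * bit));
        }

        /* One old byte covers 8 heap pages, so it fills two new bytes. */
        new_map[2 * i] = (uint8_t) (wide & 0xFF);
        new_map[2 * i + 1] = (uint8_t) (wide >> 8);
    }
}
```

Under these assumptions an old byte of 0xFF (eight all-visible pages) becomes the two bytes 0x55 0x55. A CLOG rewrite for the new page header would be a different transformation, but it is the same kind of work: read the old on-disk format, emit the new one.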

This work is done in pg_upgrade's file.c. It seems to me the proper way
to proceed would be to keep the on-disk upgrade logic for CLOG there as
well.

Besides that, this looks good to me. I would like to hear what others have to say.

Thanks,

Rishu Bagga

Amazon Web Services (AWS)

[1] https://github.com/postgres/postgres/commit/7087166a88fe0c04fc6636d0d6d6bea1737fc1fb
