Re: Add a new pg_walinspect function to extract FPIs from WAL records

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add a new pg_walinspect function to extract FPIs from WAL records
Date: 2023-01-06 17:41:26
Message-ID: CALj2ACXesN9DTjgsekM8fig7CxhhxQfQP4fCiSJgcmp9wrZOvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > > I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
> > > extracted images are the same and matches the one modified right after the checkpoint.
> > >
> > > What do you think? (could be done later in another patch though).
> >
> > I think pageinspect can be used here. We can fetch the raw page from
> > the table after the checkpoint and raw FPI from the WAL record logged
> > as part of the update. I've tried to do so [1], but I see a slight
> > difference in the raw output. The expectation is that they both be the
> > same. It might be that the update operation logs the FPI with some
> > more info set (prune_xid). I'll try to see why it is so.
> >
> > I'm attaching the v2 patch for further review.
> >
> > [1]
> > SELECT * FROM page_header(:'page_from_table');
> > lsn | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > -----------+----------+-------+-------+-------+---------+----------+---------+-----------
> > 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 |
> > 4 | 0
> > (1 row)
> >
> > SELECT * FROM page_header(:'page_from_wal');
> > lsn | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > -----------+----------+-------+-------+-------+---------+----------+---------+-----------
> > 0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 |
> > 4 | 735
> > (1 row)
>
> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
> further review. Sorry for the noise.

I took a stab at how and what gets logged as FPI in WAL records:

Option 1:
WAL record with FPI contains both the unmodified table page from the
disk after checkpoint and new tuple (not applied to the unmodified
page) and the recovery (redo) applies the new tuple to the unmodified
page as part of recovery. A bit more WAL is needed to store both
unmodified page and new tuple data in the WAL record and recovery can
get slower a bit too as it needs to stitch the modified page.

Option 2:
WAL record with FPI contains only the modified page (new tuple applied
to the unmodified page from the disk after checkpoint) and the
recovery (redo) just returns the applied block as BLK_RESTORED.
Recovery can get faster with this approach and less WAL is needed to
store just the modified page.

My earlier understanding was that postgres does option (1), however, I
was wrong, option (2) is what actually postgres has implemented for
the obvious advantages specified.

I now made the tests a bit stricter in checking the FPI contents
(tuple values) pulled from the WAL record with raw page contents
pulled from the table using the pageinspect extension. Please see the
attached v4 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v4-0001-Add-FPI-extract-function-to-pg_walinspect.patch application/x-patch 20.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-01-06 17:53:07 Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Previous Message Tom Lane 2023-01-06 17:32:52 Re: ATTACH PARTITION seems to ignore column generation status