Re: Add a new pg_walinspect function to extract FPIs from WAL records

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add a new pg_walinspect function to extract FPIs from WAL records
Date: 2023-01-10 08:29:03
Message-ID: dced46f2-1f10-7ac7-2f5c-f11be5d49990@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 1/6/23 6:41 PM, Bharath Rupireddy wrote:
> On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>
>> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
>> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>>
>>>> I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
>>>> extracted images are the same and matches the one modified right after the checkpoint.
>>>>
>>>> What do you think? (could be done later in another patch though).
>>>
>>> I think pageinspect can be used here. We can fetch the raw page from
>>> the table after the checkpoint and raw FPI from the WAL record logged
>>> as part of the update. I've tried to do so [1], but I see a slight
>>> difference in the raw output. The expectation is that they both be the
>>> same. It might be that the update operation logs the FPI with some
>>> more info set (prune_xid). I'll try to see why it is so.
>>>
>>> I'm attaching the v2 patch for further review.
>>>
>>> [1]
>>> SELECT * FROM page_header(:'page_from_table');
>>> lsn | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>> 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 |
>>> 4 | 0
>>> (1 row)
>>>
>>> SELECT * FROM page_header(:'page_from_wal');
>>> lsn | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>> 0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 |
>>> 4 | 735
>>> (1 row)
>>
>> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
>> further review. Sorry for the noise.
>
> I took a stab at how and what gets logged as FPI in WAL records:
>
> Option 1:
> WAL record with FPI contains both the unmodified table page from the
> disk after checkpoint and new tuple (not applied to the unmodified
> page) and the recovery (redo) applies the new tuple to the unmodified
> page as part of recovery. A bit more WAL is needed to store both
> unmodified page and new tuple data in the WAL record and recovery can
> get slower a bit too as it needs to stitch the modified page.
>
> Option 2:
> WAL record with FPI contains only the modified page (new tuple applied
> to the unmodified page from the disk after checkpoint) and the
> recovery (redo) just returns the applied block as BLK_RESTORED.
> Recovery can get faster with this approach and less WAL is needed to
> store just the modified page.
>
> My earlier understanding was that postgres does option (1), however, I
> was wrong, option (2) is what actually postgres has implemented for
> the obvious advantages specified.
>
> I now made the tests a bit stricter in checking the FPI contents
> (tuple values) pulled from the WAL record with raw page contents
> pulled from the table using the pageinspect extension. Please see the
> attached v4 patch.
>

Thanks for updating the patch!

+-- Compare FPI from WAL record and page from table, they must be same

I think "must be the same" or "must be identical" sounds better (but not 100% sure).

Except this nit, V4 looks good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ankit Kumar Pandey 2023-01-10 08:31:52 Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Previous Message Amit Kapila 2023-01-10 08:25:56 Re: typos