Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2022-12-21 09:06:42
Message-ID: 1f2ad9fd-53ff-74cc-a602-72d45423bcab@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 12/20/22 10:41 PM, Robert Haas wrote:
> On Tue, Dec 20, 2022 at 3:39 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think this might be the only WAL record type where there's a
>> problem, but I haven't fully confirmed that yet.
>
> It's not. GIST has the same issue. The same test case demonstrates the
> problem there, if you substitute this test script for
> kpt_hash_setup.sql and possibly also run it for somewhat longer. One
> might think that this wouldn't be a problem, because the comments for
> gistxlogDelete say this:
>
> /*
> * In payload of blk 0 : todelete OffsetNumbers
> */
>
> But it's not in the payload of blk 0. It follows the main payload.

Oh right, nice catch!

Indeed, we can see in gistRedoDeleteRecord():

"
todelete = (OffsetNumber *) ((char *) xldata + SizeOfGistxlogDelete);
"

>
> This is the reverse of xl_heap_freeze_page, which claims that freeze
> plans and offset numbers follow, but they don't: they're in the data
> for block 0.

oh right, we can see in heap_xlog_freeze_page():

"
plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
offsets = (OffsetNumber *) ((char *) plans +
(xlrec->nplans *
sizeof(xl_heap_freeze_plan)));
"

> xl_btree_delete is also wrong, claiming that the data
> follows when it's really attached to block 0.

oh right, we can see in btree_xlog_delete():

"
char *ptr = XLogRecGetBlockData(record, 0, NULL);

page = (Page) BufferGetPage(buffer);

if (xlrec->nupdated > 0)
{
OffsetNumber *updatedoffsets;
xl_btree_update *updates;

updatedoffsets = (OffsetNumber *)
(ptr + xlrec->ndeleted * sizeof(OffsetNumber));
updates = (xl_btree_update *) ((char *) updatedoffsets +
xlrec->nupdated *
sizeof(OffsetNumber));
"

> I guess whatever else we
> do here, we should fix the comments.
>

Agree, please find attached a patch proposal doing so.

> Bottom line is that I think the two cases that have alignment issues
> as coded are xl_hash_vacuum_one_page and gistxlogDelete. Everything
> else is OK, as far as I can tell right now.
>

Thanks a lot for the repro(s) and explanations! That's very useful/helpful.

Based on your discovery about the wrong comments above, I'm now tempted to fix those 2 alignment issues
by using a FLEXIBLE_ARRAY_MEMBER within those structs (as you proposed in [1]) (as that should also prevent
any possible wrong comments about where the array is located).

What do you think?

[1]: https://www.postgresql.org/message-id/CA%2BTgmoaVcu_mbxbH%3DEccvKG6u8%2BMdQf9zx98uAL9zsStFwrYsQ%40mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-fix-some-comments.patch text/plain 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dag Lem 2022-12-21 09:26:05 Re: daitch_mokotoff module
Previous Message Peter Smith 2022-12-21 09:04:39 Re: pgsql: Doc: Explain about Column List feature.