Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-03-31 10:45:51
Message-ID: 21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 3/31/23 6:33 AM, Andres Freund wrote:
> Hi,
>
> On 2023-03-30 18:23:41 +0200, Drouvot, Bertrand wrote:
>> On 3/30/23 9:04 AM, Andres Freund wrote:
>>> I think this commit is ready to go. Unless somebody thinks differently, I
>>> think I might push it tomorrow.
>>
>> Great! Once done, I'll submit a new patch so that GlobalVisTestFor() can make
>> use of the heap relation in vacuumRedirectAndPlaceholder() (which will be possible
>> once 0001 is committed).
>
> Unfortunately I did find an issue doing a pre-commit review of the patch.
>
> The patch adds VISIBILITYMAP_IS_CATALOG_REL to xl_heap_visible.flags - but it
> does not remove the bit before calling visibilitymap_set().
>
> This ends up corrupting the visibilitymap, because the we'll set a bit for
> another page.
>

Oh I see, I did not think about that (not enough experience in the VM area).
Nice catch and thanks for pointing out!

> On a casual read, one very well might think that VISIBILITYMAP_IS_CATALOG_REL
> is a valid bit that could be set in the VM.
>

I see what you're saying now and do agree that's confusing.

> I am thinking of instead creating a separate namespace for the "xlog only"
> bits:
>
> /*
> * To detect recovery conflicts during logical decoding on a standby, we need
> * to know if a table is a user catalog table. For that we add an additional
> * bit into xl_heap_visible.flags, in addition to the above.
> *
> * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
> */
> #define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
> #define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
>
>
> That allows heap_xlog_visible() to do:
>
> Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
> vmbits = (xlrec->flags & VISIBILITYMAP_VALID_BITS);
>
> and pass vmbits istead of xlrec->flags to visibilitymap_set().
>

That sounds good to me. That way you'd ensure that VISIBILITYMAP_XLOG_CATALOG_REL is not
passed to visibilitymap_set().

>
> I'm also thinking of splitting the patch into two. One patch to pass down the
> heap relation into the new places, and another for the rest.

I think that makes sense. I don't know how far you've work on the split but please
find attached V54 doing such a split + implementing your VISIBILITYMAP_XLOG_VALID_BITS
suggestion.

>
> Note that gistXLogDelete() continues to register data with two different
> XLogRegisterData() calls. This will append data without any padding:
>
> XLogRegisterData((char *) &xlrec, SizeOfGistxlogDelete);
>
> /*
> * We need the target-offsets array whether or not we store the whole
> * buffer, to allow us to find the snapshotConflictHorizon on a standby
> * server.
> */
> XLogRegisterData((char *) todelete, ntodelete * sizeof(OffsetNumber));
>
>
> But replay now uses the new offset member:
>
>> @@ -177,6 +177,7 @@ gistRedoDeleteRecord(XLogReaderState *record)
>> gistxlogDelete *xldata = (gistxlogDelete *) XLogRecGetData(record);
>> Buffer buffer;
>> Page page;
>> + OffsetNumber *toDelete = xldata->offsets;
>>
>> /*
>> * If we have any conflict processing to do, it must happen before we
>
>
> That doesn't look right. If there's any padding before offsets, we'll afaict
> read completely bogus data?
>
> As it turns out, there is padding:
>
> struct gistxlogDelete {
> TransactionId snapshotConflictHorizon; /* 0 4 */
> uint16 ntodelete; /* 4 2 */
> _Bool isCatalogRel; /* 6 1 */
>
> /* XXX 1 byte hole, try to pack */
>
> OffsetNumber offsets[]; /* 8 0 */
>
> /* size: 8, cachelines: 1, members: 4 */
> /* sum members: 7, holes: 1, sum holes: 1 */
> /* last cacheline: 8 bytes */
> };
>
>
> I am frankly baffled how this works at all, this should just about immediately
> crash?
>
>

Oh, I see. Hm, don't we have already the same issue for spgxlogVacuumRoot / vacuumLeafRoot() / spgRedoVacuumRoot()?

>
> I'm not going to commit a nontrivial change to these WAL records without some
> minimal tests.
>

That makes fully sense.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v54-0007-Doc-changes-describing-details-about-logical-dec.patch text/plain 2.2 KB
v54-0006-New-TAP-test-for-logical-decoding-on-standby.patch text/plain 32.9 KB
v54-0005-Fixing-Walsender-corner-case-with-logical-decodi.patch text/plain 7.7 KB
v54-0004-Allow-logical-decoding-on-standby.patch text/plain 11.8 KB
v54-0003-Handle-logical-slot-conflicts-on-standby.patch text/plain 37.2 KB
v54-0002-Add-info-in-WAL-records-in-preparation-for-logic.patch text/plain 17.9 KB
v54-0001-Pass-down-the-heap-relation-into-the-new-places.patch text/plain 61.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-03-31 11:12:05 Re: TAP output format in pg_regress
Previous Message Ashutosh Bapat 2023-03-31 10:16:36 Re: Infinite Interval