Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-06-07 12:32:13
Message-ID: CAA4eK1+zKuMfYBA_098c1dKjMKGmwfaiJS1RJqswmJ8Fy8VOgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, May 25, 2022 at 12:11 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
>
> poc_add_regression_tests.patch adds regression tests for this bug. The
> regression tests are required for both HEAD and back-patching but I've
> separated this patch for testing the above two patches easily.
>

Few comments on the test case patch:
===============================
1.
+# For the transaction that TRUNCATEd the table tbl1, the last decoding decodes
+# only its COMMIT record, because it starts from the RUNNING_XACT
record emitted
+# during the first checkpoint execution. This transaction must be marked as
+# catalog-changes while decoding the COMMIT record and the decoding
of the INSERT
+# record must read the pg_class with the correct historic snapshot.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate"
"s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert"
"s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"

Will this test always work? What if we get an additional running_xact
record between steps "s0_commit" and "s0_begin" that is logged via
bgwriter? You can mimic that by adding an additional checkpoint
between those two steps. If we do that, the test will pass even
without the patch because I think the last decoding will start
decoding from this new running_xact record.

2.
+step "s1_get_changes" { SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', '0'); }

It is better to skip empty transactions by using 'skip-empty-xacts' to
avoid any transaction from a background process like autovacuum. We
have previously seen some buildfarm failures due to that.

3. Did you intentionally omit the .out from the test case patch?

4.
This transaction must be marked as
+# catalog-changes while decoding the COMMIT record and the decoding
of the INSERT
+# record must read the pg_class with the correct historic snapshot.

/marked as catalog-changes/marked as containing catalog changes

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-06-07 12:56:02 Re: broken regress tests on fedora 36
Previous Message Dagfinn Ilmari Mannsåker 2022-06-07 11:18:52 Re: Logging query parmeters in auto_explain