Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-06-14 06:56:55
Message-ID: CAA4eK1+xrQ+C6=6NmyMaNzMig3jRM3aeZ1B4EH3OGNtqCNEzpQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 13, 2022 at 8:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 7, 2022 at 9:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, May 25, 2022 at 12:11 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > >
> > > poc_add_regression_tests.patch adds regression tests for this bug. The
> > > regression tests are required for both HEAD and back-patching but I've
> > > separated this patch for testing the above two patches easily.
> > >
>
> Thank you for the comments.
>
> >
> > Few comments on the test case patch:
> > ===============================
> > 1.
> > +# For the transaction that TRUNCATEd the table tbl1, the last decoding decodes
> > +# only its COMMIT record, because it starts from the RUNNING_XACT record emitted
> > +# during the first checkpoint execution. This transaction must be marked as
> > +# catalog-changes while decoding the COMMIT record and the decoding of the INSERT
> > +# record must read the pg_class with the correct historic snapshot.
> > +permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
> >
> > Will this test always work? What if we get an additional running_xact
> > record between steps "s0_commit" and "s0_begin" that is logged via
> > bgwriter? You can mimic that by adding an additional checkpoint
> > between those two steps. If we do that, the test will pass even
> > without the patch because I think the last decoding will start
> > decoding from this new running_xact record.
>
> Right. Depending on the timing, the test could pass even without the
> patch, but it never fails because of the timing. I think we need to
> somehow stop bgwriter to make the test case stable but it seems unrealistic.
>

Agreed. In my local testing for this case, I used to increase
LOG_SNAPSHOT_INTERVAL_MS to avoid such a situation, but I understand
that is not practical in a test.
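
To make the timing problem concrete, the variant described in the
quoted comment would look roughly like the following. This is only a
sketch: it reuses the step names from the quoted permutation and adds
one more "s1_checkpoint" between the first "s0_commit" and the second
"s0_begin" to stand in for a running_xact record logged by bgwriter.

```
permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_checkpoint" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
```

With the extra checkpoint, I think the last decoding starts from the
new running_xact record, so the test would pass even without the patch.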

> Do you have any
> better ideas?
>

No, I don't have any better ideas. I think it is better to add some
information related to this in the comments because it may help to
improve the test in the future if we come up with a better idea.

--
With Regards,
Amit Kapila.
