Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-22 06:17:50
Message-ID: CAD21AoC8NLGJ-Q+u60cGPSnxXqexEFPLo-5-offsgCE_ikVxRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 20, 2022 at 2:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 9:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > Another idea would be to have functions, say
> > > > SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter
> > > > does actual work of handling transaction commits and both
> > > > SnapBuildCommitTxn() and SnapBuildCommit() call
> > > > SnapBuildCommitTxnWithXInfo() with different arguments.
> > > >
> > >
> > > Do you want to say DecodeCommit() instead of SnapBuildCommit() in
> > > above para?
> >
> > I meant that we will call like DecodeCommit() ->
> > SnapBuildCommitTxnWithXInfo() -> SnapBuildCommitTxn_ext(has_invals =
> > true) -> SnapBuildXidHasCatalogChanges(has_invals = true) -> ... If
> > SnapBuildCommitTxn() gets called, it calls SnapBuildCommitTxn_ext()
> > with has_invals = false and behaves the same as before.
> >
>
> Okay, understood. This will work.
>
> > > Yet another idea could be to have another flag
> > > RBTXN_HAS_INVALS which will be set by DecodeCommit for top-level TXN.
> > > Then, we can retrieve it even for each of the subtxn's if and when
> > > required.
> >
> > Do you mean that when checking if the subtransaction has catalog
> > changes, we check if its top-level XID has this new flag?
> >
>
> Yes.
>
> > Why do we
> > need the new flag?
> >
>
> This is required if we don't want to introduce a new set of functions
> as you proposed above. I am not sure which one is better w.r.t back
> patching effort later but it seems to me using flag stuff would make
> future back patches easier if we make any changes in
> SnapBuildCommitTxn.

Understood.

I've implemented this idea as well for discussion. Both patches have
the common change to remember the initial running transactions and to
purge them when decoding xl_running_xacts records. The difference is
how to mark the transactions as needing to be added to the snapshot.

In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
when the transaction is in the initial running xact list and its
commit record has XINFO_HAS_INVAL flag, we mark both the top
transaction and its all subtransactions as containing catalog changes
(which also means to create ReorderBufferTXN entries for them). These
transactions are added to the snapshot in SnapBuildCommitTxn() since
ReorderBufferXidHasCatalogChanges () for them returns true.

In poc_mark_top_txn_has_inval.patch, when the transaction is in the
initial running xacts list and its commit record has XINFO_HAS_INVALS
flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only to the top
transaction. In SnapBuildCommitTxn(), we add all subtransactions to
the snapshot without checking ReorderBufferXidHasCatalogChanges() for
subtransactions if its top transaction has the RBTXN_COMMIT_HAS_INVALS
flag.

A difference between the two ideas is the scope of changes: the former
changes only snapbuild.c but the latter changes both snapbuild.c and
reorderbuffer.c. Moreover, while the former uses the existing flag,
the latter adds a new flag to the reorder buffer for dealing with only
this case. I think the former idea is simpler in terms of that. But,
an advantage of the latter idea is that the latter idea can save to
create ReorderBufferTXN entries for subtransactions.

Overall I prefer the former for now but I'd like to hear what others think.

FWIW, I didn't try the idea of adding wrapper functions since it would
be costly in terms of back patching effort in the future.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
poc_mark_top_txn_has_inval.patch application/x-patch 15.4 KB
v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch application/x-patch 16.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-07-22 06:23:45 Re: Strange failures on chipmunk
Previous Message Kyotaro Horiguchi 2022-07-22 06:16:00 Re: Unprivileged user can induce crash by using an SUSET param in PGOPTIONS