Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-25 01:45:46
Message-ID: CAD21AoC_oEy3EvgbJ=asz20Z3ROWtRM01f-F8+F_5VjhCvJX6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 23, 2022 at 8:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jul 22, 2022 at 11:48 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > >
> > > This is required if we don't want to introduce a new set of functions
> > > as you proposed above. I am not sure which one is better w.r.t back
> > > patching effort later but it seems to me using flag stuff would make
> > > future back patches easier if we make any changes in
> > > SnapBuildCommitTxn.
> >
> > Understood.
> >
> > I've implemented this idea as well for discussion. Both patches have
> > the common change to remember the initial running transactions and to
> > purge them when decoding xl_running_xacts records. The difference is
> > how to mark the transactions as needing to be added to the snapshot.
> >
> > In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
> > when the transaction is in the initial running xact list and its
> > commit record has XINFO_HAS_INVAL flag, we mark both the top
> > transaction and its all subtransactions as containing catalog changes
> > (which also means to create ReorderBufferTXN entries for them). These
> > transactions are added to the snapshot in SnapBuildCommitTxn() since
> > ReorderBufferXidHasCatalogChanges () for them returns true.
> >
> > In poc_mark_top_txn_has_inval.patch, when the transaction is in the
> > initial running xacts list and its commit record has XINFO_HAS_INVALS
> > flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only to the top
> > transaction.
> >
>
> It seems that the patch has missed the part to check if the xid is in
> the initial running xacts list?

Oops, right.

>
> > In SnapBuildCommitTxn(), we add all subtransactions to
> > the snapshot without checking ReorderBufferXidHasCatalogChanges() for
> > subtransactions if its top transaction has the RBTXN_COMMIT_HAS_INVALS
> > flag.
> >
> > A difference between the two ideas is the scope of changes: the former
> > changes only snapbuild.c but the latter changes both snapbuild.c and
> > reorderbuffer.c. Moreover, while the former uses the existing flag,
> > the latter adds a new flag to the reorder buffer for dealing with only
> > this case. I think the former idea is simpler in terms of that. But,
> > an advantage of the latter idea is that the latter idea can save to
> > create ReorderBufferTXN entries for subtransactions.
> >
> > Overall I prefer the former for now but I'd like to hear what others think.
> >
>
> I agree that the latter idea can have better performance in extremely
> special scenarios but introducing a new flag for the same sounds a bit
> ugly to me. So, I would also prefer to go with the former idea,
> however, I would also like to hear what Horiguchi-San and others have
> to say.

Agreed.

>
> Few comments on v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during:
> 1.
> +void
> +SnapBuildInitialXactSetCatalogChanges(SnapBuild *builder, TransactionId xid,
> + int subxcnt, TransactionId *subxacts,
> + XLogRecPtr lsn)
> +{
>
> I think it is better to name this function as
> SnapBuildXIDSetCatalogChanges as we use this to mark a particular
> transaction as having catalog changes.
>
> 2. Changed/added a few comments in the attached.

Thank you for the comments.

I've attached updated version patches for the master and back branches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
REL14-v8-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch application/octet-stream 16.3 KB
REL10-v8-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch application/octet-stream 16.1 KB
REL13-v8-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch application/octet-stream 16.3 KB
REl11-v8-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch application/octet-stream 16.2 KB
REl12-v8-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch application/octet-stream 16.3 KB
master-v8-0001-Add-catalog-modifying-transactions-to-logical-dec.patch application/octet-stream 27.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-07-25 02:30:30 Re: Remove useless arguments in ReadCheckpointRecord().
Previous Message Michael Paquier 2022-07-25 01:08:42 Re: Refactor to make use of a common function for GetSubscriptionRelations and GetSubscriptionNotReadyRelations.