Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: amit(dot)kapila16(at)gmail(dot)com, shiy(dot)fnst(at)fujitsu(dot)com, bdrouvot(at)amazon(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, minsoo(at)amazon(dot)com
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-08-03 04:05:30
Message-ID: CAD21AoDZA3u63Nv=cJgnwYGnMrWopXQTYntDPMPr1WE5oUv3sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 2, 2022 at 3:30 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Mon, 1 Aug 2022 20:01:00 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > >
> > > I've attached updated patches for all branches. Please review them.
> > >
> >
> > Thanks, the patches look mostly good to me. I have made minor edits by
> > removing 'likely' from a few places as those don't seem to be adding
> > much value, changed comments at a few places, and was getting
> > compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> > initial declarations are only allowed in C99 mode) which I have fixed.
> > See attached, unless there are major comments/suggestions, I am
> > planning to push this day after tomorrow (by Wednesday) after another
> > pass.
>
> master:
> + * Read the contents of the serialized snapshot to the dest.
>
> Do we need the "the" before the "dest"?

Fixed.

>
> + {
> + int save_errno = errno;
> +
> + CloseTransientFile(fd);
> +
> + if (readBytes < 0)
> + {
> + errno = save_errno;
> + ereport(ERROR,
>
> Do we need the CloseTransientFile(fd) there? This call requires errno
> to be remembered but anyway OpenTransientFile'd files are to be close
> at transaction end. Actually CloseTransientFile() is not called
> before error'ing-out at error in other places.

As Amit mentioned, it's just moved from SnapBuildRestore(). Looking at
other code in snapbuild.c, we call CloseTransientFile before erroring
out. I think it's better to keep it consistent with nearby codes.

>
>
> + * from the LSN-ordered list of toplevel TXNs. We remove TXN from the list
>
> We remove "the" TXN"?

Fixed.

>
> + if (dlist_is_empty(&rb->catchange_txns))
> + {
> + Assert(rb->catchange_ntxns == 0);
> + return NULL;
> + }
>
> It seems that the assert is far simpler than dlist_is_empty(). Why
> don't we swap the conditions for if() and Assert() in the above?

Changed.

>
> + * the oldest running transaction窶冱 restart_decoding_lsn is.
>
> The line contains a broken characters.

Fixed.

>
>
> + * Either all the xacts got purged or none. It is only possible to
> + * partially remove the xids from this array if one or more of the xids
> + * are still running but not all. That can happen if we start decoding
>
> Assuming this, the commment below seems getting stale.
>
> + * catalog. We remove xids from this array when they become old enough to
> + * matter, and then it eventually becomes empty.
>
> "We discard this array when the all containing xids are gone. See
> SnapBuildPurgeOlderTxn for details." or something like?

Changed to:

We discard this array when all the xids in the list become old enough
to matter. See SnapBuildPurgeOlderTxn for details.

I've attached updated patches that incorporated the above comments as
well as the comments from Shi yu. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
REL13_v12-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.0 KB
REL12_v12-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.0 KB
REL15_v12-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.0 KB
master_v12-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 29.3 KB
REL14_v12-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.0 KB
REL11_v12-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 15.9 KB
REL10_v12-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 15.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-08-03 04:06:25 Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Previous Message Amit Kapila 2022-08-03 03:57:58 Re: PostgreSQL 15 minor fixes in protocol.sgml