Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, shiy(dot)fnst(at)fujitsu(dot)com, bdrouvot(at)amazon(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, minsoo(at)amazon(dot)com
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-08-02 06:30:12
Message-ID: 20220802.153012.363912034520325077.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 1 Aug 2022 20:01:00 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > I've attached updated patches for all branches. Please review them.
> >
>
> Thanks, the patches look mostly good to me. I have made minor edits by
> removing 'likely' from a few places as those don't seem to be adding
> much value, changed comments at a few places, and was getting
> compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> initial declarations are only allowed in C99 mode) which I have fixed.
> See attached, unless there are major comments/suggestions, I am
> planning to push this day after tomorrow (by Wednesday) after another
> pass.

master:
+ * Read the contents of the serialized snapshot to the dest.

Do we need the "the" before the "dest"?

+ {
+ int save_errno = errno;
+
+ CloseTransientFile(fd);
+
+ if (readBytes < 0)
+ {
+ errno = save_errno;
+ ereport(ERROR,

Do we need the CloseTransientFile(fd) there? This call requires errno
to be remembered but anyway OpenTransientFile'd files are to be close
at transaction end. Actually CloseTransientFile() is not called
before error'ing-out at error in other places.

+ * from the LSN-ordered list of toplevel TXNs. We remove TXN from the list

We remove "the" TXN"?

+ if (dlist_is_empty(&rb->catchange_txns))
+ {
+ Assert(rb->catchange_ntxns == 0);
+ return NULL;
+ }

It seems that the assert is far simpler than dlist_is_empty(). Why
don't we swap the conditions for if() and Assert() in the above?

+ * the oldest running transaction窶冱 restart_decoding_lsn is.

The line contains a broken characters.

+ * Either all the xacts got purged or none. It is only possible to
+ * partially remove the xids from this array if one or more of the xids
+ * are still running but not all. That can happen if we start decoding

Assuming this, the commment below seems getting stale.

+ * catalog. We remove xids from this array when they become old enough to
+ * matter, and then it eventually becomes empty.

"We discard this array when the all containing xids are gone. See
SnapBuildPurgeOlderTxn for details." or something like?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2022-08-02 06:44:04 Re: [Commitfest 2022-07] is Done!
Previous Message Bharath Rupireddy 2022-08-02 06:22:02 Re: Progress report removal of temp files and temp relation files using ereport_startup_progress