Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-19 01:03:45
Message-ID: CAD21AoDfZCmjUS74t+_7j2g+MshYcp7026mufR3SRKW6XAgAHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > This patch should have the fix for the issue that Shi yu reported. Shi
> > yu, could you please test it again with this patch?
> >
>
> Can you explain the cause of the failure and your fix for the same?

@@ -1694,6 +1788,8 @@ out:
/* be tidy */
if (ondisk)
pfree(ondisk);
+ if (catchange_xip)
+ pfree(catchange_xip);

Regarding the above code in the previous version patch, looking at the
generated assembler code shared by Shi yu offlist, I realized that the
“if (catchange_xip)” is removed (folded) by gcc optimization. This is
because we dereference catchange_xip before null-pointer check as
follow:

+ /* copy catalog modifying xacts */
+ sz = sizeof(TransactionId) * catchange_xcnt;
+ memcpy(ondisk_c, catchange_xip, sz);
+ COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
+ ondisk_c += sz;

Since sz is 0 in this case, memcpy doesn’t do anything actually.

By checking the assembler code, I’ve confirmed that gcc does the
optimization for these code and setting
-fno-delete-null-pointer-checks flag prevents the if statement from
being folded. Also, I’ve confirmed that adding the check if
"catchange.xcnt > 0” before the null-pointer check also can prevent
that. Adding a check if "catchange.xcnt > 0” looks more robust. I’ve
added a similar check for builder->committed.xcnt as well for
consistency. builder->committed.xip could have no transactions.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2022-07-19 01:03:58 Re: Costing elided SubqueryScans more nearly correctly
Previous Message Bruce Momjian 2022-07-19 00:33:42 Re: System catalog documentation chapter