Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-14 17:36:39
Message-ID: CAD21AoA93C_a+3MetsY_bq1AQT4Yukf7pRsFPWZc74M=w_2daw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 14, 2022 at 12:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jul 14, 2022 at 11:16 AM shiy(dot)fnst(at)fujitsu(dot)com
> <shiy(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jul 12, 2022 at 5:58 PM shiy(dot)fnst(at)fujitsu(dot)com
> > > <shiy(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > It happened when executing the following code because it tried to free a
> > > NULL
> > > > pointer (catchange_xip).
> > > >
> > > > /* be tidy */
> > > > if (ondisk)
> > > > pfree(ondisk);
> > > > + if (catchange_xip)
> > > > + pfree(catchange_xip);
> > > > }
> > > >
> > > > It seems to be related to configure option. I could reproduce it when using
> > > > `./configure --enable-debug`.
> > > > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -
> > > ggdb"`.
> > >
> > > Hmm, I could not reproduce this problem even if I use ./configure
> > > --enable-debug. And it's weird that we checked if catchange_xip is not
> > > null but we did pfree for it:
> > >
> > > #1 pfree (pointer=0x0) at mcxt.c:1177
> > > #2 0x000000000078186b in SnapBuildSerialize (builder=0x1fd5e78,
> > > lsn=25719712) at snapbuild.c:1792
> > >
> > > Is it reproducible in your environment?
> >
> > Thanks for your reply! Yes, it is reproducible. And I also reproduced it on the
> > v4 patch you posted [1].
>
> Thank you for testing!

I've found out the exact cause of this problem and how to fix it. I'll
submit an updated patch next week with my analysis.

Thank you for testing and providing additional information off-list, Shi yu.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-07-14 18:09:38 Re: optimize lookups in snapshot [sub]xip arrays
Previous Message vignesh C 2022-07-14 16:32:42 Re: Refactor to make use of a common function for GetSubscriptionRelations and GetSubscriptionNotReadyRelations.