Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-14 03:06:07
Message-ID: CAD21AoCZ0xpS-PSLa1ot=TijntYsnHKZ=6u6QAocPcO3E08Q5w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 14, 2022 at 11:16 AM shiy(dot)fnst(at)fujitsu(dot)com
<shiy(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 12, 2022 at 5:58 PM shiy(dot)fnst(at)fujitsu(dot)com
> > <shiy(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > It happened when executing the following code because it tried to free a
> > > NULL pointer (catchange_xip).
> > >
> > > 	/* be tidy */
> > > 	if (ondisk)
> > > 		pfree(ondisk);
> > > +	if (catchange_xip)
> > > +		pfree(catchange_xip);
> > > }
> > >
> > > It seems to be related to the configure options. I could reproduce it when
> > > using `./configure --enable-debug`, but I couldn't reproduce it with
> > > `./configure --enable-debug CFLAGS="-Og -ggdb"`.
> >
> > Hmm, I could not reproduce this problem even with ./configure
> > --enable-debug. And it's weird that we checked that catchange_xip is not
> > NULL but still called pfree() on it:
> >
> > #1 pfree (pointer=0x0) at mcxt.c:1177
> > #2 0x000000000078186b in SnapBuildSerialize (builder=0x1fd5e78, lsn=25719712) at snapbuild.c:1792
> >
> > Is it reproducible in your environment?
>
> Thanks for your reply! Yes, it is reproducible. I also reproduced it with the
> v4 patch you posted [1].

Thank you for testing!

>
> [1] https://www.postgresql.org/message-id/CAD21AoAyNPrOFg%2BQGh%2B%3D4205TU0%3DyrE%2BQyMgzStkH85uBZXptQ%40mail.gmail.com
>
> > If so, could you test it again
> > with the following changes?
> >
> > diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
> > index d015c06ced..a6e76e3781 100644
> > --- a/src/backend/replication/logical/snapbuild.c
> > +++ b/src/backend/replication/logical/snapbuild.c
> > @@ -1788,7 +1788,7 @@ out:
> >  	/* be tidy */
> >  	if (ondisk)
> >  		pfree(ondisk);
> > -	if (catchange_xip)
> > +	if (catchange_xip != NULL)
> >  		pfree(catchange_xip);
> >  }
> >
>
> I tried this and could still reproduce the problem.

Does the backtrace still show that we attempt to pfree a NULL pointer?
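For what it's worth, in C a pointer used as a controlling expression is compared against a null pointer, so `if (catchange_xip)` and `if (catchange_xip != NULL)` are the same test, and the diff above is not expected to change behavior on a conforming compiler. A minimal illustration, using hypothetical helpers is_set_implicit() and is_set_explicit():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "if (p)" implicitly compares p against a null pointer. */
static bool
is_set_implicit(const void *p)
{
	if (p)
		return true;
	return false;
}

/* "if (p != NULL)" spells out exactly the same comparison. */
static bool
is_set_explicit(const void *p)
{
	if (p != NULL)
		return true;
	return false;
}
```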

>
> Besides, I tried Amit's suggestion [2]: the problem goes away when checking
> the value of catchange_xcnt instead of catchange_xip before the pfree.
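
A rough sketch of guarding the cleanup on the count rather than on the pointer, assuming the array is allocated only when the count is non-zero. This is not the patch code: CatchangeList, catchange_init() and catchange_free() are hypothetical, the TransactionId typedef is a stand-in, and malloc()/free() replace palloc()/pfree(). It is only safe if the count itself is initialized on every path that reaches the cleanup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;	/* stand-in for PostgreSQL's TransactionId */

/* Hypothetical pairing of the catalog-change XID array and its length. */
typedef struct CatchangeList
{
	size_t		xcnt;			/* number of entries in xip */
	TransactionId *xip;			/* allocated only when xcnt > 0 */
} CatchangeList;

/* Allocate storage only when there is something to store. */
static int
catchange_init(CatchangeList *list, size_t n)
{
	list->xcnt = 0;
	list->xip = NULL;
	if (n == 0)
		return 0;
	list->xip = malloc(n * sizeof(TransactionId));
	if (list->xip == NULL)
		return -1;
	list->xcnt = n;
	return 0;
}

/* Release the array, deciding on the count rather than the pointer. */
static void
catchange_free(CatchangeList *list)
{
	if (list->xcnt > 0)
		free(list->xip);
	list->xip = NULL;
	list->xcnt = 0;
}
```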

Could you check whether this problem occurs when we reach there via the
goto path, i.e., whether we called ReorderBufferGetCatalogChangesXacts()
or not?
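The reason I ask: if an early `goto out` can skip the assignment of catchange_xip, the cleanup would read an uninitialized local pointer, which is undefined behavior in C and could plausibly explain why the symptom depends on compiler flags. A hedged sketch of that control flow, with hypothetical names (serialize_sketch(), tracked_free(), freed_count) and malloc()/free() in place of palloc()/pfree(); it does not reproduce the actual SnapBuildSerialize() code. Initializing the pointer to NULL at its declaration makes the cleanup correct on both paths:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counter recording how many buffers the cleanup released. */
static int	freed_count = 0;

static void
tracked_free(void *p)
{
	free(p);
	freed_count++;
}

/*
 * Hypothetical reduction of the control flow in question: an early
 * "goto out" skips the allocation of xip.  Because xip is initialized
 * to NULL at its declaration, the cleanup block sees a well-defined
 * value on both paths.
 */
static int
serialize_sketch(bool bail_early)
{
	unsigned int *xip = NULL;	/* set before any "goto out" can happen */
	int			ret = 0;

	if (bail_early)
		goto out;				/* skips the allocation below */

	xip = malloc(4 * sizeof(*xip));
	if (xip == NULL)
	{
		ret = -1;
		goto out;
	}
	xip[0] = 42;				/* stand-in for the real work */

out:
	/* be tidy */
	if (xip != NULL)
		tracked_free(xip);
	return ret;
}
```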

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
