RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-14 02:16:00
Message-ID: OSZPR01MB63102143FE2431519DEF7AFDFD889@OSZPR01MB6310.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jul 12, 2022 at 5:58 PM shiy(dot)fnst(at)fujitsu(dot)com
> <shiy(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > It happened when executing the following code because it tried to free a NULL
> > pointer (catchange_xip).
> >
> >     /* be tidy */
> >     if (ondisk)
> >         pfree(ondisk);
> > +   if (catchange_xip)
> > +       pfree(catchange_xip);
> > }
> >
> > It seems to be related to configure option. I could reproduce it when using
> > `./configure --enable-debug`.
> > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -ggdb"`.
>
> Hmm, I could not reproduce this problem even if I use ./configure
> --enable-debug. And it's weird that we checked if catchange_xip is not
> null but we did pfree for it:
>
> #1 pfree (pointer=0x0) at mcxt.c:1177
> #2 0x000000000078186b in SnapBuildSerialize (builder=0x1fd5e78,
> lsn=25719712) at snapbuild.c:1792
>
> Is it reproducible in your environment?

Thanks for your reply! Yes, it is reproducible. And I also reproduced it on the
v4 patch you posted [1].

[1] https://www.postgresql.org/message-id/CAD21AoAyNPrOFg%2BQGh%2B%3D4205TU0%3DyrE%2BQyMgzStkH85uBZXptQ%40mail.gmail.com

> If so, could you test it again
> with the following changes?
>
> diff --git a/src/backend/replication/logical/snapbuild.c
> b/src/backend/replication/logical/snapbuild.c
> index d015c06ced..a6e76e3781 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -1788,7 +1788,7 @@ out:
>      /* be tidy */
>      if (ondisk)
>          pfree(ondisk);
> -    if (catchange_xip)
> +    if (catchange_xip != NULL)
>          pfree(catchange_xip);
>  }
>

I tried this and could still reproduce the problem.

Besides, I tried the suggestion from Amit [2]. The problem is fixed by checking
the value of catchange_xcnt instead of catchange_xip before calling pfree().

[2] https://www.postgresql.org/message-id/CAA4eK1%2BXPdm8G%3DEhUJA12Pi1YvQAfcz2%3DkTd9a4BjVx4%3Dgk-MA%40mail.gmail.com

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index c482e906b0..68b9c4ef7d 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
     Size needed_length;
     SnapBuildOnDisk *ondisk = NULL;
     TransactionId *catchange_xip = NULL;
-    size_t catchange_xcnt;
+    size_t catchange_xcnt = 0;
     char *ondisk_c;
     int fd;
     char tmppath[MAXPGPATH];
@@ -1788,7 +1788,7 @@ out:
     /* be tidy */
     if (ondisk)
         pfree(ondisk);
-    if (catchange_xip)
+    if (catchange_xcnt != 0)
         pfree(catchange_xip);
 }

Regards,
Shi yu
