RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: "'Oh, Mike'" <minsoo(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Subject: RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2021-10-05 07:37:23
Message-ID: OSBPR01MB4888111BDF767C34E9DF5D82EDAF9@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, September 24, 2021 5:03 PM I wrote:
> On Tuesday, March 16, 2021 1:35 AM Oh, Mike <minsoo(at)amazon(dot)com> wrote:
> > We noticed that the logical replication could fail when the
> > Standby::RUNNING_XACT record is generated in the middle of a catalog
> > modifying transaction and if the logical decoding has to restart from
> > the RUNNING_XACT WAL entry.
> ...
> > Proposed solution:
> > If we’re decoding a catalog modifying commit record, then check
> > whether it’s part of the RUNNING_XACT xid’s processed @ the
> > restart_lsn. If so, then add its xid & subxacts in the committed txns
> > list in the logical decoding snapshot.
> >
> > Please refer to the attachment for the proposed patch.
>
>
> Let me share some review comments for the patch.
....
> (3) suggestion of small readability improvement
>
> We calculate the same size twice here and DecodeCommit.
> I suggest you declare a variable that stores the computed result of size, which
> might shorten those codes.
>
> + /*
> + * xl_running_xacts contains a xids
> Flexible Array
> + * and its size is subxcnt + xcnt.
> + * Take that into account while
> allocating
> + * the memory for last_running.
> + */
> + last_running = (xl_running_xacts *)
> malloc(sizeof(xl_running_xacts)
> +
> + sizeof(TransactionId )
> +
> * (running->subxcnt + running->xcnt));
> + memcpy(last_running, running,
> sizeof(xl_running_xacts)
> +
> + (sizeof(TransactionId)
> +
> + * (running->subxcnt + running->xcnt)));
Let me add one more basic review comment in DecodeStandbyOp().

Why do you call raw malloc directly ?
You don't have the basic check whether the return value is
NULL or not and intended to call palloc here instead ?

Best Regards,
Takamichi Osumi

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-10-05 07:41:08 Re: plperl: update ppport.h and fix configure version check
Previous Message David Rowley 2021-10-05 07:25:52 Re: jsonb crash