From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | amit(dot)kapila16(at)gmail(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, bdrouvot(at)amazon(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, minsoo(at)amazon(dot)com |
Subject: | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns |
Date: | 2022-07-20 01:58:16 |
Message-ID: | CAD21AoAEt=w8Xi-pf=45RnXfirgP4-GukPMC7GZhE=WpD8MCBA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 20, 2022 at 9:58 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Tue, 19 Jul 2022 17:31:07 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in
> > On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > > > Good work. I wonder whether, without comments, this may create a
> > > > problem in the future. OTOH, I don't see adding a check
> > > > "catchange.xcnt > 0" before freeing the memory as any less robust.
> > > > Also, for consistency, we can use a similar check based on xcnt in
> > > > SnapBuildRestore to free the memory in the code below:
> > > > + /* set catalog modifying transactions */
> > > > + if (builder->catchange.xip)
> > > > + pfree(builder->catchange.xip);
> > >
> > > But xip must be positive there. We can add a comment that explains that.
> > >
> >
> > Yes, but if we add a comment for it, we would probably need to explain
> > gcc's optimization, which seems like too much to me.
>
> Ah, sorry. I confused it with another place, in SnapBuildPurgeCommitedTxn.
> I agree with you that we don't need an additional comment *there*.
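The `xcnt > 0` check proposed above is interchangeable with the `if (builder->catchange.xip)` pointer check only while an array with no entries is always represented by a NULL pointer. A minimal standalone sketch of that invariant, assuming malloc/free in place of PostgreSQL's palloc/pfree; `XidArray`, `xid_array_make()` and `xid_array_free()` are hypothetical names loosely modeled on `builder->catchange`, not snapbuild.c code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's TransactionId. */
typedef uint32_t TransactionId;

/* Hypothetical array loosely modeled on builder->catchange. */
typedef struct XidArray
{
    size_t         xcnt;    /* number of valid entries */
    TransactionId *xip;     /* assumed invariant: NULL exactly when xcnt == 0 */
} XidArray;

/* Allocate storage only when there is something to store. */
static XidArray
xid_array_make(size_t n)
{
    XidArray arr = {n, NULL};

    if (n > 0)
    {
        arr.xip = calloc(n, sizeof(TransactionId));
        if (arr.xip == NULL)
            abort();
    }
    return arr;
}

/* Free based on the count rather than on the pointer. */
static void
xid_array_free(XidArray *arr)
{
    /* The invariant that makes the count check equivalent to a pointer check. */
    assert((arr->xcnt == 0) == (arr->xip == NULL));
    if (arr->xcnt > 0)
        free(arr->xip);
    arr->xip = NULL;
    arr->xcnt = 0;
}
```

If some code path ever left a non-NULL `xip` with `xcnt == 0`, the count-based check would leak that array; the assertion makes such a path fail loudly instead.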
>
> > > + catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);
> > >
> > > catchange_xip is allocated in the current context, but ondisk is
> > > allocated in builder->context. That seems somewhat inconsistent to me
> > > (even if the current context is the same as builder->context).
> >
> > Right. I thought that since catchange_xip is short-lived (it is only
> > needed until the end of SnapBuildSerialize()), we didn't need to
> > allocate it in builder->context. But given that ondisk is allocated
> > there, we should do the same for catchange_xip. Will fix it.
>
> Thanks.
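The memory-context point can be illustrated with a toy region allocator. This is a hypothetical sketch: `Arena`, `arena_alloc()`, `arena_reset()` and `serialize_snapshot()` are invented stand-ins for a PostgreSQL MemoryContext and SnapBuildSerialize(), not the real APIs. Both the temporary xid list (playing the role of `catchange_xip`) and the on-disk image (playing the role of `ondisk`) are taken from the builder's arena, so they share one lifetime regardless of which arena happens to be "current":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One allocation in the toy arena; payload is suitably aligned. */
typedef struct Chunk
{
    struct Chunk *next;
    max_align_t   payload[];
} Chunk;

/* Hypothetical region allocator standing in for a MemoryContext. */
typedef struct Arena
{
    Chunk  *head;
    size_t  nchunks;
} Arena;

static void *
arena_alloc(Arena *arena, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);

    if (c == NULL)
        abort();
    c->next = arena->head;
    arena->head = c;
    arena->nchunks++;
    return c->payload;
}

/* Release every chunk at once, like resetting a memory context. */
static void
arena_reset(Arena *arena)
{
    Chunk *c = arena->head;

    while (c != NULL)
    {
        Chunk *next = c->next;

        free(c);
        c = next;
    }
    arena->head = NULL;
    arena->nchunks = 0;
}

/* Hypothetical serializer: both buffers come from builder_ctx. */
static uint32_t *
serialize_snapshot(Arena *builder_ctx, const uint32_t *xids, size_t n)
{
    /* Temporary copy, analogous to catchange_xip. */
    uint32_t *catchange_xip = arena_alloc(builder_ctx, n * sizeof(uint32_t));
    /* On-disk image: a count followed by the xids. */
    uint32_t *ondisk = arena_alloc(builder_ctx, (n + 1) * sizeof(uint32_t));

    assert(n == 0 || xids != NULL);
    for (size_t i = 0; i < n; i++)
        catchange_xip[i] = xids[i];
    ondisk[0] = (uint32_t) n;
    for (size_t i = 0; i < n; i++)
        ondisk[i + 1] = catchange_xip[i];
    return ondisk;
}
```

Resetting the builder's arena releases both allocations together, which is the kind of consistency being asked for here.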
>
> > > + if (builder->committed.xcnt > 0)
> > > + {
> > >
> > > It seems to me committed.xip is always non-NULL, so we don't need this.
> > > I don't strongly object to doing it, though.
> >
> > But committed.xcnt could be 0, right? In that case there is nothing to
> > copy, so we don't need to call memcpy with size = 0. It also looks more
> > consistent with what we do for catchange_xcnt.
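Guarding the copy with a count check matters when the source pointer may be NULL. Under C11/C17 (7.24.1), pointer arguments to memcpy must be valid even when the length is 0, and compilers may rely on that to drop later NULL checks. This may be the gcc behavior alluded to earlier, although the thread does not spell it out. A minimal sketch of serializing two xid arrays behind a count header; `OnDiskHeader`, `append_xids()` and `serialize_xids()` are hypothetical, and the layout is not the real snapbuild on-disk format:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's TransactionId. */
typedef uint32_t TransactionId;

/* Hypothetical header, loosely modeled on the counts stored in ondisk. */
typedef struct OnDiskHeader
{
    uint32_t committed_xcnt;
    uint32_t catchange_xcnt;
} OnDiskHeader;

/* Append n xids at pos; skip memcpy entirely when n == 0 (src may be NULL). */
static char *
append_xids(char *pos, const TransactionId *src, uint32_t n)
{
    if (n > 0)
    {
        memcpy(pos, src, n * sizeof(TransactionId));
        pos += n * sizeof(TransactionId);
    }
    return pos;
}

/* Build header + committed xids + catchange xids in one malloc'd buffer. */
static char *
serialize_xids(const TransactionId *committed, uint32_t ncommitted,
               const TransactionId *catchange, uint32_t ncatchange,
               size_t *len_out)
{
    size_t       len = sizeof(OnDiskHeader) +
        ((size_t) ncommitted + ncatchange) * sizeof(TransactionId);
    char        *buf = malloc(len);
    OnDiskHeader hdr = {ncommitted, ncatchange};
    char        *pos;

    if (buf == NULL)
        abort();
    memcpy(buf, &hdr, sizeof(hdr));
    pos = buf + sizeof(hdr);
    pos = append_xids(pos, committed, ncommitted);
    pos = append_xids(pos, catchange, ncatchange);
    assert((size_t) (pos - buf) == len);
    *len_out = len;
    return buf;
}
```

Calling `serialize_xids(NULL, 0, xids, 2, &len)` is well defined here because the empty committed array never reaches memcpy.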
>
> Mmm, the patch changed that behavior. AllocateSnapshotBuilder always
> allocates the array with a fixed size, and SnapBuildAddCommittedTxn still
> assumes builder->committed.xip is non-NULL. SnapBuildRestore *kept*
> ondisk.builder.committed.xip populated with a valid array pointer. But
> the patch allows committed.xip to be NULL, in which case
> SnapBuildAddCommittedTxn calls repalloc(NULL), which triggers an assertion
> failure.

IIUC the patch doesn't allow committed.xip to be NULL since we don't
overwrite it if builder->committed.xcnt is 0 (i.e.,
ondisk.builder.committed.xip is NULL):

```c
builder->committed.xcnt = ondisk.builder.committed.xcnt;
/* We only allocated/stored xcnt, not xcnt_space xids ! */

/* don't overwrite preallocated xip, if we don't have anything here */
if (builder->committed.xcnt > 0)
{
    pfree(builder->committed.xip);
    builder->committed.xcnt_space = ondisk.builder.committed.xcnt;
    builder->committed.xip = ondisk.builder.committed.xip;
}
```
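Why keeping the preallocated array matters can be sketched in standalone C. Per Horiguchi's description, the append path grows the array with repalloc(), which must not receive NULL. In standard C, realloc(NULL, n) would be legal, so the hypothetical `grow()` below asserts a non-NULL pointer to mimic repalloc. `Committed`, `committed_init()`, `committed_add()` and `committed_restore()` are illustrative names, not PostgreSQL functions, and the growth factor is an arbitrary choice:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's TransactionId. */
typedef uint32_t TransactionId;

/* Hypothetical array modeled on builder->committed. */
typedef struct Committed
{
    size_t         xcnt;        /* entries in use */
    size_t         xcnt_space;  /* entries allocated */
    TransactionId *xip;
} Committed;

/* Mimics repalloc(): unlike realloc(), refuses a NULL pointer. */
static void *
grow(void *ptr, size_t size)
{
    assert(ptr != NULL);
    void *p = realloc(ptr, size);

    if (p == NULL)
        abort();
    return p;
}

/* Preallocate a fixed-size array, as the builder does at creation. */
static void
committed_init(Committed *c, size_t space)
{
    c->xcnt = 0;
    c->xcnt_space = space;
    c->xip = malloc(space * sizeof(TransactionId));
    if (c->xip == NULL)
        abort();
}

/* Append one xid, growing the array when it is full. */
static void
committed_add(Committed *c, TransactionId xid)
{
    if (c->xcnt == c->xcnt_space)
    {
        c->xcnt_space = c->xcnt_space * 2 + 1;
        c->xip = grow(c->xip, c->xcnt_space * sizeof(TransactionId));
    }
    c->xip[c->xcnt++] = xid;
}

/* Adopt the restored array only when non-empty (restored_xip is NULL if 0). */
static void
committed_restore(Committed *c, TransactionId *restored_xip, size_t restored_xcnt)
{
    c->xcnt = restored_xcnt;
    if (restored_xcnt > 0)
    {
        free(c->xip);
        c->xcnt_space = restored_xcnt;
        c->xip = restored_xip;
    }
}
```

Because `committed_restore()` keeps the preallocated array when nothing was restored, `committed_add()` always receives a valid pointer, which is the property the quoted restore code relies on.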
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/