From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-28 03:21:28
Message-ID: CAA4eK1K+tjr8C_ZcAgcpbcrtMmq6yMPdp--OxiKoUwebz5EUfg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jul 27, 2022 at 8:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
>
> > I have changed accordingly in the attached
> > and apart from that slightly modified the comments and commit message.
> > Do let me know what you think of the attached?
>
> It would be better to remember the initial running xacts after
> SnapBuildRestore() returns true? Because otherwise, we could end up
> allocating InitialRunningXacts multiple times while leaking the old
> ones if there are no serialized snapshots that we are interested in.
>

Right, this makes sense. But note that once you do this, you can no
longer keep the check (builder->state == SNAPBUILD_START), which I
believe is not required anyway. We need to do this after the restore,
whatever state the snapshot was in, because any state other than
SNAPBUILD_CONSISTENT can have commits without all their changes.

Accordingly, I think the comment: "Remember the transactions and
subtransactions that were running when xl_running_xacts record that we
decoded first was written." needs to be slightly modified to something
like: "Remember the transactions and subtransactions that were running
when xl_running_xacts record that we decoded was written.". Please make
the same change wherever else this comment is used in the patch.

> ---
> +    if (builder->state == SNAPBUILD_START)
> +    {
> +        int nxacts = running->subxcnt + running->xcnt;
> +        Size sz = sizeof(TransactionId) * nxacts;
> +
> +        NInitialRunningXacts = nxacts;
> +        InitialRunningXacts = MemoryContextAlloc(builder->context, sz);
> +        memcpy(InitialRunningXacts, running->xids, sz);
> +        qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);
> +    }
>
> We should allocate the memory for InitialRunningXacts only when
> (running->subxcnt + running->xcnt) > 0.
>

There is no harm in doing that but ideally, that case would have been
covered by an earlier check "if (running->oldestRunningXid ==
running->nextXid)" which suggests "No transactions were running, so we
can jump to consistent."
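
For illustration only, here is a minimal standalone sketch of the bookkeeping
being discussed: copy the running xids, sort them, and release any earlier
copy so that repeated calls do not leak. It is not the patch itself. It
assumes plain malloc/free in place of MemoryContextAlloc, and RunningXacts,
xid_cmp, remember_running_xacts and was_initially_running are hypothetical
names made up for this sketch. Whether the real code should replace an
earlier copy or keep it is also an assumption here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for the xl_running_xacts fields used in the hunk. */
typedef struct RunningXacts
{
    int xcnt;                   /* number of top-level xids */
    int subxcnt;                /* number of subtransaction xids */
    const TransactionId *xids;  /* xcnt + subxcnt entries */
} RunningXacts;

static TransactionId *InitialRunningXacts = NULL;
static int NInitialRunningXacts = 0;

/* Plain numeric ordering; enough for qsort + bsearch on the same array. */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/*
 * Remember the xids from one running-xacts record. Any previous copy is
 * freed first, so calling this repeatedly does not leak. Nothing is
 * allocated when no transactions were running. Returns 1 if a copy was
 * stored, 0 otherwise.
 */
static int
remember_running_xacts(const RunningXacts *running)
{
    int nxacts = running->subxcnt + running->xcnt;
    size_t sz;

    free(InitialRunningXacts);
    InitialRunningXacts = NULL;
    NInitialRunningXacts = 0;

    if (nxacts <= 0)
        return 0;

    sz = sizeof(TransactionId) * (size_t) nxacts;
    InitialRunningXacts = malloc(sz);
    if (InitialRunningXacts == NULL)
        return 0;

    memcpy(InitialRunningXacts, running->xids, sz);
    qsort(InitialRunningXacts, (size_t) nxacts, sizeof(TransactionId), xid_cmp);
    NInitialRunningXacts = nxacts;
    return 1;
}

/* Binary search over the sorted copy. */
static int
was_initially_running(TransactionId xid)
{
    if (NInitialRunningXacts == 0)
        return 0;
    assert(InitialRunningXacts != NULL);
    return bsearch(&xid, InitialRunningXacts, (size_t) NInitialRunningXacts,
                   sizeof(TransactionId), xid_cmp) != NULL;
}
```

Note that the sketch has no check on the builder state, in line with the
point above that the xids need to be remembered whatever state the restored
snapshot was in.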

Kindly make the required changes and submit the back branch patches again.

--
With Regards,
Amit Kapila.
