Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2021-10-07 04:20:14
Message-ID: CAD21AoAn-k6OpZ6HSAH_G91tpTXR6KYvkf663kg6EqW-f6sz1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 23, 2021 at 5:44 PM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>
> Hi,
>
> On 7/29/21 10:25 AM, Masahiko Sawada wrote:
> > Thank you for reporting the issue.
> >
> > I could reproduce this issue by the steps you shared.
>
> Thanks for looking at it!
>
> >
> >> Currently, the system relies on processing Heap2::NEW_CID to keep track of catalog modifying (sub)transactions.
> >>
> >> This context is lost if the logical decoding has to restart from a Standby::RUNNING_XACTS that is written between the NEW_CID record and its parent txn commit.
> >>
> >> If the logical stream restarts from this restart_lsn, then it doesn't have the xid responsible for modifying the catalog.
> >>
> > I agree with your analysis. Since we don’t use commit WAL record to
> > track the transaction that has modified system catalogs, if we decode
> > only the commit record of such transaction, we cannot know the
> > transaction has been modified system catalogs, resulting in the
> > subsequent transaction scans system catalog with the wrong snapshot.
> Right.
> >
> > With the patch, if the commit WAL record has a XACT_XINFO_HAS_INVALS
> > flag and its xid is included in RUNNING_XACT record written at
> > restart_lsn, we forcibly add the top XID and its sub XIDs as a
> > committed transaction that has modified system catalogs to the
> > snapshot. I might be missing something about your patch but I have
> > some comments on this approach:
> >
> > 1. Commit WAL record may not have invalidation message for system
> > catalogs (e.g., when commit record has only invalidation message for
> > relcache) even if it has XACT_XINFO_HAS_INVALS flag.
>
> Right, good point (create policy for example would lead to an
> invalidation for relcache only).
>
> > In this case, the
> > transaction wrongly is added to the snapshot, is that okay?
> This transaction is a committed one, and IIUC the snapshot would be used
> only for catalog visibility, so i don't see any issue to add it in the
> snapshot, what do you think?

It seems to me that it's no problem since we always transaction with
catalog-changed when decoding XLOG_XACT_INVALIDATIONS records.

> >
> > 2. We might add a subtransaction XID as a committed transaction that
> > has modified system catalogs even if it actually didn't.
>
> Right, like when needs_timetravel is true.
>
> > As the
> > comment in SnapBuildBuildSnapshot() describes, we track only the
> > transactions that have modified the system catalog and store in the
> > snapshot (in the ‘xip' array). The patch could break that assumption.
> Right. It looks to me that breaking this assumption is not an issue.
>
> IIUC currently the committed ones that are not modifying the catalog are
> not stored "just" because we don't need them.
> > However, I’m really not sure how to deal with this point. We cannot
> > know which subtransaction has actually modified system catalogs by
> > using only the commit WAL record.
> Right, unless we rewrite this patch so that a commit WAL record will
> produce this information.
> >
> > 3. The patch covers only the case where the restart_lsn exactly
> > matches the LSN of RUNNING_XACT.
> Right.
> > I wonder if there could be a case
> > where the decoding starts at a WAL record other than RUNNING_XACT but
> > the next WAL record is RUNNING_XACT.
>
> Not sure, but could a restart_lsn not be a RUNNING_XACTS?

I guess the decoding always starts from RUNING_XACTS.
After more thought, I think that the basic approach of the proposed
patch is a probably good idea, which we add xid whose commit record
has XACT_XINFO_HAS_INVALS to the snapshot. The problem as I see is
that during decoding COMMIT record we cannot know which transactions
(top transaction or subtransactions) actually did catalog changes. But
given that even if XLOG_XACT_INVALIDATION has only relcache
invalidation message we always mark the transaction with
catalog-changed, it seems no problem. Therefore, in the reported
cases, probably we can add both the top transaction xid and its
subscription xids to the snapshot.

Regarding the patch details, I have two comments:

---
+ if ((parsed->xinfo & XACT_XINFO_HAS_INVALS) && last_running)
+ {
+ /* make last_running->xids bsearch()able */
+ qsort(last_running->xids,
+ last_running->subxcnt + last_running->xcnt,
+ sizeof(TransactionId), xidComparator);

The patch does qsort() every time when the commit message has
XACT_XINFO_HAS_INVALS. IIUC the xids we need to remember is the only
xids that are recorded in the first replayed XLOG_RUNNING_XACTS,
right? If so, we need to do qsort() once, can remove xid from the
array once it gets committed, and then can eventually make
last_running empty so that we can skip even TransactionIdInArray().

---
Since last_running is allocated by malloc() and it isn't freed even
after finishing logical decoding.

Another idea to fix this problem would be that before calling
SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
and then mark all of them as catalog-changed by calling
ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
this idea. What the patch does is essentially the same as what the
proposed patch does. But the patch doesn't modify the
SnapBuildCommitTxn(). And we remember the list of last running
transactions in reorder buffer and the list is periodically purged
during decoding RUNNING_XACTS records, eventually making it empty.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
poc_remember_last_running_xacts.patch application/octet-stream 6.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-10-07 04:26:46 Re: Lost logs with csvlog redirected to stderr under WIN32 service
Previous Message kuroda.hayato@fujitsu.com 2021-10-07 03:07:33 Question about client_connection_check_interval