Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2021-10-11 09:44:21
Message-ID: 89fbeef1-aebe-5f1c-b559-2ea8cb90dcbd@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 10/11/21 8:27 AM, Masahiko Sawada wrote:
> On Fri, Oct 8, 2021 at 4:50 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>> At Thu, 7 Oct 2021 13:20:14 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in
>>> Another idea to fix this problem would be that before calling
>>> SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
>>> for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
>>> and then mark all of them as catalog-changed by calling
>>> ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
>>> this idea. What the patch does is essentially the same as what the
>>> proposed patch does. But the patch doesn't modify the
>>> SnapBuildCommitTxn(). And we remember the list of last running
>>> transactions in reorder buffer and the list is periodically purged
>>> during decoding RUNNING_XACTS records, eventually making it empty.
>> I came up with the third way. SnapBuildCommitTxn already properly
>> handles the case where a ReorderBufferTXN with
>> RBTXN_HAS_CATALOG_CHANGES. So this issue can be resolved by create
>> such ReorderBufferTXNs in SnapBuildProcessRunningXacts.
> Thank you for the idea and patch!

Thanks you both for your new patches proposal!

I liked Sawada's one but also do "prefer" Horiguchi's one.

>
> It's much simpler than mine. I think that creating an entry of a
> catalog-changed transaction in the reorder buffer before
> SunapBuildCommitTxn() is the right direction.
+1
>
> After more thought, given DDLs are not likely to happen than DML in
> practice, probably we can always mark both the top transaction and its
> subtransactions as containing catalog changes if the commit record has
> XACT_XINFO_HAS_INVALS? I believe this is not likely to lead to
> overhead in practice. That way, the patch could be more simple and
> doesn't need the change of AssertTXNLsnOrder().
>
> I've attached another PoC patch. Also, I've added the tests for this
> issue in test_decoding.

Thanks!

It looks good to me, just have a remark about the comment:

+   /*
+    * Mark the top transaction and its subtransactions as containing
catalog
+    * changes, if the commit record has invalidation message. This is
necessary
+    * for the case where we decode only the commit record of the
transaction
+    * that actually has done catalog changes.
+    */

What about?

    /*
     * Mark the top transaction and its subtransactions as containing
catalog
     * changes, if the commit record has invalidation message. This is
necessary
     * for the case where we did not decode the transaction that did
the catalog
     * change(s) (the decoding restarted after). So that we are
decoding only the
     * commit record of the transaction that actually has done catalog
changes.
     */

Thanks

Bertrand

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-10-11 09:59:58 Re: Reword docs of feature "Remove temporary files after backend crash"
Previous Message Justin Pryzby 2021-10-11 09:23:50 Re: Reword docs of feature "Remove temporary files after backend crash"