Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-06 08:55:39
Message-ID: CAA4eK1KuSxRhwUoJWVW--EseKcob94-hrP+JMCGvGTR+Fnx2aA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 6, 2022 at 12:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jul 5, 2022 at 8:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > 2. Are we anytime removing transaction ids from catchanges->xip array?
>
> No.
>
> > If not, is there a reason for the same? I think we can remove it
> > either at commit/abort or even immediately after adding the xid/subxid
> > to committed->xip array.
>
> It might be a good idea but I'm concerned that removing XID from the
> array at every commit/abort or after adding it to committed->xip array
> might be costly as it requires adjustment of the array to keep its
> order. Removing XIDs from the array would make bsearch faster but the
> array is updated reasonably often (every 15 sec).
>

Fair point. However, I am slightly worried that we are unnecessarily
searching in this new array even when ReorderBufferTXN has the
required information. To avoid that, in function
SnapBuildXidHasCatalogChange(), we can first check
ReorderBufferXidHasCatalogChanges() and then check the array if the
first check doesn't return true. Also, by the way, do we need to
always keep builder->catchanges.xip updated via SnapBuildRestore()?
Isn't it sufficient that we just read and throw away contents from a
snapshot if builder->catchanges.xip is non-NULL?
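
To make the suggested lookup order concrete, here is a minimal
standalone sketch. It is not the patch and does not use PostgreSQL
headers: MockTxn, MockBuilder, and both mock_* functions are
hypothetical stand-ins invented for illustration, and the sorted array
merely plays the role of builder->catchanges.xip. The idea is to consult
the per-transaction state first and fall back to a binary search of the
array only when that check does not return true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for per-transaction state in the reorder buffer. */
typedef struct MockTxn
{
    TransactionId xid;
    bool          has_catalog_changes;
} MockTxn;

/* Hypothetical stand-in for the snapshot builder. */
typedef struct MockBuilder
{
    const MockTxn       *txns;           /* transactions seen while decoding */
    size_t               ntxns;
    const TransactionId *catchange_xip;  /* sorted ascending (assumption) */
    size_t               catchange_xcnt;
} MockBuilder;

/* Plain numeric comparison of two xids, for use with bsearch(). */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    return (xa > xb) - (xa < xb);
}

/*
 * Plays the role of the first check: does the in-memory transaction
 * state already say that this xid modified the catalog?  A linear scan
 * keeps the mock short; it is not how the real lookup works.
 */
bool
mock_rb_xid_has_catalog_changes(const MockBuilder *b, TransactionId xid)
{
    for (size_t i = 0; i < b->ntxns; i++)
    {
        if (b->txns[i].xid == xid)
            return b->txns[i].has_catalog_changes;
    }
    return false;
}

/*
 * Proposed order: first check the transaction state, and search the
 * sorted array only if that check did not return true.
 */
bool
mock_xid_has_catalog_change(const MockBuilder *b, TransactionId xid)
{
    assert(b != NULL);

    if (mock_rb_xid_has_catalog_changes(b, xid))
        return true;

    if (b->catchange_xcnt == 0)
        return false;

    return bsearch(&xid, b->catchange_xip, b->catchange_xcnt,
                   sizeof(TransactionId), xid_cmp) != NULL;
}
```

With transactions {100: catalog changes, 101: none} and an array of
{90, 95, 101}, xid 100 is answered by the first check alone, 95 and 101
are found only through the array, and 99 is found by neither.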

I had additionally thought about whether we can further optimize this
solution to just store this additional information when we need to
serialize for a checkpoint record, but I think that won't work because
walsender can restart even without a restart of the server, in which
case the same problem can occur. I am not sure if there is a way to
further optimize this solution; let me know if you have any ideas.

--
With Regards,
Amit Kapila.