Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-08-01 02:16:21
Message-ID: CAD21AoBRtPLEYcTi5YOzRbh06xXqM1aZmi9Dkczfbd5=6hKTrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jul 29, 2022 at 5:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jul 28, 2022 at 8:57 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > I have another comment on this patch:
> > > SnapBuildPurgeOlderTxn()
> > > {
> > > ...
> > > + if (surviving_xids > 0)
> > > + memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
> > > + surviving_xids * sizeof(TransactionId))
> > > ...
> > >
> > > For this code to hit, we must have a situation where one or more of
> > > the xacts in this array must be still running. And, if that is true,
> > > we would not have started from the restart point where the
> > > corresponding snapshot (that contains the still running xacts) has
> > > been serialized because we advance the restart point to not before the
> > > oldest running xacts restart_decoding_lsn. This may not be easy to
> > > understand so let me take an example to explain. Say we have two
> > > transactions t1 and t2, and both have made catalog changes. We want a
> > > situation where one of those gets purged and the other remains in
> > > builder->catchange.xip array. I have tried variants of the below
> > > sequence to see if I can get into the required situation but am not
> > > able to make it.
> > >
> > > Session-1
> > > Checkpoint -1;
> > > T1
> > > DDL
> > >
> > > Session-2
> > > T2
> > > DDL
> > >
> > > Session-3
> > > Checkpoint-2;
> > > pg_logical_slot_get_changes()
> > > -- Here when we serialize the snapshot corresponding to
> > > CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
> > > as catalog-changing xacts.
> > >
> > > Session-1
> > > T1
> > > Commit;
> > >
> > > Checkpoint;
> > > pg_logical_slot_get_changes()
> > > -- Here we will restore from Checkpoint-1's serialized snapshot and
> > > won't be able to move restart_point to Checkpoint-2 because T2 is
> > > still open.
> > >
> > > Now, as per my understanding, it is only possible to move
> > > restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
> > > which case we will never have that in surviving_xids array after the
> > > purge.
> > >
> > > It is possible I am missing something here. Do let me know your thoughts.
> >
> > Yeah, your description makes sense to me. I've also considered how to
> > hit this path but I guess it is never hit. Thinking of it in another
> > way, first of all, at least 2 catalog modifying transactions have to
> > be running while writing a xl_running_xacts. The serialized snapshot
> > that is written when we decode the first xl_running_xact has two
> > transactions. Then, one of them is committed before the second
> > xl_running_xacts. The second serialized snapshot has only one
> > transaction. Then, the transaction is also committed after that. Now,
> > in order to execute the path, we need to start decoding from the first
> > serialized snapshot. However, if we start from there, we cannot decode
> > the full contents of the transaction that was committed later.
> >
>
> I think then we should change this code in the master branch patch
> with an additional comment on the lines of: "Either all the xacts got
> purged or none. It is only possible to partially remove the xids from
> this array if one or more of the xids are still running but not all.
> That can happen if we start decoding from a point (LSN where the
> snapshot state became consistent) where all the xacts in this were
> running and then at least one of those got committed and a few are
> still running. We will never start from such a point because we won't
> move the slot's restart_lsn past the point where the oldest running
> transaction's restart_decoding_lsn is."

Agreed.

>
> I suggest keeping the back branch as it is w.r.t this change as if
> this logic proves to be faulty it won't affect the stable branches. We
> can always back-patch this small change if required.

Yes, during PG16 release cycle, we can have time for evaluating
whether the approach in the master branch is correct. We can always
back-patch the part.

I've attached updated patches for all branches. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
REL15_v10-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.2 KB
REL13_v10-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.1 KB
REL14_v10-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.1 KB
REL12_v10-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.1 KB
master_v10-0001-Add-catalog-modifying-transactions-to-logical-de.patch application/octet-stream 29.3 KB
REL10_v10-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 15.8 KB
REL11_v10-0001-Fix-catalog-lookup-with-the-wrong-snapshot-durin.patch application/octet-stream 16.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-08-01 03:33:44 RE: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger
Previous Message Andres Freund 2022-08-01 02:02:58 Re: ci: update to freebsd 13.1 / remove minor versions from image names