Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Oh, Mike" <minsoo(at)amazon(dot)com>
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date: 2022-07-12 08:52:01
Message-ID: CAA4eK1KxnYuhU5AyusDAP1SKtVmwAS1wYJ+VNL155x=swbx+gA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 12, 2022 at 1:13 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jul 12, 2022 at 3:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 12, 2022 at 11:38 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > I'm doing benchmark tests and will share the results.
> > > >
> > >
> > > I've done benchmark tests to measure the overhead introduced by doing
> > > bsearch() every time we decode a commit record. I've simulated a
> > > very intensive situation where we decode 1M commit records while
> > > keeping the builder->catchange.xip array, but the overhead is negligible:
> > >
> > > HEAD: 584 ms
> > > Patched: 614 ms
> > >
> > > I've attached the benchmark script I used. By increasing
> > > LOG_SNAPSHOT_INTERVAL_MS to 90000, the last decoding by
> > > pg_logical_slot_get_changes() decodes 1M commit records while keeping
> > > catalog-modifying transactions.
> > >
> >
> > Thanks for the test. We should also see how it performs when (a) we
> > don't change LOG_SNAPSHOT_INTERVAL_MS,
>
> What point do you want to see in this test? I think the performance
> overhead depends on how many times we do bsearch() and how many
> transactions are in the list.
>

Right, I am not expecting any visible performance difference in this
case. This is to ensure that we are not incurring any overhead in the
more usual scenarios (or default cases). As per my understanding, the
purpose of increasing the value of LOG_SNAPSHOT_INTERVAL_MS is to
simulate a stress case for the changes made by the patch, whereas
keeping its value at the default will test the more usual scenarios.

> I increased this value to easily
> simulate the situation where we decode many commit records while
> keeping catalog modifying transactions. But even if we don't change
> this value, the result would not change if we don't change how many
> commit records we decode.
>
> > and (b) we have more DDL xacts
> > so that the array to search is somewhat bigger
>
> I've done the same performance tests while creating 64 catalog
> modifying transactions. The result is:
>
> HEAD: 595 ms
> Patched: 628 ms
>
> There was no big overhead.
>

Yeah, especially considering you have simulated a stress case for the patch.
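
For anyone following along, below is a minimal standalone sketch (not the
actual patch code) of the operation whose cost is being measured above:
roughly one bsearch() per decoded commit record over a sorted array of
catalog-modifying XIDs, analogous to builder->catchange.xip. The type
definition, helper names, and sample values are illustrative assumptions
only.

/*
 * Illustrative sketch: per decoded commit record, check whether the
 * committed xid is in the sorted list of catalog-modifying transactions.
 * Cost is O(log n) in the array size, which is why the measured overhead
 * stays small even with 1M commit records.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;	/* stand-in for PostgreSQL's TransactionId */

/* comparator for bsearch over plain xids (similar in spirit to xidComparator) */
static int
xid_cmp(const void *a, const void *b)
{
	TransactionId xa = *(const TransactionId *) a;
	TransactionId xb = *(const TransactionId *) b;

	if (xa < xb)
		return -1;
	if (xa > xb)
		return 1;
	return 0;
}

/* Was this committed xid recorded as having modified the catalog? */
static bool
xid_in_catchange_array(TransactionId xid,
					   const TransactionId *xip, size_t xcnt)
{
	return xcnt > 0 &&
		bsearch(&xid, xip, xcnt, sizeof(TransactionId), xid_cmp) != NULL;
}

int
main(void)
{
	/* hypothetical sorted list of catalog-modifying transactions */
	TransactionId catchange_xip[] = {700, 705, 731, 802};
	size_t		catchange_xcnt = sizeof(catchange_xip) / sizeof(catchange_xip[0]);

	/* one lookup per decoded commit record */
	printf("xid 731 modified catalog? %d\n",
		   xid_in_catchange_array(731, catchange_xip, catchange_xcnt));
	printf("xid 900 modified catalog? %d\n",
		   xid_in_catchange_array(900, catchange_xip, catchange_xcnt));
	return 0;
}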

--
With Regards,
Amit Kapila.
