RE: Build-farm - intermittent error in 031_column_list.pl

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Build-farm - intermittent error in 031_column_list.pl
Date: 2022-05-25 02:28:01
Message-ID: TYCPR01MB837380308F399111E27076DDEDD69@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, May 24, 2022 9:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sat, May 21, 2022 at 9:03 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Fri, May 20, 2022 at 4:01 PM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > > Also, we'd probably have to ignore RelationSyncEntry for a while,
> > > which seems quite expensive.
> > >
> >
> > Yet another option could be that we continue using a historic snapshot
> > but ignore publications that are not found for the purpose of
> > computing RelSyncEntry attributes. We won't mark such an entry as
> > valid till all the publications are loaded without anything missing. I
> > think such cases in practice won't be enough to matter. This means we
> > won't publish operations on tables corresponding to that publication
> > till we found such a publication and that seems okay.
> >
>
> Attached, find the patch to show what I have in mind for this. Today, we have
> received a bug report with a similar symptom [1] and that should also be fixed
> with this. The reported bug should also be fixed with this.
>
> Thoughts?
Hi,

I agree with this direction.
I think this approach solves the issue fundamentally
and is better than the first approach to add several calls
of wait_for_catchup in the test, since taking the first one
means we need to care about avoiding the same issue,
whenever we write a new (similar) test, even after the modification.

I've used the patch to check below things.
1. The patch can be applied and make check-world has passed without failure.
2. HEAD applied with the patch passed all tests in src/test/subscription
(including 031_column_list.pl), after commenting out of WalSndWaitForWal's WalSndKeepalive.
3. The new bug fix report in 'How is this possible "publication does not exist"' thread
has been fixed. FYI, after I execute the script's function, I also conduct
additional insert to the publisher, and this was correctly replicated on the subscriber.

Best Regards,
Takamichi Osumi

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-05-25 02:46:06 Re: Build-farm - intermittent error in 031_column_list.pl
Previous Message wangw.fnst@fujitsu.com 2022-05-25 02:24:59 RE: Perform streaming logical transactions by background workers and parallel apply