Re: Build-farm - intermittent error in 031_column_list.pl

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: tomas(dot)vondra(at)enterprisedb(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Build-farm - intermittent error in 031_column_list.pl
Date: 2022-05-20 01:28:19
Message-ID: 20220520.102819.126748780727079715.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 19 May 2022 16:42:31 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> On Thu, May 19, 2022 at 3:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > This happens after "ALTER SUBSCRIPTION sub1 SET PUBLICATION pub9". The
> > probable theory is that ALTER SUBSCRIPTION will lead to restarting of
> > apply worker (which we can see in LOGS as well) and after the restart,

Yes.

> > the apply worker will use the existing slot and replication origin
> > corresponding to the subscription. Now, it is possible that before
> > restart the origin has not been updated and the WAL start location
> > points to a location prior to where PUBLICATION pub9 exists which can
> > lead to such an error. Once this error occurs, apply worker will never
> > be able to proceed and will always return the same error. Does this
> > make sense?

Wow. I didin't thought that line. That theory explains the silence and
makes sense even though I don't see LSN transistions that clearly
support it. I dimly remember a similar kind of problem..

> > Unless you or others see a different theory, this seems to be the
> > existing problem in logical replication which is manifested by this
> > test. If we just want to fix these test failures, we can create a new
> > subscription instead of altering the existing publication to point to
> > the new publication.
> >
>
> If the above theory is correct then I think allowing the publisher to
> catch up with "$node_publisher->wait_for_catchup('sub1');" before
> ALTER SUBSCRIPTION should fix this problem. Because if before ALTER
> both publisher and subscriber are in sync then the new publication
> should be visible to WALSender.

It looks right to me. That timetravel seems inintuitive but it's the
(current) way it works.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-05-20 01:42:31 Re: 15beta1 test failure on mips in isolation/expected/stats
Previous Message Peter Smith 2022-05-20 01:20:47 Re: Skipping schema changes in publication