Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-11-30 03:10:28
Message-ID: CAA4eK1JVg9Kv=-1_Do9K5xWR-pUD6bpS38FOfQZ=smvOHnKErQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 29, 2023 at 2:56 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > > >
> > > > Pushed!
> > >
> > > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > > the last patch committed. Is there further work that needs to be
> > > re-attached and/or rebased?
> > >
> >
> > No. I have marked it as committed.
> >
>
> I found another failure related to the commit [1]. I think it is caused by
> autovacuum. I would like to propose a patch that disables autovacuum on the old publisher.
>
> For more detail, please see below.
>
> # Analysis of the failure
>
> Summary: this failure occurs when autovacuum starts after the subscription
> is disabled but before pg_upgrade runs.
>
> According to the regress file, pg_upgrade failed unexpectedly [2]. The slots
> cannot have been invalidated, so some WAL seems to have been generated
> after disabling the subscriber.
>
> Also, the server log from oldpub shows that an autovacuum worker was terminated when
> the server stopped. This happened after the walsender released the logical slots. WAL records
> generated by the autovacuum worker could not be consumed by the slots, so the upgrade
> function returned false.
>
> # How to reproduce
>
> I made a small file for reproducing the failure. Please see reproduce.txt. It contains
> changes that launch the autovacuum worker very often and ensure that it does actual
> work. After applying it, I could reproduce the same failure every time.
>
> # How to fix
>
> I think it is sufficient to fix only the test code.
> The easiest way is to disable autovacuum on the old publisher. PSA the patch file.
>

Agreed. For now, we should change the test as you proposed; I'll take
care of that. However, I wonder whether we should also ensure that
autovacuum and any other workers are shut down before the walsender
processes the last set of WAL before shutdown. We can analyze this
further and probably start a separate thread to discuss this point.

--
With Regards,
Amit Kapila.
