Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-11-30 03:52:40
Message-ID: CAA4eK1K228Z9TV4hGH8b9p86s+wGvE55P5Htz4jL7ie2tkSwWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 30, 2023 at 8:40 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Nov 29, 2023 at 2:56 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > > > >
> > > > > Pushed!
> > > >
> > > > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > > > the last patch committed. Is there further work that needs to be
> > > > re-attached and/or rebased?
> > > >
> > >
> > > No. I have marked it as committed.
> > >
> >
> > I found another failure related with the commit [1]. I think it is caused by the
> > autovacuum. I want to propose a patch which disables the feature for old publisher.
> >
> > More detail, please see below.
> >
> > # Analysis of the failure
> >
> > Summary: this failure occurs when the autovacuum starts after the subscription
> > is disabled but before doing pg_upgrade.
> >
> > According to the regress file, it unexpectedly failed the pg_upgrade [2]. There are
> > no possibilities for slots are invalidated, so some WALs seemed to be generated
> > after disabling the subscriber.
> >
> > Also, server log caused by oldpub said that autovacuum worker was terminated when
> > it stopped. This was occurred after walsender released the logical slots. WAL records
> > caused by autovacuum workers could not be consumed by the slots, so that upgrading
> > function returned false.
> >
> > # How to reproduce
> >
> > I made a small file for reproducing the failure. Please see reproduce.txt. This contains
> > changes for launching autovacuum worker very often and for ensuring actual works are
> > done. After applying it, I could reproduce the same failure every time.
> >
> > # How to fix
> >
> > I think it is sufficient to fix only the test code.
> > The easiest way is to disable the autovacuum on old publisher. PSA the patch file.
> >
>
> Agreed, for now, we should change the test as you proposed. I'll take
> care of that. However, I wonder, if we should also ensure that
> autovacuum or any other worker is shut down before walsender processes
> the last set of WAL before shutdown. We can analyze more on this and
> probably start a separate thread to discuss this point.
>

Sorry, my analysis was not complete. On looking closely, I think the
reason is that we are allowed to upgrade the slot iff there is no
pending WAL to be processed. The test first disables the subscription
to avoid unnecessary LOGs on the subscriber and then stops the
publisher node. It is quite possible that just before the shutdown of
the server, autovacuum generates some WAL record that needs to be
processed, so you propose just disabling the autovacuum for this test.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-30 04:03:25 Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Previous Message Jingxian Li 2023-11-30 03:43:19 Re: [PATCH] LockAcquireExtended improvement