Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-09 04:15:15
Message-ID: CAA4eK1K1TQcNrfvJ9PpKVu5XnRzzxVdNE9mCJ56P-25tFietJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Aug 7, 2023 at 6:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Aug 7, 2023 at 2:02 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > WAL records for hint bit updates could be generated even in upgrading mode?
> > >
> >
> > Do you mean these records can be generated during reading catalog tables?
>
> Yes.
>

BTW, Kuroda-San has verified and found that three types of records
(including XLOG_FPI_FOR_HINT) can be generated by the system during
the upgrade. See email [1].

> >
> > > > I feel if there is a chance of any WAL activity during the
> > > > upgrade, we need to either change the check to ensure such WAL records
> > > > are expected or document the same in some way.
> > >
> > > Yes, but how does it work with the above idea of limiting the number
> > > of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
> > > the upgrade mode, we cannot predict how many such records are
> > > generated after the latest CHECKPOINT_SHUTDOWN.
> > >
> >
> > Right, as said earlier, in that case, we need to rely on the type of records.
>
> Another idea would be that before starting the old cluster we check if
> the slot's confirmed_flush_lsn in the slot state file matches the
> latest checkpoint LSN got by pg_controlfile. We need another tool to
> dump the slot state file, though.
>

I feel it would be a good idea to provide such a tool for users to
avoid getting errors during upgrade but I think the upgrade code still
needs to ensure that there are no WAL records between
confirm_flush_lsn and SHUTDOWN_CHECKPOINT than required. Or, do you
want to say that we don't do any verification check during the upgrade
and let the data loss happens if the user didn't ensure that by
running such a tool?

> >
> > > I'm not really sure we should always perform the slot's
> > > confirmed_flush_lsn check by default in the first place. With this
> > > check, the upgrade won't be able to proceed if there is any logical
> > > slot that is not used by logical replication (or something streaming
> > > the changes using walsender), right? For example, if a user uses a
> > > program that periodically consumes the changes from the logical slot,
> > > the slot would not be able to pass the check even if the user executed
> > > pg_logical_slot_get_changes() just before shutdown. The backend
> > > process who consumes the changes is always terminated before the
> > > shutdown checkpoint. On the other hand, I think there are cases where
> > > the user can ensure that no meaningful WAL records are generated after
> > > the last pg_logical_slot_get_changes(). I'm concerned that this check
> > > might make upgrading such cases cumbersome unnecessarily.
> > >
> >
> > You are right and I have mentioned the same case today in my response
> > to Jonathan but do you have better ideas to deal with such slots than
> > to give an ERROR?
>
> It makes sense to me to give an ERROR for such slots but does it also
> make sense to make the check optional?
>

We can do that if we think so. We have two ways to make this check
optional (a) have a switch like --include-logical-replication-slots as
the proposed patch has which means by default we won't try to upgrade
slots; (b) have a switch like --exclude-logical-replication-slots as
Jonathan proposed which means we will exclude slots only if specified
by user. Now, one thing to note is that we don't seem to have any
include/exclude switch in the upgrade which I think indicates users by
default prefer to upgrade everything. Now, even if we decide not to
give any switch initially but do it only if there is a user demand for
it then also users will have a way to proceed with an upgrade which is
by dropping such slots. Do you have any preference?

[1] - https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2023-08-09 04:21:03 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Previous Message Peter Geoghegan 2023-08-09 04:07:58 Re: pg15: reltuples stuck at -1 after pg_upgrade and VACUUM