RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-15 04:13:49
Message-ID: OS0PR01MB5716CE2A338136DD1129D2AC9414A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, August 15, 2023 11:06 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> >
> > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > Another idea is (which might have already discussed thoguh) that we
> check if the latest shutdown checkpoint LSN in the control file matches the
> confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that
> the slot has consumed all WAL records before the last shutdown. We don't
> need to worry about WAL records generated after starting the old cluster
> during the upgrade, at least for logical replication slots.
> > > >
> > >
> > > Right, this is somewhat closer to what Patch is already doing. But
> > > remember in this case we need to remember and use the latest
> > > checkpoint from the control file before the old cluster is started
> > > because otherwise the latest checkpoint location could be even
> > > updated during the upgrade. So, instead of reading from WAL, we need
> > > to change so that we rely on the control file's latest LSN.
> >
> > Yes, I was thinking the same idea.
> >
> > But it works for only replication slots for logical replication. Do we
> > want to check if no meaningful WAL records are generated after the
> > latest shutdown checkpoint, for manually created slots (or non-logical
> > replication slots)? If so, we would need to have something reading WAL
> > records in the end.
> >
>
> > > I would prefer this
> > > idea than to invent a new API/tool like pg_replslotdata.
> >
> > +1

Changed the check to compare the latest checkpoint lsn from pg_controldata
with the confirmed_flush_lsn in pg_replication_slots view.

> >
> > >
> > > The other point you and Bruce seem to be favoring is that instead of
> > > dumping/restoring slots via pg_dump, we remember the required
> > > information of slots retrieved during their validation in pg_upgrade
> > > itself and use that to create the slots in the new cluster. Though I
> > > am not aware of doing similar treatment for other objects we restore
> > > in this case it seems reasonable especially because slots are not
> > > stored in the catalog and we anyway already need to retrieve the
> > > required information to validate them, so trying to again retrieve
> > > it via pg_dump doesn't seem useful unless I am missing something.
> > > Does this match your understanding?
> >
> > If there are use cases for --logical-replication-slots-only option
> > other than pg_upgrade, it would be good to have it in pg_dump. I was
> > just not sure of other use cases.
> >
>
> It was primarily for upgrade purposes only. So, as we can't see a good reason to
> go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified the pg_upgrade
directly use the slot info to restore the slot in new cluster.

>
> > >
> > > Yet another thing I am trying to consider is whether we can allow to
> > > upgrade slots from 16 or 15 to later versions. As of now, the patch
> > > has the following check:
> > > getLogicalReplicationSlots()
> > > {
> > > ...
> > > + /* Check whether we should dump or not */ if (fout->remoteVersion
> > > + < 170000) return;
> > > ...
> > > }
> > >
> > > If we decide to use the existing view pg_replication_slots then can
> > > we consider upgrading slots from the prior version to 17? Now, if we
> > > want to invent any new API similar to pg_replslotdata then we can't
> > > do this because it won't exist in prior versions but OTOH using
> > > existing view pg_replication_slots can allow us to fetch slot info
> > > from older versions as well. So, I think it is worth considering.
> >
> > I think that without 0001 patch the replication slots will not be able
> > to pass the confirmed_flush_lsn check.
> >
>
> Right, but we can think of backpatching the same. Anyway, we can do that as a
> separate work by starting a new thread to see if there is a broader agreement
> for backpatching such a change. For now, we can focus on >=v17.
>

Here is the new version patch which addressed above points.
The new version patch also removes the --exclude-logical-replication-slots
option due to recent comment.
Thanks Kuroda-san for addressing most of the points.

Best Regards,
Hou zj

Attachment Content-Type Size
v20-0003-pg_upgrade-Add-check-function-for-logical-replic.patch application/octet-stream 14.9 KB
v20-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch application/octet-stream 4.9 KB
v20-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch application/octet-stream 21.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiro Ikeda 2023-08-15 05:13:46 Re: Fix pg_stat_reset_single_table_counters function
Previous Message Nathan Bossart 2023-08-15 04:11:29 Re: [PATCH] Add function to_oct