Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-10 13:37:04
Message-ID: CAD21AoCzY5MeP8VaqctGcMPeZra4R-SQA+-_Zy5N6XtvTpjPGw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 10, 2023 at 12:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Aug 10, 2023 at 6:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 9, 2023 at 1:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Aug 9, 2023 at 8:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I feel it would be a good idea to provide such a tool for users to
> > > avoid getting errors during upgrade but I think the upgrade code still
> > > needs to ensure that there are no WAL records between
> > > confirmed_flush_lsn and SHUTDOWN_CHECKPOINT other than those required.
> > > Or, do you want to say that we don't do any verification check during
> > > the upgrade and let the data loss happen if the user didn't ensure
> > > that by running such a tool?
> >
> > I meant that if we can check the slot state file while the old cluster
> > stops, we can ensure there are no WAL records between slot's
> > confirmed_flush_lsn (in the state file) and the latest checkpoint (in
> > the control file).
> >
>
> Are you suggesting doing this before we start the old cluster or after
> we stop the old cluster? I was thinking about the pros and cons of
> doing this check when the server is 'on' (along with other upgrade
> checks something like the patch is doing now) versus when the server
> is 'off'. I think the advantage of doing it when the server is 'off'
> (after check_and_dump_old_cluster()) is that it will be ensured that
> there is no extra WAL that could be generated during the upgrade and
> has not been verified against confirmed_flush_lsn location. But OTOH,
> to retrieve slot information when the server is 'off', we need a
> separate utility or probably a functionality for the same in
> pg_upgrade and also some WAL reading stuff which sounds to me like a
> larger change that may not be warranted here. I think anyway the extra
> WAL (if any got generated during the upgrade) won't be required after
> the upgrade so not convinced to make such a check while the server is
> 'off'. Are there reasons which make it better to do this while the old
> cluster is 'off'?

What I imagined is that we do this check before
check_and_dump_old_cluster() while the server is 'off'. Reading the
slot state files would be simple, and I guess we would not need a
separate tool or CLI program for that. We would need to expose
ReplicationSlotOnDisk, though. After reading the control file and the
slots' state files, we check whether each slot's confirmed_flush_lsn
matches the latest checkpoint LSN in the control file (BTW maybe we can
get the slot name and plugin name here instead of using pg_dump?). Extra
WAL records could be generated only after this check, so we wouldn't
need to worry about them for logical replication slots. As for
non-logical replication slots, we would need some WAL reading stuff, but
I'm not sure we need it for the first commit. Another idea would be to
allow users to mark replication slots "upgradable" so that pg_upgrade
skips the confirmed_flush_lsn check.
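
To make the comparison concrete, here is a rough sketch of the check
described above. It is only an illustration under assumptions: SlotInfo,
Lsn, SlotCheckResult, check_slot() and first_blocking_slot() are
hypothetical names, SlotInfo is not the real on-disk slot struct, and
the sketch assumes the slot state files and the control file have
already been read into memory. How the "upgradable" mark would interact
with non-logical slots is also just one possible reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN, treated here as a plain 64-bit value. */
typedef uint64_t Lsn;

/* Hypothetical, simplified view of what was read from one slot's state file. */
typedef struct SlotInfo
{
    const char *name;
    bool        is_logical;
    bool        upgradable;          /* assumed user-set mark; not an existing feature */
    Lsn         confirmed_flush_lsn;
} SlotInfo;

typedef enum SlotCheckResult
{
    SLOT_OK,              /* confirmed_flush_lsn matches the latest checkpoint */
    SLOT_SKIPPED,         /* marked "upgradable", so the LSN check is skipped */
    SLOT_BEHIND,          /* logical slot whose LSN does not match */
    SLOT_NEEDS_WAL_READ   /* non-logical slot; would need WAL reading to verify */
} SlotCheckResult;

/* Classify one slot against the latest checkpoint LSN from the control file. */
SlotCheckResult
check_slot(const SlotInfo *slot, Lsn latest_checkpoint_lsn)
{
    assert(slot != NULL);

    if (slot->upgradable)
        return SLOT_SKIPPED;
    if (!slot->is_logical)
        return SLOT_NEEDS_WAL_READ;
    if (slot->confirmed_flush_lsn == latest_checkpoint_lsn)
        return SLOT_OK;
    return SLOT_BEHIND;
}

/*
 * Return the index of the first slot that would block the upgrade
 * (behind, or not verifiable without WAL reading), or -1 if none does.
 */
int
first_blocking_slot(const SlotInfo *slots, size_t nslots,
                    Lsn latest_checkpoint_lsn)
{
    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        SlotCheckResult r = check_slot(&slots[i], latest_checkpoint_lsn);

        if (r == SLOT_BEHIND || r == SLOT_NEEDS_WAL_READ)
            return (int) i;
    }
    return -1;
}
```

In this sketch the caller would run first_blocking_slot() once, right
after reading the files and before starting the old cluster, and refuse
to continue if it returns anything other than -1.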

BTW this check would not be able to support live-check, but I think
that's not a problem, since the check would never be able to pass with
a running server anyway.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
