Re: pg_upgrade and logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_upgrade and logical replication
Date: 2023-11-20 04:19:41
Message-ID: CAA4eK1J6D5OS6KCUjUbeQetjjCXsOABZ0=vTTkL6t2yfFL6A_Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 14, 2023 at 7:21 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 13 Nov 2023 at 13:52, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > Anyway, after a closer look, I think that your conclusions regarding
> > the states that are allowed in the patch during the upgrade have some
> > flaws.
> >
> > First, are you sure that SYNCDONE is OK to keep? This catalog state
> > is set in process_syncing_tables_for_sync(), and just after the code
> > opens a transaction to clean up the tablesync slot, followed by a
> > second transaction to clean up the origin. However, imagine that
> > there is a failure in dropping the slot, the origin, or just in
> > transaction processing; could we not end up in a state where the relation
> > is marked as SYNCDONE in the catalog but still has an origin and/or a
> > tablesync slot lying around? Assuming that SYNCDONE is an OK state
> > seems incorrect to me. I am pretty sure that injecting an error in a
> > code path after the slot is created would equally lead to an
> > inconsistency.
>
> There are a couple of things happening here: a) In the first part we
> take care of setting the subscription relation to SYNCDONE and dropping
> the replication slot at the publisher node. The relation state is set
> to SYNCDONE only if dropping the replication slot succeeds; if it
> fails, the relation state will still be FINISHEDCOPY. So a failure in
> dropping the replication slot is not an issue, as the tablesync worker
> will be in the FINISHEDCOPY state, and this state is not allowed for
> upgrade. When the state is SYNCDONE, the tablesync slot will not be
> present. b) In the second part we drop the replication origin. Even if
> dropping the replication origin fails for some reason, there will be
> no problem, as we do not copy the table sync replication origin to the
> new cluster while upgrading. Since the table sync replication origin
> is not copied to the new cluster, there will be no replication origin
> leaks.
>

And this will work because in the SYNCDONE state, while removing the
origin, we are okay with missing origins. It seems that not copying the
origin for tablesync workers in this state (SYNCDONE) relies on the
fact that, currently, we don't use those origins once the system
reaches the SYNCDONE state. I am not sure it is a good idea to have
such a dependency, and an upgrade that assumes such things doesn't seem
ideal to me. Personally, I think allowing an upgrade in the 'i'
(initialize) state or the 'r' (ready) state seems safe, because in those
states the slots/origins either don't exist or have been dropped. What
do you think?
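
To make the proposed rule concrete, here is a rough sketch of the check
I have in mind. It is illustrative Python, not the pg_upgrade code: the
function name and the dict input are hypothetical, and only the
single-letter state codes are taken from pg_subscription_rel.srsubstate.

```python
# Illustrative sketch only; not the actual pg_upgrade implementation.
# State codes as stored in pg_subscription_rel.srsubstate.
SUBREL_STATES = {
    "i": "initialize",
    "d": "data copy",
    "f": "finished copy",
    "s": "syncdone",
    "r": "ready",
}

# Assumption: only the two states proposed above are accepted.
UPGRADE_SAFE_STATES = frozenset({"i", "r"})


def relations_blocking_upgrade(rel_states):
    """Return sorted (relation, state name) pairs that would block an upgrade.

    rel_states is a hypothetical mapping of relation name to its
    one-letter srsubstate code.
    """
    blocking = []
    for relname, state in rel_states.items():
        if state not in SUBREL_STATES:
            raise ValueError(f"unknown state {state!r} for {relname}")
        if state not in UPGRADE_SAFE_STATES:
            blocking.append((relname, SUBREL_STATES[state]))
    return sorted(blocking)
```

With such a rule, a relation in SYNCDONE ('s') or FINISHEDCOPY ('f')
would be reported and the upgrade refused, so nothing would depend on
whether a leftover origin is still used after SYNCDONE.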

--
With Regards,
Amit Kapila.
