Re: pg_upgrade and logical replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_upgrade and logical replication
Date: 2023-11-01 18:44:05
Message-ID: CALDaNm3ATayLn=YNO5PsXkKF15R_E8j-jEu5gGkv=Od8TBxzEA@mail.gmail.com
Lists: pgsql-hackers

On Fri, 27 Oct 2023 at 17:05, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Oct 27, 2023 at 12:09 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > Apart from this, I'm still checking that the old cluster's subscription
> > relations are still in READY state, but there is a possibility
> > that SYNCDONE or FINISHEDCOPY could work; this needs more thought
> > before concluding which is the correct state to check. Let's handle
> > this in the upcoming version.
> >
>
> I was analyzing this part and it seems it could be tricky to upgrade
> in FINISHEDCOPY state, because the system would expect the subscriber
> to know the old slot name from the old cluster, which it can drop at
> SYNCDONE state. Now, as sync_slot_name is generated based on subid and
> relid, which could be different in the new cluster, the generated
> slot name would be different after the upgrade. OTOH, if the relstate
> is INIT, then I think the sync could be performed even after the
> upgrade.
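
The following minimal C sketch shows why names derived from OIDs stop matching after an upgrade. It is a simplified model and not the server code. It assumes the tablesync slot name is built only from the subscription OID and relation OID, as described above, although the real name may carry further components. It also assumes the tablesync origin follows a `pg_<subid>_<relid>` pattern. The helper names and the OIDs used with them are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid type */

/*
 * Hypothetical, simplified tablesync slot name built only from the
 * subscription OID and relation OID (the real name may have more parts).
 */
static void
tablesync_slot_name(Oid suboid, Oid relid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "pg_%u_sync_%u", suboid, relid);

    assert(n > 0 && (size_t) n < len);   /* name must not be truncated */
}

/* Hypothetical tablesync origin name: "pg_<subid>_<relid>". */
static void
tablesync_origin_name(Oid suboid, Oid relid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "pg_%u_%u", suboid, relid);

    assert(n > 0 && (size_t) n < len);
}

/*
 * Would the new cluster derive the same slot name as the old one?
 * Only if both OIDs are unchanged by the upgrade.
 */
static bool
same_slot_name_after_upgrade(Oid old_sub, Oid old_rel, Oid new_sub, Oid new_rel)
{
    char old_name[64];
    char new_name[64];

    tablesync_slot_name(old_sub, old_rel, old_name, sizeof old_name);
    tablesync_slot_name(new_sub, new_rel, new_name, sizeof new_name);
    return strcmp(old_name, new_name) == 0;
}
```

If the subscription OID changes (for example, 16394 in the old cluster becomes 16402 in the new one), the derived names no longer match. The upgraded subscriber then cannot find or drop the slot and origin that the old cluster created.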

I have analyzed all the subscription relation states further; here is
my analysis.

The following states are OK, because either the replication slot has
not been created yet or it has already been dropped, and the required
WAL files will be present in the publisher:
a) SUBREL_STATE_SYNCDONE
b) SUBREL_STATE_READY
c) SUBREL_STATE_INIT

The following states are not OK, because the worker depends on the
replication slot/origin in these cases:
a) SUBREL_STATE_DATASYNC: In this case, the tablesync worker will try
   to drop the replication slot. However, the slot was created in the
   publisher with the old subscription id, so the upgraded subscriber
   will not be able to clean it up.
b) SUBREL_STATE_FINISHEDCOPY: In this case, the tablesync worker
   expects the origin to exist already. However, the origin was created
   with the old subscription id, so the tablesync worker will not be
   able to find it.
c) SUBREL_STATE_SYNCWAIT, SUBREL_STATE_CATCHUP and SUBREL_STATE_UNKNOWN:
   These states are not stored in the catalog, so we need not allow
   them.

I modified the patch to support the relation states accordingly.
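
The classification above can be sketched as a small C helper. The state characters match the `srsubstate` values defined in `pg_subscription_rel.h`. The functions `subrel_state_upgradable()` and `count_blocking_relations()` are hypothetical illustrations of the rule and are not the check that the patch implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values stored in pg_subscription_rel.srsubstate */
#define SUBREL_STATE_INIT         'i'
#define SUBREL_STATE_DATASYNC     'd'
#define SUBREL_STATE_FINISHEDCOPY 'f'
#define SUBREL_STATE_SYNCDONE     's'
#define SUBREL_STATE_READY        'r'
/* Never stored in the catalog; used only for IPC between workers */
#define SUBREL_STATE_UNKNOWN      '\0'
#define SUBREL_STATE_SYNCWAIT     'w'
#define SUBREL_STATE_CATCHUP      'c'

/* Hypothetical: can a relation in this state be carried over by pg_upgrade? */
static bool
subrel_state_upgradable(char state)
{
    switch (state)
    {
        case SUBREL_STATE_INIT:         /* tablesync slot not created yet */
        case SUBREL_STATE_SYNCDONE:     /* tablesync slot already dropped */
        case SUBREL_STATE_READY:
            return true;
        case SUBREL_STATE_DATASYNC:     /* slot named after old subscription id */
        case SUBREL_STATE_FINISHEDCOPY: /* origin named after old subscription id */
            return false;
        default:                        /* 'w', 'c', '\0': not in the catalog */
            return false;
    }
}

/* Hypothetical: count relations whose state would block the upgrade. */
static size_t
count_blocking_relations(const char *states, size_t n)
{
    size_t blocking = 0;

    assert(states != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!subrel_state_upgradable(states[i]))
            blocking++;
    return blocking;
}
```

A pre-upgrade check would refuse to proceed while `count_blocking_relations()` is non-zero for any subscription in the old cluster.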

> Shouldn't we at least ensure that replication origins do exist in the
> old cluster corresponding to each of the subscriptions? Otherwise,
> later the query to get remote_lsn for origin in getSubscriptions()
> would fail.
Added a check for this.
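
A minimal C sketch of that check follows. It assumes each subscription's origin is named `pg_<subid>` and that the subscription OIDs and the origin names (`pg_replication_origin.roname`) have already been fetched from the old cluster. The helper names are hypothetical, and this is not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid type */

/* Hypothetical: does the origin "pg_<subid>" appear among the old cluster's origins? */
static bool
origin_exists_for_subscription(Oid subid, const char *const *origins, size_t norigins)
{
    char expected[32];
    int n = snprintf(expected, sizeof expected, "pg_%u", subid);

    assert(n > 0 && (size_t) n < sizeof expected);
    /* exact match, so tablesync origins like "pg_<subid>_<relid>" don't count */
    for (size_t i = 0; i < norigins; i++)
        if (strcmp(origins[i], expected) == 0)
            return true;
    return false;
}

/* Hypothetical: how many subscriptions lack their origin (should be zero). */
static size_t
count_subscriptions_without_origin(const Oid *subids, size_t nsubs,
                                   const char *const *origins, size_t norigins)
{
    size_t missing = 0;

    for (size_t i = 0; i < nsubs; i++)
        if (!origin_exists_for_subscription(subids[i], origins, norigins))
            missing++;
    return missing;
}
```

If any subscription is missing its origin, the upgrade should stop before getSubscriptions() tries to read remote_lsn for it.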

The attached v10 patch set contains these changes.

Regards,
Vignesh

Attachment Content-Type Size
v10-0001-Prevent-startup-of-logical-replication-launcher-.patch text/x-patch 2.0 KB
v10-0002-Preserve-the-full-subscription-s-state-during-pg.patch text/x-patch 37.2 KB
