Re: pg_upgrade and logical replication

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: pg_upgrade and logical replication
Date: 2023-11-13 08:22:06
Message-ID: ZVHcruo184qnHlmz@paquier.xyz
Lists: pgsql-hackers

On Fri, Nov 10, 2023 at 07:26:18PM +0530, vignesh C wrote:
> I did testing along the same lines that you mentioned. Apart from
> that, I also reviewed the design where the old subscription id was
> used, as in the case of table sync workers: the table sync worker
> will replicate using the replication slot and replication origin
> created with the old subscription id. I also checked the impact on
> remote_lsn's. A few examples: in SUBREL_STATE_DATASYNC state we try
> to drop the replication slot once the worker is started, but since
> the slot was created with the old subscription id, we will not be
> able to drop it, creating a leak. Similarly, the problem exists with
> SUBREL_STATE_FINISHEDCOPY, where we will not be able to drop the
> origin created with the old subscription id.

Yeah, I was playing a bit with these states and I can confirm that
leaving a DATASYNC relation around in pg_subscription_rel during the
upgrade would leave a slot behind on the publisher of the old cluster,
which is no good. It would be an option to explore later what could
be improved, but I'm also looking forward to hearing from users
first, as what you have here may be enough for the basic purposes we
are trying to cover. FINISHEDCOPY is similarly not OK: I was able
to get an origin lying around after an upgrade.
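
For reference, here is roughly the kind of query that can be used to
spot the leftovers; this is only a sketch that relies on the internal
naming conventions of the tablesync code
("pg_<subid>_sync_<relid>_<sysid>" for the slots, "pg_<subid>_<relid>"
for the origins), which are implementation details and not a stable
API:

    -- On the publisher of the old cluster: tablesync slots left behind.
    SELECT slot_name, slot_type, active
      FROM pg_replication_slots
     WHERE slot_name LIKE 'pg\_%\_sync\_%' ESCAPE '\';

    -- On the subscriber: tablesync origins left behind (the
    -- per-subscription origin is just "pg_<subid>", so it does not
    -- match this pattern).
    SELECT roident, roname
      FROM pg_replication_origin
     WHERE roname LIKE 'pg\_%\_%' ESCAPE '\';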

Anyway, after a closer look, I think that your conclusions regarding
the states that are allowed in the patch during the upgrade have some
flaws.

First, are you sure that SYNCDONE is OK to keep? This catalog state
is set in process_syncing_tables_for_sync(), and just after that the
code opens a transaction to clean up the tablesync slot, followed by
a second transaction to clean up the origin. However, imagine that
there is a failure in dropping the slot, the origin, or just in
transaction processing: couldn't we finish in a state where the
relation is marked as SYNCDONE in the catalog but still has an origin
and/or a tablesync slot lying around? Assuming that SYNCDONE is an OK
state seems incorrect to me. I am pretty sure that injecting an error
in a code path after the slot is created would equally lead to an
inconsistency.
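
To make that easier to check, here is the kind of query that could be
run on the subscriber to look for relations marked SYNCDONE that
still have a tablesync origin; again just a sketch assuming the
internal "pg_<subid>_<relid>" origin naming:

    SELECT r.srsubid, r.srrelid::regclass, o.roname
      FROM pg_subscription_rel r
      JOIN pg_replication_origin o
        ON o.roname = format('pg_%s_%s', r.srsubid, r.srrelid)
     WHERE r.srsubstate = 's';

An empty result is what one wants to see; any row means an origin
survived past the commit of the SYNCDONE update.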

It seems to me that INIT cannot be relied on for a similar reason.
This state is set for a new relation in LogicalRepSyncTableStart(),
and the relation is still in INIT state when the slot is created via
walrcv_create_slot() in a second transaction started a bit later.
However, if we have a failure after the transaction that created the
slot commits, then we'd have an INIT relation committed in the
catalog *and* a slot related to it lying around.
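
If one ends up in that situation, the leaked slot can at least be
dropped by hand on the old publisher once identified (the slot name
below is made up for the example):

    SELECT pg_drop_replication_slot('pg_16394_sync_16385_7303609754133961337');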

The only state that I can see as safe to rely on is READY, set in the
same transaction as the one that drops the replication origin,
because that's the point where we are sure that there is no origin
and no tablesync slot: the READY state is visible in the catalog only
if the transaction dropping the origin succeeds. Even with this one,
I had the odd feeling that there's a code path where we could leak
something, though I have not seen a problem with it after a few hours
of looking at this area.
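
In short, if only READY is trusted, the check run by pg_upgrade on
the old cluster could look more or less like this (a sketch of the
idea, not the exact query of the patch):

    SELECT s.subname, r.srrelid::regclass, r.srsubstate
      FROM pg_subscription_rel r
      JOIN pg_subscription s ON s.oid = r.srsubid
     WHERE r.srsubstate <> 'r';

Any row returned would be a reason to refuse the upgrade.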
--
Michael
