Re: pg_upgrade and logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_upgrade and logical replication
Date: 2023-02-17 10:42:54
Message-ID: CAA4eK1+wo2HyaPRNYjQcMuNuX4_4nR6amxZ4oj8gUu-DHU8CAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 17, 2023 at 1:24 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> I was working on testing a major upgrade scenario using a mix of physical and
> logical replication when I faced some unexpected problem leading to missing
> rows. Note that my motivation is to rely on physical replication / physical
> backup to avoid recreating a node from scratch using logical replication, as
> the initial sync with logical replication is much more costly and impacting
> compared to pg_basebackup / restoring a physical backup, but the same problem
> exist if you just pg_upgrade a node that has subscriptions.
>
> The problem is that pg_upgrade creates the subscriptions on the newly upgraded
> node using "WITH (connect = false)", which seems expected as you obviously
> don't want to try to connect to the publisher at that point. But then once the
> newly upgraded node is restarted and ready to replace the previous one, unless
> I'm missing something there's absolutely no possibility to use the created
> subscriptions without losing some data from the publisher.
>
> The reason is that the subscription doesn't have a local list of relation to
> process until you refresh the subscription, but you can't refresh the
> subscription without enabling it (and you can't enable it in a transaction),
> which means that you have to let the logical worker start, consume and ignore
> all changes that happened on the publisher side until the refresh happens.
>
> An easy workaround that I tried is to allow something like
>
> ALTER SUBSCRIPTION ... ENABLE WITH (refresh = true, copy_data = false)
>
> so that the refresh internally happens before the apply worker is started and
> you just keep consuming the delta, which works on naive scenario.
>
> One concern I have with this approach is that the default values for both
> "refresh" and "copy_data" for all other subcommands is "true, but we would
> probably need a different default value in that exact scenario (as we know we
> already have the data). I think that it would otherwise be safe in my very
> specific scenario, assuming that you created the slot beforehand and moved the
> slot's LSN at the promotion point, as even if you add non-empty tables to the
> publication you will only need the delta whether those were initially empty or
> not given your initial physical replica state.
>

This point is not very clear. Why would one just need delta even for new tables?

> Any other scenario would make
> this new option dangerous, if not entirely useless, but not more than any of
> the current commands that lead to refreshing a subscription and have the same
> options I guess.
>
> All in all, currently the only way to somewhat safely resume logical
> replication after a pg_upgrade is to drop all the subscriptions that were
> transferred during pg_upgrade on all databases and recreate them (using the
> existing slots on the publisher side obviously), allowing the initial
> connection. But this approach only works in the exact scenario I mentioned
> (physical to logical replication, or at least a case where *all* the tables
> where logically replicated prior to the pg_ugprade), otherwise you have to
> recreate the follower node from scratch using logical repication.
>

I think if you dropped and recreated the subscriptions by retaining
old slots, the replication should resume from where it left off before
the upgrade. Which scenario are you concerned about?

> Is that indeed the current behavior, or did I miss something?
>
> Is this "resume logical replication on pg_upgraded node" something we want to
> support better? I was thinking that we could add a new pg_dump mode (maybe
> only usable during pg_upgrade) that also restores the pg_subscription_rel
> content in each subscription or something like that. If not, should pg_upgrade
> keep preserving the subscriptions as it doesn't seem safe to use them, or at
> least document the hazards (I didn't find anything about it in the
> documentation)?
>
>

There is a mention of this in pg_dump docs. See [1] (When dumping
logical replication subscriptions ...)

[1] - https://www.postgresql.org/docs/devel/app-pgdump.html

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2023-02-17 10:53:36 Re: Refactor calculations to use instr_time
Previous Message vignesh C 2023-02-17 10:22:54 The output sql generated by pg_dump for a create function refers to a modified table name