Re: pg_upgrade and logical replication

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_upgrade and logical replication
Date: 2023-02-26 03:05:18
Message-ID: 20230226030518.q6tggrid2vacmvlb@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 25, 2023 at 11:24:17AM +0530, Amit Kapila wrote:
> On Wed, Feb 22, 2023 at 12:13 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > > Is there really a use case for dumping the content of pg_subscription_rel
> > > outside of pg_upgrade?
>
> I think the users who want to take a dump and restore the entire
> cluster may need it there for the same reason as pg_upgrade needs it.
> TBH, I have not seen such a request but this is what I imagine one
> would expect if we provide this functionality via pg_upgrade.

But the pg_subscription_rel data are only needed if you want to resume logical
replication from the exact previous state, otherwise you can always refresh the
subscription and it will retrieve the list of relations automatically (dealing
with initial sync and so on). It's hard to see how it could be happening with
a plain pg_dump.

The only usable scenario I can see would be to disable all subscriptions on the
logical replica, maybe make sure that no one does any write those tables if you
want to eventually switch over on the restored node, do a pg_dump(all), restore
it and then resume the logical replication / subscription(s) on the restored
server. That's a lot of constraints for something that pg_upgrade deals with
so much more efficiently. Maybe one plausible use case would be to split a
single logical replica to N servers, one per database / publication or
something like that. In that case pg_upgrade won't be that useful and if each
target subset is small enough a pg_dump/pg_restore may be a viable option. But
if that's a viable option then surely creating the logical replica from scratch
using normal logical table sync should be an even better option.

I'm really worried that it's going to be a giant foot-gun that any user should
really avoid.

> > > For the pg_upgrade use-case, do you see any reason to not restore the
> > > pg_subscription_rel by default?
>
> As I said earlier, one can very well say that giving more flexibility
> (in terms of where the publications will be after restore) after a
> restore is a better idea. Also, we are doing the same till now without
> any major complaints about the same, so it makes sense to keep the
> current behavior as default.

I'm a bit dubious that anyone actually tried to run pg_upgrade on a logical
replica and then kept using logical replication, as it's currently impossible
to safely resume replication without truncating all target relations.

As I mentioned before, if we keep the current behavior as a default there
should be an explicit warning in the documentation stating that you need to
truncate all target relations before resuming logical replication as otherwise
you have a guarantee that you will lose data.

> > > Maybe having an option to not restore it would
> > > make sense if it indeed add noticeable overhead when publications have a lot of
> > > tables?
>
> Yeah, that could be another reason to not do it default.

I will do some benchmark with various number of relations, from high to
unreasonable.

> >
> > Since I didn't hear any objection I worked on a POC patch with this approach.
> >
> > For now when pg_dump is invoked with --binary, it will always emit extra
> > commands to restore the relation list. This command is only allowed when the
> > server is started in binary upgrade mode.
> >
> > The new command is of the form
> >
> > ALTER SUBSCRIPTION name ADD TABLE (relid = X, state = 'Y', lsn = 'Z/Z')
> >
> > with the lsn part being optional.
> >
>
> BTW, do we restore the origin and its LSN after the upgrade? Because
> without that this won't be sufficient as that is required for apply
> worker to ensure that it is in sync with table sync workers.

We currently don't, which is yet another sign that no one actually tried to
resume logical replication after a pg_upgrade. That being said, trying to
pg_upgrade a node that's currently syncing relations seems like a bad idea
(I didn't even think to try), but I guess it should also be supported. I will
work on that too. Assuming we add a new option for controlling either plain
pg_dump and/or pg_upgrade behavior, should this option control both
pg_subscription_rel and replication origins and their data or do we need more
granularity?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-02-26 04:58:17 Re: verbose mode for pg_input_error_message?
Previous Message Justin Pryzby 2023-02-26 01:22:27 Re: zstd compression for pg_dump