From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Fix slot synchronization with two_phase decoding enabled |
Date: | 2025-06-03 05:35:07 |
Message-ID: | CABdArM7LzahGw0aZd4AUWmMdi8m7teuEdZrgG0oKDYXXfGdXSg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jun 2, 2025 at 10:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 30, 2025 at 3:00 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > Agree that we need to cover the simple pg_dump and pg_restore with the patch.
> >
> > When pg_dump and pg_restore are used outside of pg_upgrade, there's no
> > guarantee that the target system does not have any prepared
> > transactions. In such cases, restoring a subscription with both
> > two_phase and failover enabled could lead to the bug, so we should
> > avoid allowing both options via pg_restore.
> >
> > Here are a few possible solutions:
> >
> > 1) Split the CREATE SUBSCRIPTION command of such subscriptions in pg_restore :
> > first create the sub with two_phase:
> > - CREATE SUBSCRIPTION ... with (two_phase = on)
> > then enable failover
> > - ALTER SUBSCRIPTION ... with (failover=true)
> >
>
> This won't work because we always dump a subscription with the
> 'connect' option as 'false'. As changing the subscription's failover
> option, needs the connection above split won't work.
>
> > However, this approach adds complexity in pg_restore and may still
> > fail if the slot's restart_lsn is earlier than two_phase_at.
> >
> > 2) Prevent dump of subscription in non-upgrade mode:
> > Raise an error/warning during pg_dump when a subscription with both
> > two_phase and failover is encountered (unless in binary upgrade mode).
> > That is, either fail the pg_dump, or skip dumping subscriptions in
> > such a case with a warning.
> > We can include a helpful hint:
> > "Disable failover for subscription <sub-name> before dumping and
> > re-enable it after restore."
> >
> > 3) Dump such subscriptions with (two_phase=on, failover=on,
> > create_slot=false) together. As pg_dump always sets "connect=false"
> > for subscription, it is up to users to reactivate the subscription
> > suitably. In this case, we can add some document to warn that it's the
> > user's responsibility to ensure the remote slot is in a safe state.
> >
> > What do you think?
> >
> > Please let me know if you have any other suggestions for handling this
> > in pg_dump/pg_restore.
> >
>
> Yet another idea is to dump the subscription with two_phase = on and
> failover = false. We should do this when both options are 'true'
> during the dump. As we are documenting that we always dump
> createsubscription with connect as false and let users take care (see
> [1] (When dumping logical replication subscriptions ...)), a similar
> reasoning could be given for the failover flag.
>
+1
> The one more combination to consider is when someone takes a dump of
> an older version and loads it into a newer version. For example, where
> users dump from 17.5 and then restore in a newer version, say 17.6
> (which has our fix), the restore will fail due to newer restrictions
> added by this patch. Do we need to do anything about it?
>
> [1] : https://www.postgresql.org/docs/devel/app-pgdump.html
>
~~~
Attached v17 patches. Added a top-up patch 0002 implementing the idea
suggested by Amit above.
Thanks,
Nisha
Attachment | Content-Type | Size |
---|---|---|
v17-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patch | application/octet-stream | 24.2 KB |
v17-0002-Fix-for-pg_dump.patch | application/octet-stream | 3.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-06-03 05:54:56 | Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding |
Previous Message | Alexander Lakhin | 2025-06-03 05:00:01 | Re: Non-reproducible AIO failure |