Re: Fix slot synchronization with two_phase decoding enabled

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix slot synchronization with two_phase decoding enabled
Date: 2025-06-02 05:25:09
Message-ID: CAA4eK1KiGU5Zr_S2g8w-8dUm-PRiZc72BgMnLdS83u7sgnctcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 30, 2025 at 3:00 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> Agree that we need to cover the simple pg_dump and pg_restore with the patch.
>
> When pg_dump and pg_restore are used outside of pg_upgrade, there's no
> guarantee that the target system does not have any prepared
> transactions. In such cases, restoring a subscription with both
> two_phase and failover enabled could lead to the bug, so we should
> avoid allowing both options via pg_restore.
>
> Here are a few possible solutions:
>
> 1) Split the CREATE SUBSCRIPTION command of such subscriptions in pg_restore :
> first create the sub with two_phase:
> - CREATE SUBSCRIPTION ... with (two_phase = on)
> then enable failover
> - ALTER SUBSCRIPTION ... with (failover=true)
>

This won't work because we always dump a subscription with the
'connect' option as 'false'. As changing the subscription's failover
option, needs the connection above split won't work.

> However, this approach adds complexity in pg_restore and may still
> fail if the slot's restart_lsn is earlier than two_phase_at.
>
> 2) Prevent dump of subscription in non-upgrade mode:
> Raise an error/warning during pg_dump when a subscription with both
> two_phase and failover is encountered (unless in binary upgrade mode).
> That is, either fail the pg_dump, or skip dumping subscriptions in
> such a case with a warning.
> We can include a helpful hint:
> "Disable failover for subscription <sub-name> before dumping and
> re-enable it after restore."
>
> 3) Dump such subscriptions with (two_phase=on, failover=on,
> create_slot=false) together. As pg_dump always sets "connect=false"
> for subscription, it is up to users to reactivate the subscription
> suitably. In this case, we can add some document to warn that it's the
> user's responsibility to ensure the remote slot is in a safe state.
>
> What do you think?
>
> Please let me know if you have any other suggestions for handling this
> in pg_dump/pg_restore.
>

Yet another idea is to dump the subscription with two_phase = on and
failover = false. We should do this when both options are 'true'
during the dump. As we are documenting that we always dump
createsubscription with connect as false and let users take care (see
[1] (When dumping logical replication subscriptions ...)), a similar
reasoning could be given for the failover flag.

The one more combination to consider is when someone takes a dump of
an older version and loads it into a newer version. For example, where
users dump from 17.5 and then restore in a newer version, say 17.6
(which has our fix), the restore will fail due to newer restrictions
added by this patch. Do we need to do anything about it?

[1] : https://www.postgresql.org/docs/devel/app-pgdump.html

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2025-06-02 05:30:58 Re: Virtual generated columns
Previous Message Paul Jungwirth 2025-06-02 05:24:44 SQL:2011 Application Time Update & Delete