RE: Slow catchup of 2PC (twophase) transactions on replica in LR

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>
Subject: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-05-14 12:03:49
Message-ID: OSBPR01MB2552F66463EFCFD654E87C09F5E32@OSBPR01MB2552.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thanks for reviewing! Here is new version patch.

> //////////
> Patch v9-0002
> //////////
>
> ======
> Commit Message
>
> 2.1.
> Regarding the off->on case, the logical replication already has a
> mechanism for it, so there is no need to do anything special for the
> on->off case; however, we must connect to the publisher and expressly
> change the parameter. The operation cannot be rolled back, and
> altering the parameter from "on" to "off" within a transaction is
> prohibited.
>
> In the opposite case, there is no need to prevent this because the
> logical replication worker already had the mechanism to alter the slot
> option at a convenient time.
>
> ~
>
> This explanation seems to be going around in circles, without giving
> any new information:
>
> AFAICT, "Regarding the off->on case, the logical replication already
> has a mechanism for it, so there is no need to do anything special for
> the on->off case;"
>
> is saying pretty much the same as:
>
> "In the opposite case, there is no need to prevent this because the
> logical replication worker already had the mechanism to alter the slot
> option at a convenient time."
>
> But, what I hoped for in previous review comments was an explanation
> somewhat less vague than "already has a mechanism" or "already had the
> mechanism". Can't this have just 1 or 2 lines to say WHAT is that
> existing mechanism for the "off" to "on" case, and WHY that means
> there is nothing special to do in that scenario?
>

Reworded. Thought?

> 2.2. AlterSubscription
>
> /*
> - * The changed two_phase option of the slot can't be rolled
> - * back.
> + * Since altering the two_phase option of subscriptions
> + * also leads to changing the slot option, this command
> + * cannot be rolled back. So prevent this if we are in a
> + * transaction block. In the opposite case, there is no
> + * need to prevent this because the logical replication
> + * worker already had the mechanism to alter the slot
> + * option at a convenient time.
> */
>
> (Same previous review comments, and same as my review comment for the
> commit message above).
>
> I don't think "already had the mechanism" is enough explanation.
>
> Also, the 2nd sentence doesn't make sense here because the comment
> only said "altering the slot option" -- it didn't say it was altering
> it to "on" or altering it to "off", so "the opposite case" has no
> meaning.

Fixed.

> 2.3. AlterSubscription
>
> /*
> - * Try to acquire the connection necessary for altering slot.
> + * Check the need to alter the replication slot. Failover and two_phase
> + * options are controlled by both the publisher (as a slot option) and the
> + * subscriber (as a subscription option). The slot option must be altered
> + * only when changing "on" to "off". Because in opposite case, the logical
> + * replication worker already has the mechanism to do so at a convenient
> + * time.
> + */
> + update_failover = replaces[Anum_pg_subscription_subfailover - 1];
> + update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1]
> &&
> + !opts.twophase);
>
> This is again the same as other review comments above. Probably, when
> some better explanation can be found for "already has the mechanism to
> do so at a convenient time." then all of these places can be changed
> using similar text.

Added a reference.

>
> //////////
> Patch v9-0003
> //////////
>
> There are some imperfect code comments but AFAIK they are the same
> ones from patch 0002. I think patch 0003 is just moving those comments
> to different places, so probably they would already be addressed by
> patch 0002.
>

The comment was moved, so no need to modify here.

> ======
> doc/src/sgml/catalogs.sgml
>
> 4.1.
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>subforcealter</structfield> <type>bool</type>
> + </para>
> + <para>
> + If true, the subscription can be altered <literal>two_phase</literal>
> + option, even if there are prepared transactions
> + </para></entry>
> + </row>
> +
>
> BEFORE
> If true, the subscription can be altered <literal>two_phase</literal>
> option, even if there are prepared transactions
>
> SUGGESTION
> If true, then the ALTER SUBSCRIPTION command can disable
> <literal>two_phase</literal> option, even if there are uncommitted
> prepared transactions from when <literal>two_phase</literal> was
> enabled

Fixed, added a link for ALTER SUBSCRIPTION.

>
> ======
> doc/src/sgml/ref/alter_subscription.sgml
>
> 4.2.
> -
> - <para>
> - The <literal>two_phase</literal> parameter can only be altered when
> the
> - subscription is disabled. When altering the parameter from
> <literal>on</literal>
> - to <literal>off</literal>, the backend process checks for any incomplete
> - prepared transactions done by the logical replication worker (from when
> - <literal>two_phase</literal> parameter was still <literal>on</literal>)
> - and, if any are found, those are aborted.
> - </para>
>
> Well, I still think there ought to be some mention of the relationship
> between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION
> page. Then the user can cross-reference to read what the 'force_alter'
> actually does.
>

Revived the content, and added an link. Thought?

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 4.3.
> +
> + <varlistentry id="sql-createsubscription-params-with-force-alter">
> + <term><literal>force_alter</literal>
> (<type>boolean</type>)</term>
> + <listitem>
> + <para>
> + Specifies whether the subscription can be altered
> + <literal>two_phase</literal> option, even if there are prepared
> + transactions. If specified, the backend process checks for any
> + incomplete prepared transactions done by the logical replication
> + worker (from when <literal>two_phase</literal> parameter was
> still
> + <literal>on</literal>), if any are found, those are aborted.
> + Otherwise, Altering the parameter from <literal>on</literal> to
> + <literal>off</literal> will give an error when there are prepared
> + transactions done by the logical replication worker.
> + The default is <literal>false</literal>.
> + </para>
> + </listitem>
> + </varlistentry>
>
> This explanation seems a bit repetitive. I think it can be improved as follows:
>
> SUGGESTION
> Specifies if the ALTER SUBSCRIPTION can be forced to proceed instead
> of giving an error.
>
> There is currently only one scenario where this parameter has any
> effect: When altering two_phase option from true to false it is
> possible for there to be incomplete prepared transactions done by the
> logical replication worker (from when two_phase parameter was still
> true). If force_alter is false, then this will give an error; if
> force_alter is true, then the incomplete prepared transactions are
> aborted and the alter will proceed.
>
> The default is false.

Fixed, but added attributes.

>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 4.4. CreateSubscription
>
> values[Anum_pg_subscription_subfailover - 1] = BoolGetDatum(opts.failover);
> + values[Anum_pg_subscription_subforcealter] =
> BoolGetDatum(opts.force_alter);
> values[Anum_pg_subscription_subconninfo - 1] =
>
> Hmm, looks like a bug. Shouldn't that index say -1?
>

Right, fixed.

> ~~~
> 4.5. AlterSubscription
>
> + /*
> + * Abort prepared transactions only if
> + * 'force_alter' option is true. Otherwise raise
> + * an ERROR.
> + */
> + if (IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER))
> + {
> + if (!opts.force_alter)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot alter %s when there are prepared transactions",
> + "two_phase = off"),
> + errhint("Resolve these transactions or set %s, and then try again.",
> + "force_alter = true")));
> + }
> + else
> + {
> + if (!sub->forcealter)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot alter %s when there are prepared transactions",
> + "two_phase = off"),
> + errhint("Resolve these transactions or set %s, and then try again.",
> + "force_alter = true")));
> + }
> +
>
> IIUC this code can be simplified to remove the error duplication.
> Something like below:
>
> SUGGESTION
>
> bool raise_error = IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER) ?
> !opts.force_alter : !sub->forcealter;
>
> if (raise_error)
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("cannot alter %s when there are prepared transactions",
> "two_phase = off"),
> errhint("Resolve these transactions or set %s, and then try again.",
> "force_alter = true")));
>

Modified.

> ======
> src/bin/pg_dump/pg_dump.c
>
> 4.6. getSubscriptions
>
> + if (fout->remoteVersion >= 170000)
> + appendPQExpBufferStr(query,
> + " s.subforcealter\n");
> + else
> + appendPQExpBuffer(query,
> + " false AS subforcealter\n");
> +
> +
>
> 4.6a.
> Should this just be combined with the existing "if
> (fout->remoteVersion >= 170000)" for failover?

This was intentional. Features for PG17 have already been frozen, so
the patch will be pushed for PG18. After removeVersion is bumped,
I want to replace to "(fout->remoteVersion >= 180000)"

>
> ~
>
> 4.6b.
> Double blank lines.

Fixed.

> src/bin/psql/describe.c
>
> 4.7.
> + if (pset.sversion >= 170000)
> + appendPQExpBuffer(&buf,
> + ", subforcealter AS \"%s\"\n",
> + gettext_noop("Force_alter"));
>
> IMO the column title should be "Force alter" (i.e. without the underscore)
>

Fixed.

> ======
> src/include/catalog/pg_subscription.h
>
> 4.8. CATALOG
>
> + bool subforcealter; /* True if we allow to drop prepared transactions
> + * when altering two_phase "on"->"off". */
>
> I think this is not actually the description of 'force_alter'. What
> you wrote just happens to be the only case that this option currently
> works for. Maybe a more correct description is something like below.
>
> SUGGESTION
> True allows the ALTER SUBSCRIPTION command to proceed under conditions
> that would otherwise result in an error. Currently, 'force_alter' only
> has an effect when altering the two_phase option from "true" to
> "false".
>

Hmm. Seems bit long, but used yours.

> ~~~
>
> 4.9. struct Subscription
>
> + bool forcealter; /* True if we allow to drop prepared
> + * transactions when altering two_phase
> + * "on"->"off". */
>
> Ditto the previous review comment.
>

Ditto.

> ======
> src/test/regress/expected/subscription.out
>
> 4.10.
> -
> List of subscriptions
> - Name | Owner | Enabled | Publication
> | Binary | Streaming | Two-phase commit | Disable on error | Origin |
> Password required | Run as owner? | Failover | Synchronous commit |
> Conninfo | Skip LSN
> -------------------+---------------------------+---------+-------------+--------
> +-----------+------------------+------------------+--------+-------------------
> +---------------+----------+--------------------+-----------------------------+
> ----------
> - regress_testsub4 | regress_subscription_user | f | {testpub}
> | f | off | d | f | none |
> t | f | f | off |
> dbname=regress_doesnotexist | 0/0
> +
> List of
> subscriptions
> + Name | Owner | Enabled | Publication
> | Binary | Streaming | Two-phase commit | Disable on error | Origin |
> Password required | Run as owner? | Failover | Force_alter |
> Synchronous commit | Conninfo | Skip LSN
> +------------------+---------------------------+---------+-------------+-------
> -+-----------+------------------+------------------+--------+------------------
> -+---------------+----------+-------------+--------------------+---------------
> --------------+----------
>
> The column heading should be "Force alter", as already mentioned by an
> earlier review comment (#4.7)

Yeah, fixed.

> src/test/subscription/t/099_twophase_added.pl
>
> 4.11.
>
> +# Alter the two_phase with the force_alter option. Apart from the the last
> +# ALTER SUBSCRIPTION command, the command will abort the prepared
> transaction
> +# and succeed.
>
> There is typo "the the" and the wording is a bit strange. Why not just say:
>
> SUGGESTION
> Alter the two_phase true to false with the force_alter option enabled.
> This command will succeed after aborting the prepared transaction.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment Content-Type Size
v10-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIP.patch application/octet-stream 25.5 KB
v10-0002-Alter-slot-option-two_phase-only-when-altering-t.patch application/octet-stream 12.2 KB
v10-0003-Abort-prepared-transactions-while-altering-two_p.patch application/octet-stream 9.6 KB
v10-0004-Add-force_alter-option-for-ALTER-SUBSCRIPTION-.-.patch application/octet-stream 57.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2024-05-14 12:03:58 Re: Postgres and --config-file option
Previous Message Alexander Lakhin 2024-05-14 12:00:00 Re: Avoid orphaned objects dependencies, take 3