RE: Perform streaming logical transactions by background workers and parallel apply

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-06-17 07:17:10
Message-ID: OS3PR01MB62753CDFD0DF1FCBC978AFB39EAF9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 15, 2022 at 8:13 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Few questions/comments on 0001
> ===========================
Thanks for your comments.

> 1.
> In the commit message, I see: "We also need to allow stream_stop to
> complete by the apply background worker to avoid deadlocks because
> T-1's current stream of changes can update rows in conflicting order
> with T-2's next stream of changes."
>
> Thinking about this, won't the T-1 and T-2 deadlock on the publisher
> node as well if the above statement is true?
Yes, I think so.
I think if table's unique index/constraint of the publisher and the subscriber
are consistent, the deadlock will occur on the publisher-side.
If it is inconsistent, deadlock may only occur in the subscriber. But since we
added the check for these (see patch 0004), so it seems okay to not handle this
at STREAM_STOP.

BTW, I made the following improvements to the code (#a, #c are improved in 0004
patch, #b, #d and #e are improved in 0001 patch.) :
a.
I added some comments in the function apply_handle_stream_stop to explain why
we do not need to allow stream_stop to complete by the apply background worker.
b.
I deleted related commit message in 0001 patch and the related comments in file
header (worker.c).
c.
Renamed the function logicalrep_rel_mark_apply_bgworker to
logicalrep_rel_mark_safe_in_apply_bgworker. Also did some slight improvements
in this function.
d.
When apply worker sends stream xact messages to apply background worker, only
wait for apply background worker to complete when commit, prepare and abort of
toplevel xact.
e.
The state setting of apply background worker was not very accurate before, so
improved this (see the invocations to function pgstat_report_activity in
function LogicalApplyBgwLoop, apply_handle_stream_start and
apply_handle_stream_abort).

> 2.
> + <para>
> + The apply background workers are taken from the pool defined by
> + <varname>max_logical_replication_workers</varname>.
> + </para>
> + <para>
> + The default value is 3. This parameter can only be set in the
> + <filename>postgresql.conf</filename> file or on the server command
> + line.
> + </para>
>
> Is there a reason to choose this number as 3? Why not 2 similar to
> max_sync_workers_per_subscription?
Improved the default as suggested.

> 3.
> +
> + <para>
> + Setting streaming mode to <literal>apply</literal> could export invalid LSN
> + as finish LSN of failed transaction. Changing the streaming mode and making
> + the same conflict writes the finish LSN of the failed transaction in the
> + server log if required.
> + </para>
>
> How will the user identify that this is an invalid LSN value and she
> shouldn't use it to SKIP the transaction? Can we change the second
> sentence to: "User should change the streaming mode to 'on' if they
> would instead wish to see the finish LSN on error. Users can use
> finish LSN to SKIP applying the transaction." I think we can give
> reference to docs where the SKIP feature is explained.
Improved the sentence as suggested.
And I added the reference after the statement in your suggestion.
It looks like:
```
... Users can use finish LSN to SKIP applying the transaction by running <link
linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
SKIP</command></link>.
```

> 4.
> + * This file contains routines that are intended to support setting up, using,
> + * and tearing down a ApplyBgworkerState.
> + * Refer to the comments in file header of logical/worker.c to see more
> + * informations about apply background worker.
>
> Typo. /informations/information.
>
> Consider having an empty line between the above two lines.
Improved the message as suggested.

> 5.
> +ApplyBgworkerState *
> +apply_bgworker_find_or_start(TransactionId xid, bool start)
> {
> ...
> ...
> + if (!TransactionIdIsValid(xid))
> + return NULL;
> +
> + /*
> + * We don't start new background worker if we are not in streaming apply
> + * mode.
> + */
> + if (MySubscription->stream != SUBSTREAM_APPLY)
> + return NULL;
> +
> + /*
> + * We don't start new background worker if user has set skiplsn as it's
> + * possible that user want to skip the streaming transaction. For
> + * streaming transaction, we need to spill the transaction to disk so that
> + * we can get the last LSN of the transaction to judge whether to skip
> + * before starting to apply the change.
> + */
> + if (start && !XLogRecPtrIsInvalid(MySubscription->skiplsn))
> + return NULL;
> +
> + /*
> + * For streaming transactions that are being applied in apply background
> + * worker, we cannot decide whether to apply the change for a relation
> + * that is not in the READY state (see should_apply_changes_for_rel) as we
> + * won't know remote_final_lsn by that time. So, we don't start new apply
> + * background worker in this case.
> + */
> + if (start && !AllTablesyncsReady())
> + return NULL;
> ...
> ...
> }
>
> Can we move some of these starting checks to a separate function like
> canstartapplybgworker()?
Improved as suggested.

BTW, I rebased the temporary patch 0003 because one patch in thread [1] is
committed (see commit b7658c24c7 in HEAD).

Attach the new patches.
Only changed patches 0001, 0004.

Regards,
Wang wei

Attachment Content-Type Size
v12-0001-Perform-streaming-logical-transactions-by-backgr.patch application/octet-stream 98.1 KB
v12-0002-Test-streaming-apply-option-in-tap-test.patch application/octet-stream 68.9 KB
v12-0003-A-temporary-patch-that-includes-patch-in-another.patch application/octet-stream 8.8 KB
v12-0004-Add-some-checks-before-using-apply-background-wo.patch application/octet-stream 35.8 KB
v12-0005-Retry-to-apply-streaming-xact-only-in-apply-work.patch application/octet-stream 25.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-06-17 07:50:38 Re: "buffer too small" or "path too long"?
Previous Message Kyotaro Horiguchi 2022-06-17 07:05:56 Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size