RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-12-23 05:52:00
Message-ID: OS0PR01MB57169EF567017861EACBD10D94E99@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, December 22, 2022 8:05 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 21, 2022 at 11:02 AM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Attach the new patch set which also includes some cosmetic comment
> > changes.
> >
>
> Few minor comments:
> =================
> 1.
> + <literal>t</literal> = spill the changes of in-progress
> transactions to+ disk and apply at once after the transaction is
> committed on the+ publisher,
>
> Can we change this description to: "spill the changes of in-progress transactions
> to disk and apply at once after the transaction is committed on the publisher and
> received by the subscriber,"

Changed.

> 2.
> table is in progress, there will be additional workers for the tables
> - being synchronized.
> + being synchronized. Moreover, if the streaming transaction is applied in
> + parallel, there will be additional workers.
>
> Do we need this change in the first patch? We skip parallel apply workers from
> view for the first patch. Am, I missing something?

No, I moved this to 0007 which include parallel apply workers in the view.

> 3.
> I think we would need a catversion bump for parallel apply feature because of
> below change:
> @@ -7913,11 +7913,16 @@ SCRAM-SHA-256$<replaceable>&lt;iteration
> count&gt;</replaceable>:<replaceable>&l
>
> <row>
> <entry role="catalog_table_entry"><para role="column_definition">
> - <structfield>substream</structfield> <type>bool</type>
> + <structfield>substream</structfield> <type>char</type>
> </para>
>
> Am, I missing something? If not, then I think we can note that in the commit
> message to avoid forgetting it before commit.

Added.

>
> 4. Kindly change the below comments:
> diff --git a/src/backend/replication/logical/applyparallelworker.c
> b/src/backend/replication/logical/applyparallelworker.c
> index 97f4a3037c..02bb608188 100644
> --- a/src/backend/replication/logical/applyparallelworker.c
> +++ b/src/backend/replication/logical/applyparallelworker.c
> @@ -9,11 +9,10 @@
> *
> * This file contains the code to launch, set up, and teardown a parallel apply
> * worker which receives the changes from the leader worker and invokes
> routines
> - * to apply those on the subscriber database.
> - *
> - * This file contains routines that are intended to support setting up, using
> - * and tearing down a ParallelApplyWorkerInfo which is required so the leader
> - * worker and parallel apply workers can communicate with each other.
> + * to apply those on the subscriber database. Additionally, this file
> + contains
> + * routines that are intended to support setting up, using, and tearing
> + down a
> + * ParallelApplyWorkerInfo which is required so the leader worker and
> + parallel
> + * apply workers can communicate with each other.
> *
> * The parallel apply workers are assigned (if available) as soon as xact's
> * first stream is received for subscriptions that have set their 'streaming'

Merged.

Besides, I also did the following changes:
1. Added maybe_reread_subscription_info in leader before assigning the
transaction to parallel apply worker (Sawada-san's comments[1])
2. Removed the "out of parallel apply workers" LOG ( Sawada-san's comments[1])
3. Improved a elog message (Sawada-san's comments[1]).
4. Moved the testcases from 032_xx into existing 015_stream.pl which can save
the initialization time. Since we introduced quite a few testcases in this
patch set, so I did this to try to reduce the testing time that increased after
applying these patches.

[1] https://www.postgresql.org/message-id/CAD21AoDWd2pXau%2BpkYWOi87VGYrDD%3DOxakEDgOyUS%2BqV9XuAGA%40mail.gmail.com

Best regards,
Hou zj

Attachment Content-Type Size
v65-0007-Add-a-main_worker_pid-to-pg_stat_subscription.patch application/octet-stream 9.5 KB
v65-0001-Perform-streaming-logical-transactions-by-parall.patch application/octet-stream 214.7 KB
v65-0002-Test-streaming-parallel-option-in-tap-test.patch application/octet-stream 77.6 KB
v65-0003-Allow-streaming-every-change-without-waiting-til.patch application/octet-stream 9.4 KB
v65-0004-Add-GUC-stream_serialize_threshold-and-test-seri.patch application/octet-stream 12.2 KB
v65-0005-Stop-extra-worker-if-GUC-was-changed.patch application/octet-stream 4.6 KB
v65-0006-Retry-to-apply-streaming-xact-only-in-apply-work.patch application/octet-stream 21.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-12-23 05:56:21 Re: Exit walsender before confirming remote flush in logical replication
Previous Message Amit Kapila 2022-12-23 05:50:22 Re: Force streaming every change in logical decoding