RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-11 07:49:18
Message-ID: OS0PR01MB57163111096F9397A51F2FE394649@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, August 9, 2022 4:49 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> Dear Wang,
>
> Thanks for updating patch sets! Followings are comments about v20-0001.
>
> 1. config.sgml
>
> ```
> <para>
> Specifies maximum number of logical replication workers. This includes
> both apply workers and table synchronization workers.
> </para>
> ```
>
> I think you can add a description in the above paragraph, like
> " This includes apply main workers, apply background workers, and table
> synchronization workers."

Changed.

> 2. logical-replication.sgml
>
> 2.a Configuration Settings
>
> ```
> <varname>max_logical_replication_workers</varname> must be set to at
> least
> the number of subscriptions, again plus some reserve for the table
> synchronization.
> ```
>
> I think you can add a description in the above paragraph, like
> "... the number of subscriptions, plus some reserve for the table
> synchronization
> and the streaming transaction."

I think it's not a must to add the number of streaming transactions here as
it also works even if no worker is available for apply bgworker as explained in
the document of streaming option.

> 2.b Monitoring
>
> ```
> <para>
> Normally, there is a single apply process running for an enabled
> subscription. A disabled subscription or a crashed subscription will have
> zero rows in this view. If the initial data synchronization of any
> table is in progress, there will be additional workers for the tables
> being synchronized.
> </para>
> ```
>
> I think you can add a sentence in the above paragraph, like
> "... synchronized. Moreover, if the streaming transaction is applied parallelly,
> there will be additional workers"

Added

> 3. launcher.c
>
> ```
> + /* Sanity check : we don't support table sync in subworker. */
> ```
>
> I think "Sanity check :" should be "Sanity check:", per other files.

Changed.

> 4. worker.c
>
> 4.a handle_streamed_transaction()
>
> ```
> - /* not in streaming mode */
> - if (!in_streamed_transaction)
> + /* Not in streaming mode */
> + if (!(in_streamed_transaction || am_apply_bgworker()))
> ```
>
> I think the comment should also mention about apply background worker case.

Added.

> 4.b handle_streamed_transaction()
>
> ```
> - Assert(stream_fd != NULL);
> ```
>
> I think this assersion seems reasonable in case of stream='on'.
> Could you revive it and move to later part in the function, like after
> subxact_info_add(current_xid)?

Added.

> 4.c apply_handle_prepare_internal()
>
> ```
> * BeginTransactionBlock is necessary to balance the
> EndTransactionBlock
> * called within the PrepareTransactionBlock below.
> */
> - BeginTransactionBlock();
> + if (!IsTransactionBlock())
> + BeginTransactionBlock();
> +
> ```
>
> I think the comment should be "We must be in transaction block to balance...".

Changed.

> 4.d apply_handle_stream_prepare()
>
> ```
> - *
> - * Logic is in two parts:
> - * 1. Replay all the spooled operations
> - * 2. Mark the transaction as prepared
> */
> static void
> apply_handle_stream_prepare(StringInfo s)
> ```
>
> I think these comments are useful when stream='on',
> so it should be moved to later part.

I think we already have similar comments in later part.

> 5. applybgworker.c
>
> 5.a apply_bgworker_setup()
>
> ```
> + elog(DEBUG1, "setting up apply worker #%u",
> list_length(ApplyBgworkersList) + 1);
> ```
>
> "apply worker" should be "apply background worker".
>
> 5.b LogicalApplyBgwLoop()
>
> ```
> + elog(DEBUG1, "[Apply BGW #%u] ended
> processing streaming chunk,"
> + "waiting on shm_mq_receive",
> shared->worker_id);
> ```
>
> A blank is needed after comma. I checked serverlog, and the message
> outputed like:
>
> ```
> [Apply BGW #1] ended processing streaming chunk,waiting on
> shm_mq_receive
> ```

Changed.

> 6.
>
> When I started up the apply background worker and did `SELECT * from
> pg_stat_subscription`, I got following lines:
>
> ```
> postgres=# select * from pg_stat_subscription;
> subid | subname | pid | relid | received_lsn | last_msg_send_time
> | last_msg_receipt_time | latest_end_lsn | latest_end
> _time
> -------+---------+-------+-------+--------------+----------------------------
> ---+-------------------------------+----------------+------------------
> -------------
> 16400 | sub | 22383 | | | -infinity |
> -infinity | | -infinity
> 16400 | sub | 22312 | | 0/6734740 | 2022-08-09
> 07:40:19.367676+00 | 2022-08-09 07:40:19.375455+00 | 0/6734740 |
> 2022-08-09 07:40:
> 19.367676+00
> (2 rows)
> ```
>
>
> 6.a
>
> It seems that the upper line represents the apply background worker, but I
> think last_msg_send_time and last_msg_receipt_time should be null.
> Is it like initialization mistake?

Changed.

> ```
> $ ps aux | grep 22383
> ... postgres: logical replication apply background worker for subscription
> 16400
> ```
>
> 6.b
>
> Currently, the documentation doesn't clarify the method to determine the type
> of logical replication workers.
> Could you add descriptions about it?
> I think adding a column "subworker" is an alternative approach.

I am quite sure about whether it's necessary,
But I tried to add a new column(main_apply_pid) in a separate patch(0005).

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-08-11 08:04:40 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message houzj.fnst@fujitsu.com 2022-08-11 07:48:36 RE: Perform streaming logical transactions by background workers and parallel apply