RE: Perform streaming logical transactions by background workers and parallel apply

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-04 06:37:42
Message-ID: OS3PR01MB6275595120E69924BAE8BE219E9F9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thurs, Jul 28, 2022 at 13:20 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Wang-san,
>
> Hi, I'm also interested in the patch and I started to review this.
> Followings are comments about 0001.

Thanks for your kindly review and comments.
To avoid making this thread too long, I will reply to all of your comments
(#1~#13) in this email.

> 1. terminology
>
> In your patch a new worker "apply background worker" has been introduced,
> but I thought it might be confused because PostgreSQL has already the worker
> "background worker".
> Both of apply worker and apply bworker are categolized as bgworker.
> Do you have any reasons not to use "apply parallel worker" or "apply streaming
> worker"?
> (Note that I'm not native English speaker)

Since we will later consider applying non-streamed transactions in parallel, I
think "apply streaming worker" might not be very suitable. I think PostgreSQL
also has the worker "parallel worker", so for "apply parallel worker" and
"apply background worker", I feel that "apply background worker" will make the
relationship between workers more clear. ("[main] apply worker" and "apply
background worker")

> 2. logicalrep_worker_stop()
>
> ```
> - /* No worker, nothing to do. */
> - if (!worker)
> - {
> - LWLockRelease(LogicalRepWorkerLock);
> - return;
> - }
> + if (worker)
> + logicalrep_worker_stop_internal(worker);
> +
> + LWLockRelease(LogicalRepWorkerLock);
> +}
> ```
>
> I thought you could add a comment the meaning of if-statement, like "No main
> apply worker, nothing to do"

Since the processing in the if statement is reversed from before, I added the
following comment based on your suggestion:
```
Found the main worker, then try to stop it.
```

> 3. logicalrep_workers_find()
>
> I thought you could add a description about difference between this and
> logicalrep_worker_find() at the top of the function.
> IIUC logicalrep_workers_find() counts subworker, but logicalrep_worker_find()
> does not focus such type of workers.

I think it is fine to keep the comment because the comment says "returns list
of *all workers* for the subscription".
Also, we have added the comment "We are only interested in the main apply
worker or table sync worker here" in the function logicalrep_worker_find.

> 5. applybgworker.c
>
> ```
> +/* Apply background workers hash table (initialized on first use) */
> +static HTAB *ApplyWorkersHash = NULL;
> +static List *ApplyWorkersFreeList = NIL;
> +static List *ApplyWorkersList = NIL;
> ```
>
> I thought they should be ApplyBgWorkersXXX, because they stores information
> only related with apply bgworkers.

I improved them to ApplyBgworkersXXX just for the consistency with other names.

> 6. ApplyBgworkerShared
>
> ```
> + TransactionId stream_xid;
> + uint32 n; /* id of apply background worker */
> +} ApplyBgworkerShared;
> ```
>
> I thought the field "n" is too general, how about "proc_id" or "worker_id"?

I think "worker_id" seems better, so I improved "n" to "worker_id".

> 10. wait_event.h
>
> ```
> WAIT_EVENT_HASH_GROW_BUCKETS_REINSERT,
> + WAIT_EVENT_LOGICAL_APPLY_WORKER_STATE_CHANGE,
> WAIT_EVENT_LOGICAL_SYNC_DATA,
> ```
>
> I thought the event should be
> WAIT_EVENT_LOGICAL_APPLY_BG_WORKER_STATE_CHANGE,
> because this is used when apply worker waits until the status of bgworker
> changes.

I improved them to "WAIT_EVENT_LOGICAL_APPLY_BGWORKER_STATE_CHANGE" just for
the consistency with other names.

> 13. 015_stream.pl
>
> I could not find test about TRUNCATE. IIUC apply bgworker works well
> even if it gets LOGICAL_REP_MSG_TRUNCATE message from main worker.
> Can you add the case?

I modified the test cases in "032_streaming_apply.pl" this time, the use case
you mentioned is covered now.

The rest of the comments are improved as suggested.
The new patches were attached in [1].

[1] - https://www.postgresql.org/message-id/OS3PR01MB6275D64BE7726B0221B15F389E9F9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2022-08-04 06:38:23 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message wangw.fnst@fujitsu.com 2022-08-04 06:35:45 RE: Perform streaming logical transactions by background workers and parallel apply