Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-11-28 12:25:59
Message-ID: CAA4eK1JuRPUY2Dx=vUToRksTmW0ptqMVT3K32g0368f_ZCk-zg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 27, 2022 at 9:43 AM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Attach the new version patch which addressed all comments so far.
>

Few comments on v52-0001*
========================
1.
pa_free_worker()
{
...
+ /* Free the worker information if the worker exited cleanly. */
+ if (!winfo->error_mq_handle)
+ {
+ pa_free_worker_info(winfo);
+
+ if (winfo->in_use &&
+ !hash_search(ParallelApplyWorkersHash, &xid, HASH_REMOVE, NULL))
+ elog(ERROR, "hash table corrupted");

pa_free_worker_info() pfrees the winfo, so how is it legal to
winfo->in_use in the above check?

Also, why is this check (!winfo->error_mq_handle) required in the
first place in the patch? The worker exits cleanly only when the
leader apply worker sends a SIGINT signal and in that case, we already
detach from the error queue and clean up other worker information.

2.
+HandleParallelApplyMessages(void)
+{
...
...
+ foreach(lc, ParallelApplyWorkersList)
+ {
+ shm_mq_result res;
+ Size nbytes;
+ void *data;
+ ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) lfirst(lc);
+
+ if (!winfo->error_mq_handle)
+ continue;

Similar to the previous comment, it is not clear whether we need this
check. If required, can we add a comment to indicate the case where it
happens to be true?

Note, there is a similar check for winfo->error_mq_handle in
pa_wait_for_xact_state(). Please add some comments if that is
required.

3. Why is there apply_worker_clean_exit() at the end of
ParallelApplyWorkerMain()? Normally either the leader worker stops
parallel apply, or parallel apply gets stopped because of a parameter
change, or exits because of error, and in none of those cases it can
hit this code path unless I am missing something.

Additionally, I think in LogicalParallelApplyLoop, we will never
receive zero-length messages so that is also wrong and should be
converted to elog(ERROR,..).

4. I think in logicalrep_worker_detach(), we should detach from the
shm error queue so that the parallel apply worker won't try to send a
termination message back to the leader worker.

5.
pa_send_data()
{
...
+ if (startTime == 0)
+ startTime = GetCurrentTimestamp();
...

What is the use of getting the current timestamp before waitlatch
logic, if it is not used before that? It seems that is for the time
logic to look correct. We can probably reduce the 10s interval to 9s
for that.

In this function, we need to add some comments to indicate why the
current logic is used, and also probably we can refer to the comments
atop this file.

6. I think it will be better if we keep stream_apply_worker local to
applyparallelworker.c by exposing functions to cache/resetting the
required info.

7. Apart from the above, I have made a few changes in the comments and
some miscellaneous cosmetic changes in the attached. Kindly include
these in the next version unless you see a problem with any change.

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
changes_amit_v52_1.patch application/octet-stream 2.4 KB
changes_amit_v52_1.patch application/octet-stream 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stavros Koureas 2022-11-28 12:52:19 Re: Logical Replication Custom Column Expression
Previous Message Bharath Rupireddy 2022-11-28 10:38:18 Re: Introduce a new view for checkpointer related stats