RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-10-21 09:31:42
Message-ID: OS0PR01MB571673E3EE2A65A3B8B7713B942D9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, October 19, 2022 8:50 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> ===
> 01. applyparallelworker.c - SIZE_STATS_MESSAGE
>
> ```
> /*
> * There are three fields in each message received by the parallel apply
> * worker: start_lsn, end_lsn and send_time. Because we have updated these
> * statistics in the leader apply worker, we can ignore these fields in the
> * parallel apply worker (see function LogicalRepApplyLoop).
> */
> #define SIZE_STATS_MESSAGE (2 * sizeof(XLogRecPtr) + sizeof(TimestampTz))
> ```
>
> According to other comment styles, it seems that the first sentence of the
> comment should represent the datatype and usage, not the detailed reason.
> For example, about ParallelApplyWorkersList, you said "A list ...". How about
> adding like following message:
> The message size that can be skipped by parallel apply worker

Thanks for the comments, but the current description seems enough to me.

> ~~~
> 02. applyparallelworker.c - parallel_apply_start_subtrans
>
> ```
> if (current_xid != top_xid &&
> !list_member_xid(subxactlist, current_xid)) ```
>
> A macro TransactionIdEquals is defined in access/transam.h. Should we use it,
> or is it too trivial?

I checked the existing codes, it seems both style are being used.
Maybe we can post a separate patch to replace them later.

> ~~~
> 08. worker.c - apply_handle_prepare_internal
>
> Same as above.
>
>
> ~~~
> 09. worker.c - maybe_reread_subscription
>
> ```
> /*
> * Exit if any parameter that affects the remote connection was
> changed.
> * The launcher will start a new worker.
> */
> if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
> strcmp(newsub->name, MySubscription->name) != 0 ||
> strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
> newsub->binary != MySubscription->binary ||
> newsub->stream != MySubscription->stream ||
> strcmp(newsub->origin, MySubscription->origin) != 0 ||
> newsub->owner != MySubscription->owner ||
> !equal(newsub->publications, MySubscription->publications))
> {
> ereport(LOG,
> (errmsg("logical replication apply worker for
> subscription \"%s\" will restart because of a parameter change",
> MySubscription->name)));
>
> proc_exit(0);
> }
> ```
>
> When the parallel apply worker has been launched and then the subscription
> option has been modified, the same message will appear twice.
> But if the option "streaming" is changed from "parallel" to "on", one of them
> will not restart again.
> Should we modify message?

Thanks, it seems a timing problem, if the leader catch the change first and stop
the parallel workers, the message will only appear once. But I agree we'd
better make the message clear. I changed the message in parallel apply worker.
While on it, I also adjusted some other message to include "parallel apply
worker" if they are in parallel apply worker.

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2022-10-21 09:32:16 parse partition strategy string in gram.y
Previous Message Peter Smith 2022-10-21 09:01:54 Re: Data is copied twice when specifying both child and parent table in publication