RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-01-13 10:13:31
Message-ID: OS0PR01MB5716C9F2C77BD3E403A713A994C29@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, January 13, 2023 1:43 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Thu, Jan 12, 2023 at 9:34 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Thursday, January 12, 2023 7:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jan 12, 2023 at 4:21 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Jan 12, 2023 at 10:34 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Jan 12, 2023 at 9:54 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > doc/src/sgml/monitoring.sgml
> > > > > >
> > > > > > 5. pg_stat_subscription
> > > > > >
> > > > > > @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type,
> > > > > > wait_event FROM pg_stat_activity WHERE wait_event i
> > > > > >
> > > > > > <row>
> > > > > > <entry role="catalog_table_entry"><para
> > > > > > role="column_definition">
> > > > > > + <structfield>apply_leader_pid</structfield> <type>integer</type>
> > > > > > + </para>
> > > > > > + <para>
> > > > > > + Process ID of the leader apply worker, if this process is a apply
> > > > > > + parallel worker. NULL if this process is a leader apply worker or a
> > > > > > + synchronization worker.
> > > > > > + </para></entry>
> > > > > > + </row>
> > > > > > +
> > > > > > + <row>
> > > > > > + <entry role="catalog_table_entry"><para
> > > > > > + role="column_definition">
> > > > > > <structfield>relid</structfield> <type>oid</type>
> > > > > > </para>
> > > > > > <para>
> > > > > > OID of the relation that the worker is synchronizing; null for the
> > > > > > - main apply worker
> > > > > > + main apply worker and the parallel apply worker
> > > > > > </para></entry>
> > > > > > </row>
> > > > > >
> > > > > > 5a.
> > > > > >
> > > > > > (Same as general comment #1 about terminology)
> > > > > >
> > > > > > "apply_leader_pid" --> "leader_apply_pid"
> > > > > >
> > > > >
> > > > > How about naming this as just leader_pid? I think it could be
> > > > > helpful in the future if we decide to parallelize initial sync
> > > > > (aka parallel
> > > > > copy) because then we could use this for the leader PID of
> > > > > parallel sync workers as well.
> > > > >
> > > > > --
> > > >
> > > > I still prefer leader_apply_pid.
> > > > leader_pid does not tell which 'operation' it belongs to. 'apply'
> > > > gives the clarity that it is apply related process.
> > > >
> > >
> > > But then do you suggest that tomorrow if we allow parallel sync
> > > workers then we have a separate column leader_sync_pid? I think that
> > > doesn't sound like a good idea and moreover one can refer to docs for clarification.
> >
> > I agree that leader_pid would be better not only for future parallel
> > copy sync feature, but also it's more consistent with the leader_pid column in pg_stat_activity.
> >
> > And here is the version patch which addressed Peter's comments and
> > renamed all the related stuff to leader_pid.
>
> Here are two comments on v79-0003 patch.

Thanks for the comments.

>
> + /* Force to serialize messages if stream_serialize_threshold is reached. */
> + if (stream_serialize_threshold != -1 &&
> + (stream_serialize_threshold == 0 ||
> + stream_serialize_threshold < parallel_stream_nchunks))
> + {
> + parallel_stream_nchunks = 0;
> + return false;
> + }
>
> I think it would be better if we show the log message "logical replication apply
> worker will serialize the remaining changes of remote transaction %u to a file"
> even in stream_serialize_threshold case.

Agreed and changed.

>
> IIUC parallel_stream_nchunks won't be reset if pa_send_data() failed due to the
> timeout.

Changed.
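
To make the two points above concrete, here is a minimal standalone sketch
(not the patch code) of how the threshold check and the counter reset could
fit together. Only stream_serialize_threshold and parallel_stream_nchunks
come from the quoted hunk; the helper try_send_chunk(), its send_timed_out
parameter, and the place where the counter is incremented are assumptions
made purely for illustration, and the log message is represented by a
comment rather than a real ereport() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Names taken from the quoted hunk; -1 disables the threshold. */
int			stream_serialize_threshold = -1;
int			parallel_stream_nchunks = 0;

/*
 * Hypothetical helper, for illustration only.  Returns true if the chunk
 * was handed to the parallel apply worker, false if the caller should
 * serialize the remaining changes of the transaction to a file.
 */
bool
try_send_chunk(bool send_timed_out)
{
	assert(parallel_stream_nchunks >= 0);

	/* Force serialization if stream_serialize_threshold is reached. */
	bool		threshold_hit =
		stream_serialize_threshold != -1 &&
		(stream_serialize_threshold == 0 ||
		 stream_serialize_threshold < parallel_stream_nchunks);

	if (threshold_hit || send_timed_out)
	{
		/*
		 * Both paths would report that the remaining changes will be
		 * serialized, and both reset the counter so that the next
		 * transaction starts counting from zero.
		 */
		parallel_stream_nchunks = 0;
		return false;
	}

	/* Assumption: the counter advances once per chunk sent. */
	parallel_stream_nchunks++;
	return true;
}
```

In this sketch, with the threshold set to 2 the first three calls send and
the fourth falls back to serialization; a timed-out send resets the counter
in the same way.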

Best Regards,
Hou zj
