RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-11-08 03:56:43
Message-ID: OS0PR01MB5716F3493A051C5BB23CAD3C943F9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, November 7, 2022 9:19 PM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, November 4, 2022 7:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Fri, Nov 4, 2022 at 1:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > On Thu, Nov 3, 2022 at 6:36 PM houzj(dot)fnst(at)fujitsu(dot)com
> > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > Thanks for the analysis and summary !
> > > >
> > > > I tried to implement the above idea and here is the patch set.
> > > >
> > >
> > > Few comments on v42-0001
> > > ===========================
> > >
>
> Thanks for the comments.
>
> > Few more comments on v42-0001
> > ===============================
> > 1. In parallel_apply_send_data(), it seems winfo->serialize_changes
> > and switching_to_serialize are set to indicate that we have changed
> > parallel to serialize mode. Isn't using just the
> > switching_to_serialize sufficient? Also, it would be better to name
> > switching_to_serialize as parallel_to_serialize or something like
> > that.
>
> I slightly change the logic to let serialize the message directly when timeout
> instead of invoking apply_dispatch again so that we don't need the
> switching_to_serialize.
>
> >
> > 2. In parallel_apply_send_data(), the patch has already initialized
> > the fileset, and then again in apply_handle_stream_start(), it will do
> > the same if we fail while sending stream_start message to the parallel
> > worker. It seems we don't need to initialize fileset again for
> > TRANS_LEADER_PARTIAL_SERIALIZE state in apply_handle_stream_start()
> > unless I am missing something.
>
> Fixed.
>
> > 3.
> > apply_handle_stream_start(StringInfo s) { ...
> > + if (!first_segment)
> > + {
> > + /*
> > + * Unlock the shared object lock so that parallel apply worker
> > + * can continue to receive and apply changes.
> > + */
> > + parallel_apply_unlock(winfo->shared->stream_lock_id);
> > ...
> > }
> >
> > Can we have an assert before this unlock call that the lock must be
> > held? Similarly, if there are other places then we can have assert
> > there as well.
>
> It seems we don't have a standard API can be used without a transaction.
> Maybe we can use the list ParallelApplyLockids to check that ?
>
> > 4. It is not very clear to me how maintaining ParallelApplyLockids
> > list is helpful.
>
> I will think about this and remove this in next version list if possible.
>
> >
> > 5.
> > /*
> > + * Handle STREAM START message when the transaction was spilled to disk.
> > + *
> > + * Inintialize fileset if not yet and open the file.
> > + */
> > +void
> > +serialize_stream_start(TransactionId xid, bool first_segment) {
> > + /*
> > + * Start a transaction on stream start,
> >
> > This function's name and comments seem to indicate that it is to
> > handle stream_start message. Is that really the case? It is being
> > called from parallel_apply_send_data() which made me think it can be
> > used from other places as well.
>
> Adjusted the comment.
>
> Here is the new version patch set which addressed comments as of last Friday.
> I also added some comments for the newly introduced codes in this version.
>

Sorry, I posted the wrong patch for V43 which lack some changes.
Attach the correct patch set here.

Best regards,
Hou zj

Attachment Content-Type Size
v44-0002-Test-streaming-parallel-option-in-tap-test.patch application/octet-stream 79.4 KB
v44-0001-Perform-streaming-logical-transactions-by-parall.patch application/octet-stream 168.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-11-08 04:01:59 Re: psql: Add command to use extended query protocol
Previous Message David Rowley 2022-11-08 03:54:25 Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment