Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-11-04 11:44:33
Message-ID: CAA4eK1L5bmFT96XX2bevh9jwJuN3edv7i_-NP+TVt-87yP6LNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 4, 2022 at 1:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 3, 2022 at 6:36 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Thanks for the analysis and summary !
> >
> > I tried to implement the above idea and here is the patch set.
> >
>
> Few comments on v42-0001
> ===========================
>

Few more comments on v42-0001
===============================
1. In parallel_apply_send_data(), it seems winfo->serialize_changes
and switching_to_serialize are set to indicate that we have changed
parallel to serialize mode. Isn't using just the
switching_to_serialize sufficient? Also, it would be better to name
switching_to_serialize as parallel_to_serialize or something like
that.

2. In parallel_apply_send_data(), the patch has already initialized
the fileset, and then again in apply_handle_stream_start(), it will do
the same if we fail while sending stream_start message to the parallel
worker. It seems we don't need to initialize fileset again for
TRANS_LEADER_PARTIAL_SERIALIZE state in apply_handle_stream_start()
unless I am missing something.

3.
apply_handle_stream_start(StringInfo s)
{
...
+ if (!first_segment)
+ {
+ /*
+ * Unlock the shared object lock so that parallel apply worker
+ * can continue to receive and apply changes.
+ */
+ parallel_apply_unlock(winfo->shared->stream_lock_id);
...
}

Can we have an assert before this unlock call that the lock must be
held? Similarly, if there are other places then we can have assert
there as well.

4. It is not very clear to me how maintaining ParallelApplyLockids
list is helpful.

5.
/*
+ * Handle STREAM START message when the transaction was spilled to disk.
+ *
+ * Inintialize fileset if not yet and open the file.
+ */
+void
+serialize_stream_start(TransactionId xid, bool first_segment)
+{
+ /*
+ * Start a transaction on stream start,

This function's name and comments seem to indicate that it is to
handle stream_start message. Is that really the case? It is being
called from parallel_apply_send_data() which made me think it can be
used from other places as well.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rahila Syed 2022-11-04 11:48:08 Re: Allow single table VACUUM in transaction block
Previous Message Ashutosh Sharma 2022-11-04 11:41:33 Re: confirmed_flush_lsn shows LSN of the data that has not yet been received by the logical subscriber.