Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-12-19 12:47:25
Message-ID: CAA4eK1KdVRffYEYiv9=4tVWhP=Jkfh1h6FM0A1R0NEe8cLT8+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 17, 2022 at 7:34 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Agreed. I have addressed all the comments and did some cosmetic changes.
> Attach the new version patch set.
>

Few comments:
============
1.
+ if (fileset_state == FS_SERIALIZE_IN_PROGRESS)
+ {
+ pa_lock_stream(MyParallelShared->xid, AccessShareLock);
+ pa_unlock_stream(MyParallelShared->xid, AccessShareLock);
+ }
+
+ /*
+ * We cannot read the file immediately after the leader has serialized all
+ * changes to the file because there may still be messages in the memory
+ * queue. We will apply all spooled messages the next time we call this
+ * function, which should ensure that there are no messages left in the
+ * memory queue.
+ */
+ else if (fileset_state == FS_SERIALIZE_DONE)
+ {

Once we have waited in the FS_SERIALIZE_IN_PROGRESS, the file state
can be FS_SERIALIZE_DONE immediately after that. So, won't it be
better to have a separate if block for FS_SERIALIZE_DONE state? If you
agree to do so then we can probably remove the comment: "* XXX It is
possible that immediately after we have waited for a lock in ...".

2.
+void
+pa_decr_and_wait_stream_block(void)
+{
+ Assert(am_parallel_apply_worker());
+
+ if (pg_atomic_sub_fetch_u32(&MyParallelShared->pending_stream_count, 1) == 0)

I think here the count can go negative when we are in serialize mode
because we don't increase it for serialize mode. I can't see any
problem due to that but OTOH, this doesn't seem to be intended because
in the future if we decide to implement the functionality of switching
back to non-serialize mode, this could be a problem. Also, I guess we
don't even need to try locking/unlocking the stream lock in that case.
One idea to avoid this is to check if the pending count is zero then
if file_set in not available raise an error (elog ERROR), otherwise,
simply return from here.

3. In apply_handle_stream_stop(), we are setting backendstate as idle
for cases TRANS_LEADER_SEND_TO_PARALLEL and TRANS_PARALLEL_APPLY. For
other cases, it is set by stream_stop_internal. I think it would be
better to set the state explicitly for all cases to make the code look
consistent and remove it from stream_stop_internal(). The other reason
to remove setting the state from stream_stop_internal() is that when
that function is invoked from other places like
apply_handle_stream_commit(), it seems to be setting the idle before
actually we reach the idle state.

4. Apart from the above, I have made a few changes in the comments,
see attached.

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
changes_amit_1.patch application/octet-stream 4.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2022-12-19 13:19:14 Re: Support logical replication of DDLs
Previous Message Ted Yu 2022-12-19 12:44:15 Re: On login trigger: take three