RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-11-15 11:57:38
Message-ID: OS0PR01MB5716A380FDDE576CCD84F5BE94049@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Saturday, November 12, 2022 7:06 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>
> On Fri, Nov 11, 2022 at 2:12 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
>
> Few comments on v46-0001:
> ======================
>

Thanks for the comments.

> 1.
> +static void
> +apply_handle_stream_abort(StringInfo s)
> {
> ...
> + /* Send STREAM ABORT message to the parallel apply worker. */
> + parallel_apply_send_data(winfo, s->len, s->data);
> +
> + if (abort_toplevel_transaction)
> + {
> + parallel_apply_unlock_stream(xid, AccessExclusiveLock);
>
> Shouldn't we need to release this lock before sending the message as
> we are doing for stream_prepare and stream_commit? If there is a
> reason for doing it differently here then let's add some comments for
> the same.

Changed.

> 2. It seems once the patch makes the file state as busy
> (LEADER_FILESET_BUSY), it will only be accessible after the leader
> apply worker receives a transaction end message like stream_commit. Is
> my understanding correct? If yes, then why can't we make it accessible
> after the stream_stop message? Are you worried about the concurrency
> handling for reading and writing the file? If so, we can probably deal
> with it via some lock for reading and writing to file for each change.
> I think after this we may not need additional stream level lock/unlock
> in parallel_apply_spooled_messages. I understand that you probably
> want to keep the code simple so I am not suggesting changing it
> immediately but just wanted to know whether you have considered
> alternatives here.

I thought about this, but it seems the current BufFile design doesn't allow two
processes to open the same buffile at the same time (see the comment atop
BufFileOpenFileSet()). This means the LA needs to make sure the PA has closed
the buffile before writing more changes into it. We could let the LA wait for
that, but it could cause another kind of deadlock. Suppose the PA has opened
the file and is blocked while applying the change it just read, and the LA
starts to wait when trying to write the next set of streaming changes into the
file because the file is still open in the PA. Then the wait-for edges are:

LA (waits for the file to be closed) -> PA1 (waits for a unique lock held by PA2) -> PA2
(waits for the stream lock held by LA)
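The cycle above can be made concrete with a tiny wait-for-graph model. This is
an illustrative sketch, not PostgreSQL code; the process indices and the edge
array are assumptions chosen to mirror the LA/PA1/PA2 scenario described:

```c
#include <assert.h>
#include <stdbool.h>

#define NPROCS 3 /* 0 = LA, 1 = PA1, 2 = PA2 */

/*
 * waits_for[i] is the process that i is blocked on (-1 if not blocked):
 * LA waits for PA1 to close the file, PA1 waits for a unique lock held
 * by PA2, and PA2 waits for the stream lock held by LA.
 */
const int waits_for[NPROCS] = {1, 2, 0};

/* Follow wait-for edges from 'start'; a return to 'start' is a deadlock. */
bool has_deadlock_cycle(int start)
{
    int cur = waits_for[start];
    int steps = 0;

    while (cur != -1 && steps++ < NPROCS)
    {
        if (cur == start)
            return true;    /* we came back around: deadlock */
        cur = waits_for[cur];
    }
    return false;
}
```

Since every process in this scenario sits on the same cycle, none of them can
ever make progress, which is why simply letting the LA wait is not an option.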

We could introduce another lock for this, but that doesn't seem great as we
already have two kinds of locks here.

Another solution that doesn't need a new lock could be to create a different
filename for each streaming block, so that the leader doesn't need to reopen
the same file after writing changes into it; but that would largely increase
the number of temp files and looks a bit hacky. Or we could let the PA open,
read, and close the file for each change, but that brings some overhead of
repeatedly opening and closing the file.
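To make the first alternative's drawback concrete, the per-block naming scheme
could look like the sketch below (the filename format is purely a hypothetical
assumption, not what any patch version uses): every stream block of every
streamed transaction would get its own temp file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical per-stream-block filename: one temp file per (xid,
 * block_no) pair instead of one per transaction, which is why the temp
 * file count balloons for long streamed transactions.
 */
void block_filename(char *buf, size_t len, unsigned xid, unsigned block_no)
{
    snprintf(buf, len, "stream-changes-%u-%u", xid, block_no);
}
```

A transaction streamed in N blocks would thus leave N files behind until
commit/abort cleanup, versus a single file in the current design.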

Based on the above, how about keeping the current approach? (i.e., the PA will
open the file only after the leader apply worker receives a transaction end
message like stream_commit). Ideally, it will enter partial serialize mode only
when the PA is blocked by a backend or another PA, which doesn't seem that
common.
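The file-state handshake described above can be modeled roughly as below. The
enum and function names are assumptions for illustration (the patch uses
LEADER_FILESET_BUSY and related states); the point is only the ordering: the PA
may open the fileset only after a transaction end message has been processed.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    FS_UNKNOWN,     /* no serialized changes yet */
    FS_BUSY,        /* leader may still append; PA must not open */
    FS_ACCESSIBLE   /* transaction end received; PA may open */
} FileSetState;

FileSetState fileset_state = FS_UNKNOWN;

/* Leader starts spooling changes for a blocked PA. */
void leader_begin_serialize(void)
{
    fileset_state = FS_BUSY;
}

/* Leader saw stream_commit (or another transaction end message). */
void leader_end_of_transaction(void)
{
    fileset_state = FS_ACCESSIBLE;
}

/* PA opens the fileset only once the leader has finished writing it. */
bool pa_can_open_fileset(void)
{
    return fileset_state == FS_ACCESSIBLE;
}
```

Because the PA never opens the file while it is BUSY, the single-opener
restriction of BufFileOpenFileSet() is respected without any extra lock.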

> 3. Don't we need to release the transaction lock at stream_abort in
> parallel apply worker? I understand that we are not waiting for it in
> the leader worker but still parallel apply worker should release it if
> acquired at stream_start by it.

I thought the lock would be released automatically on rollback. But after
testing, I found it's possible that the lock won't be released if it's an empty
streaming transaction. So, I added code to release the lock in the new version
patch.
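The empty-transaction corner case can be illustrated with a simplified model
(hypothetical names, not the patch's code): rollback only releases what was
acquired inside a started transaction, so an empty streamed transaction, which
never really starts one, would leak the stream lock without an explicit
release in the abort path.

```c
#include <assert.h>
#include <stdbool.h>

bool in_transaction = false;
bool stream_lock_held = false;

/* PA takes the stream lock at stream_start; an empty streamed
 * transaction never starts a real transaction afterwards. */
void pa_stream_start(bool has_changes)
{
    stream_lock_held = true;
    in_transaction = has_changes;
}

void pa_stream_abort(void)
{
    if (in_transaction)
    {
        /* rollback releases locks tied to the started transaction */
        in_transaction = false;
        stream_lock_held = false;
    }

    /*
     * Explicit release added in the new version patch, so the lock is
     * dropped even when the streamed transaction was empty.
     */
    stream_lock_held = false;
}
```

Without the final explicit release, aborting an empty streamed transaction
would leave stream_lock_held set, blocking the leader indefinitely.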

>
> 4. A minor comment change as below:
> diff --git a/src/backend/replication/logical/worker.c
> b/src/backend/replication/logical/worker.c
> index 43f09b7e9a..c771851d1f 100644
> --- a/src/backend/replication/logical/worker.c
> +++ b/src/backend/replication/logical/worker.c
> @@ -1851,6 +1851,9 @@ apply_handle_stream_abort(StringInfo s)
> parallel_apply_stream_abort(&abort_data);
>
> /*
> + * We need to wait after processing rollback
> to savepoint for the next set
> + * of changes.
> + *
> * By the time parallel apply worker is
> processing the changes in
> * the current streaming block, the leader
> apply worker may have
> * sent multiple streaming blocks. So, try to
> lock only if there

Merged.

Attached is the new version patch set, which addresses the above comments and the comments from [1].

In the new version patch, I renamed the parallel_apply_xxx functions to pa_xxx
to make the names shorter, per the suggestion in [1]. Besides, I split the code
related to partial serialization into the 0002 patch to make it easier to
review.

[1] https://www.postgresql.org/message-id/CAA4eK1LGyQ%2BS-jCMnYSz_hvoqiNA0Of%3D%2BMksY%3DXTUaRc5XzXJQ%40mail.gmail.com

Best regards,
Hou zj

Attachment Content-Type Size
v47-0002-Serialize-partial-changes-to-disk-if-the-shm_mq-.patch application/octet-stream 36.5 KB
v47-0003-Test-streaming-parallel-option-in-tap-test.patch application/octet-stream 80.1 KB
v47-0001-Perform-streaming-logical-transactions-by-parall.patch application/octet-stream 182.6 KB
