RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-11-16 08:19:48
Message-ID: OS0PR01MB5716CF7D3BDE465AFAF17DC094079@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, November 15, 2022 7:58 PM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Saturday, November 12, 2022 7:06 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 11, 2022 at 2:12 PM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> >
> > Few comments on v46-0001:
> > ======================
> >
>
> Thanks for the comments.
>
> > 1.
> > +static void
> > +apply_handle_stream_abort(StringInfo s)
> > {
> > ...
> > + /* Send STREAM ABORT message to the parallel apply worker. */
> > + parallel_apply_send_data(winfo, s->len, s->data);
> > +
> > + if (abort_toplevel_transaction)
> > + {
> > + parallel_apply_unlock_stream(xid, AccessExclusiveLock);
> >
> > Shouldn't we need to release this lock before sending the message as
> > we are doing for stream_prepare and stream_commit? If there is a
> > reason for doing it differently here then let's add some comments for
> > the same.
>
> Changed.
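>
> To illustrate the revised ordering, here is a minimal sketch assuming the
> v48 pa_* naming (pa_unlock_stream, pa_send_data); it is not a verbatim
> excerpt from the patch:
>
> 	if (abort_toplevel_transaction)
> 	{
> 		/*
> 		 * Release the stream lock before sending the message, as is done
> 		 * for stream_prepare and stream_commit, so the parallel apply
> 		 * worker is not blocked on us while handling the abort.
> 		 */
> 		pa_unlock_stream(xid, AccessExclusiveLock);
> 	}
>
> 	/* Send STREAM ABORT message to the parallel apply worker. */
> 	pa_send_data(winfo, s->len, s->data);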
>
> > 2. It seems once the patch makes the file state as busy
> > (LEADER_FILESET_BUSY), it will only be accessible after the leader
> > apply worker receives a transaction end message like stream_commit. Is
> > my understanding correct? If yes, then why can't we make it accessible
> > after the stream_stop message? Are you worried about the concurrency
> > handling for reading and writing the file? If so, we can probably deal
> > with it via some lock for reading and writing to file for each change.
> > I think after this we may not need additional stream level lock/unlock
> > in parallel_apply_spooled_messages. I understand that you probably
> > want to keep the code simple so I am not suggesting changing it
> > immediately but just wanted to know whether you have considered
> > alternatives here.
>
> I thought about this, but the current BufFile design doesn't allow two
> processes to open the same BufFile at the same time (see the comment atop
> BufFileOpenFileSet()). This means the LA needs to make sure the PA has
> closed the BufFile before writing more changes into it. We could let the
> LA wait for that, but it could cause another kind of deadlock. Suppose
> the PA has opened the file and is blocked while applying the change it
> just read, and the LA starts to wait when trying to write the next set of
> streaming changes into the file because the file is still open in the PA.
> Then the wait-for edges look like:
>
> LA (waits for file to be closed) -> PA1 (waits for unique lock held by
> PA2) -> PA2 (waits for stream lock held by LA)
>
> We could introduce another lock for this, but that doesn't seem great as
> we already have two kinds of locks here.
>
> Another solution, which doesn't need a new lock, could be to create a
> different filename for each streaming block so that the leader doesn't
> need to reopen the same file after writing changes into it, but that
> would largely increase the number of temp files and looks a bit hacky.
> Or we could let the PA open, read, and close the file for each change,
> but that would add the overhead of repeatedly opening and closing the
> file.
>
> Based on the above, how about keeping the current approach (i.e., the PA
> opens the file only after the leader apply worker receives a transaction
> end message like stream_commit)? Ideally, we will enter partial serialize
> mode only when the PA is blocked by a backend or another PA, which
> doesn't seem that common.
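>
> To make the hand-over concrete, here is a minimal sketch using
> illustrative names (only LEADER_FILESET_BUSY appears in your comment
> above; the other names are hypothetical):
>
> 	typedef enum LeaderFileSetState
> 	{
> 		LEADER_FILESET_UNKNOWN,	/* no spool file created yet */
> 		LEADER_FILESET_BUSY,	/* LA is serializing changes to the file */
> 		LEADER_FILESET_READY	/* LA closed the file at transaction end;
> 								 * PA may now open and apply it */
> 	} LeaderFileSetState;
>
> The LA sets LEADER_FILESET_BUSY before writing and flips the state to
> LEADER_FILESET_READY only once it receives a transaction end message
> (e.g. stream_commit) and has closed the file; the PA opens the fileset
> only after seeing LEADER_FILESET_READY, so the two processes never have
> the same BufFile open concurrently.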
>
> > 3. Don't we need to release the transaction lock at stream_abort in
> > parallel apply worker? I understand that we are not waiting for it in
> > the leader worker but still parallel apply worker should release it if
> > acquired at stream_start by it.
>
> I thought that the lock would be automatically released on rollback. But
> after testing, I found it's possible that the lock won't be released if
> it's an empty streaming transaction. So, I added code to release the lock
> in the new version patch.
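>
> A minimal sketch of the fix in the parallel apply worker's stream_abort
> path, assuming the v48 pa_* naming (not a verbatim excerpt):
>
> 	/* Rolls back the (possibly empty) streaming transaction. */
> 	AbortCurrentTransaction();
>
> 	/*
> 	 * Rolling back an empty streaming transaction does not release the
> 	 * lock acquired at stream_start, so release it explicitly here.
> 	 */
> 	pa_unlock_transaction(xid, AccessExclusiveLock);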
>
> >
> > 4. A minor comment change as below:
> > diff --git a/src/backend/replication/logical/worker.c
> > b/src/backend/replication/logical/worker.c
> > index 43f09b7e9a..c771851d1f 100644
> > --- a/src/backend/replication/logical/worker.c
> > +++ b/src/backend/replication/logical/worker.c
> > @@ -1851,6 +1851,9 @@ apply_handle_stream_abort(StringInfo s)
> > parallel_apply_stream_abort(&abort_data);
> >
> > /*
> > + * We need to wait after processing rollback
> > to savepoint for the next set
> > + * of changes.
> > + *
> > * By the time parallel apply worker is
> > processing the changes in
> > * the current streaming block, the leader
> > apply worker may have
> > * sent multiple streaming blocks. So, try to
> > lock only if there
>
> Merged.
>
> Attached is the new version patch set, which addresses the above comments
> and the comments from [1].
>
> In the new version, I renamed the parallel_apply_xxx functions to pa_xxx
> to make the names shorter, per the suggestion in [1]. Besides, I split the
> code related to partial serialization into the 0002 patch to make it
> easier to review.
>
> [1]
> https://www.postgresql.org/message-id/CAA4eK1LGyQ%2BS-jCMnYSz_hvoq
> iNA0Of%3D%2BMksY%3DXTUaRc5XzXJQ%40mail.gmail.com

I noticed that I didn't add CHECK_FOR_INTERRUPTS while retrying to send a
message, so the attached new version adds that. Also attached is a 0004
patch that restarts logical replication with parallel apply temporarily
disabled if a transaction fails to be applied in a parallel apply worker.
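
The retry loop now looks roughly like the following. This is a minimal
sketch: pa_send_data() matches the v48 naming, but the
SHM_SEND_RETRY_INTERVAL_MS constant and the exact shm_mq_send() arguments
are illustrative and may differ in the patch.

	static bool
	pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes,
				 const void *data)
	{
		for (;;)
		{
			shm_mq_result result;

			/* React to cancel/termination requests while retrying. */
			CHECK_FOR_INTERRUPTS();

			result = shm_mq_send(winfo->mq_handle, nbytes, data,
								 true /* nowait */ , true /* force_flush */ );

			if (result == SHM_MQ_SUCCESS)
				return true;
			else if (result == SHM_MQ_DETACHED)
				return false;	/* the parallel apply worker has exited */

			/* Queue full; wait a bit before retrying. */
			(void) WaitLatch(MyLatch,
							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
							 SHM_SEND_RETRY_INTERVAL_MS,
							 WAIT_EVENT_MQ_SEND);
			ResetLatch(MyLatch);
		}
	}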

Best regards,
Hou zj

Attachment Content-Type Size
v48-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch application/octet-stream 20.6 KB
v48-0001-Perform-streaming-logical-transactions-by-parall.patch application/octet-stream 182.8 KB
v48-0002-Serialize-partial-changes-to-disk-if-the-shm_mq-.patch application/octet-stream 35.7 KB
v48-0003-Test-streaming-parallel-option-in-tap-test.patch application/octet-stream 80.1 KB
