RE: Perform streaming logical transactions by background workers and parallel apply

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-06-02 10:01:37
Message-ID: OS3PR01MB62758A881FF3240171B7B21B9EDE9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 1, 2022 1:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Jun 1, 2022 at 7:30 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> >
> > On Tue, May 31, 2022 at 5:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > On Mon, May 30, 2022 at 5:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > > >
> > > > On Mon, May 30, 2022 at 2:22 PM wangw(dot)fnst(at)fujitsu(dot)com
> > > > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > >
> > > > > Attach the new patches(only changed 0001 and 0002)
> > > > >
> > > >
> > >
> > > This patch allows the same replication origin to be used by the main
> > > apply worker and the bgworker that uses it to apply streaming
> > > transactions. See the changes [1] in the patch. I am not completely
> > > sure whether that is a good idea even though I could not spot or think
> > > of problems that can't be fixed in your patch. I see that currently
> > > both the main apply worker and bgworker will assign MyProcPid to the
> > > assigned origin slot, this can create the problem because
> > > ReplicationOriginExitCleanup() can clean it up even though the main
> > > apply worker or another bgworker is still using that origin slot.
> >
> > Good point.
> >
> > > Now,
> > > one way to fix is that we assign only the main apply worker's
> > > MyProcPid to session_replication_state->acquired_by. I have tried to
> > > think about the concurrency issues as multiple workers could now point
> > > to the same replication origin state. I think it is safe because the
> > > patch maintains the commit order by allowing only one process to
> > > commit at a time, so no two workers will be operating on the same
> > > origin at the same time. Even, though there is no case where the patch
> > > will try to advance the session's origin concurrently, it appears safe
> > > to do so as we change/advance the session_origin LSNs under
> > > replicate_state LWLock.
> >
> > Right. That way, the cleanup is done only by the main apply worker.
> > Probably the bgworker can check if the origin is already acquired by
> > its (leader) main apply worker process for safety.
> >
>
> Yeah, that makes sense.
>
> > >
> > > Another idea could be that we allow multiple replication origins (one
> > > for each bgworker and one for the main apply worker) for the apply
> > > workers corresponding to a subscription. Then on restart, we can find
> > > the highest LSN among all the origins for a subscription. This should
> > > work primarily because we will maintain the commit order. Now, for
> > > this to work we need to somehow map all the origins for a subscription
> > > and one possibility is that we have a subscription id in each of the
> > > origin names. Currently we use ("pg_%u", MySubscription->oid) as
> > > origin_name. We can probably append some unique identifier number for
> > > each worker to allow each origin to have a subscription id. We need to
> > > drop all origins for a particular subscription on DROP SUBSCRIPTION. I
> > > think having multiple origins for the same subscription will have some
> > > additional work when we try to filter changes based on origin.
> >
> > It also seems to work but need additional work and resource.
> >
> > > The advantage of the first idea is that it won't increase the need to
> > > have more origins per subscription but it is quite possible that I am
> > > missing something and there are problems due to which we can't use
> > > that approach.
> >
> > I prefer the first idea as it's simpler than the second one. I don't
> > see any concurrency problem so far unless I'm not missing something.
> >
>
> Thanks for evaluating it and sharing your opinion.
Thanks for your comments and opinions.

I fixed this problem by following the first suggestion. I also added the
relevant checks and changed the relevant comments.

Thanks for Shi Yu to add some tests as suggested by Osumi-san in [1].#4 and
improve the 0002 patch by adding some checks to see if the apply background
worker starts.

Attach the new patches.
1. Add some descriptions related to "apply" mode to logical-replication.sgml
and create_subscription.sgml.(suggested by Osumi-san in [1].#1,#4)
2. Fix the problem that values in pg_stat_subscription_stats are not updated
properly. (suggested by Osumi-san in [1].#2)
3. Improve the code formatting of the patches. (suggested by Osumi-san in [1].#3)
4. Add some tests in 0003 patch. And improve some tests by adding some checks
to see if the apply background worker starts in 0002 patch. (suggested by
Osumi-san in [1].#4 and Shi Yu)
5. Improve the log message. (suggested by Amit-san in [2].#1)
6. Separate the new logic related to apply background worker to new file
applybgwroker.c. (suggested by Amit-san in [2].#2)
7. Improve function handle_streamed_transaction. (suggested by Amit-san in[2].#3)
8. Improve some comments. (suggested by Amit-san in [2].#4,#6 and me)
9. Fix the problem that the structure member "acquired_by" is incorrectly set
when apply background worker tries to get replication origin.
(suggested by Amit-san in [3])

[1] - https://www.postgresql.org/message-id/TYCPR01MB83735AEE38370254ED495B06EDDA9%40TYCPR01MB8373.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/CAA4eK1Jt08SYbRt_-rbSWNg%3DX9-m8%2BRdP5PosfnQgyF-z8bkxQ%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAA4eK1%2BZ6ahpTQK2KzkvQ1kN-urVS9-N_RDM11MS%2BbtqaB8Bpw%40mail.gmail.com

Regards,
Wang wei

Attachment Content-Type Size
v8-0001-Perform-streaming-logical-transactions-by-backgro.patch application/octet-stream 81.4 KB
v8-0002-Test-streaming-apply-option-in-tap-test.patch application/octet-stream 68.9 KB
v8-0003-Add-some-checks-before-using-apply-background-wor.patch application/octet-stream 26.9 KB
v8-0004-Add-a-GUC-max_apply_bgworkers_per_subscription-to.patch application/octet-stream 6.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2022-06-02 10:03:32 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message David Geier 2022-06-02 09:30:31 Re: Assertion failure with barriers in parallel hash join