Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-06-01 05:19:17
Message-ID: CAA4eK1L=KsaU4fCpM_xbwK08E+NpYMviH23+uD0Be_2pKNtw_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 1, 2022 at 7:30 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, May 31, 2022 at 5:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, May 30, 2022 at 5:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, May 30, 2022 at 2:22 PM wangw(dot)fnst(at)fujitsu(dot)com
> > > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > Attach the new patches(only changed 0001 and 0002)
> > > >
> > >
> >
> > This patch allows the same replication origin to be used by the main
> > apply worker and the bgworker that uses it to apply streaming
> > transactions. See the changes [1] in the patch. I am not completely
> > sure whether that is a good idea even though I could not spot or think
> > of problems that can't be fixed in your patch. I see that currently
> > both the main apply worker and bgworker will assign MyProcPid to the
> > assigned origin slot, this can create the problem because
> > ReplicationOriginExitCleanup() can clean it up even though the main
> > apply worker or another bgworker is still using that origin slot.
>
> Good point.
>
> > Now,
> > one way to fix is that we assign only the main apply worker's
> > MyProcPid to session_replication_state->acquired_by. I have tried to
> > think about the concurrency issues as multiple workers could now point
> > to the same replication origin state. I think it is safe because the
> > patch maintains the commit order by allowing only one process to
> > commit at a time, so no two workers will be operating on the same
> > origin at the same time. Even, though there is no case where the patch
> > will try to advance the session's origin concurrently, it appears safe
> > to do so as we change/advance the session_origin LSNs under
> > replicate_state LWLock.
>
> Right. That way, the cleanup is done only by the main apply worker.
> Probably the bgworker can check if the origin is already acquired by
> its (leader) main apply worker process for safety.
>

Yeah, that makes sense.

> >
> > Another idea could be that we allow multiple replication origins (one
> > for each bgworker and one for the main apply worker) for the apply
> > workers corresponding to a subscription. Then on restart, we can find
> > the highest LSN among all the origins for a subscription. This should
> > work primarily because we will maintain the commit order. Now, for
> > this to work we need to somehow map all the origins for a subscription
> > and one possibility is that we have a subscription id in each of the
> > origin names. Currently we use ("pg_%u", MySubscription->oid) as
> > origin_name. We can probably append some unique identifier number for
> > each worker to allow each origin to have a subscription id. We need to
> > drop all origins for a particular subscription on DROP SUBSCRIPTION. I
> > think having multiple origins for the same subscription will have some
> > additional work when we try to filter changes based on origin.
>
> It also seems to work but need additional work and resource.
>
> > The advantage of the first idea is that it won't increase the need to
> > have more origins per subscription but it is quite possible that I am
> > missing something and there are problems due to which we can't use
> > that approach.
>
> I prefer the first idea as it's simpler than the second one. I don't
> see any concurrency problem so far unless I'm not missing something.
>

Thanks for evaluating it and sharing your opinion.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-06-01 05:27:00 Re: Unicode Variation Selector and Combining character
Previous Message Amit Kapila 2022-06-01 05:09:13 Re: Ignore heap rewrites for materialized views in logical replication