Re: Perform streaming logical transactions by background workers and parallel apply

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-06-01 02:00:09
Message-ID: CAD21AoC1ME-xQjdcTeMpVMADKJDjfGVZATdGNuR+Qcimov5TPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 31, 2022 at 5:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 30, 2022 at 5:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, May 30, 2022 at 2:22 PM wangw(dot)fnst(at)fujitsu(dot)com
> > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > Attach the new patches(only changed 0001 and 0002)
> > >
> >
>
> This patch allows the same replication origin to be used by the main
> apply worker and the bgworker that uses it to apply streaming
> transactions. See the changes [1] in the patch. I am not completely
> sure whether that is a good idea even though I could not spot or think
> of problems that can't be fixed in your patch. I see that currently
> both the main apply worker and bgworker will assign MyProcPid to the
> assigned origin slot, this can create the problem because
> ReplicationOriginExitCleanup() can clean it up even though the main
> apply worker or another bgworker is still using that origin slot.

Good point.

> Now,
> one way to fix is that we assign only the main apply worker's
> MyProcPid to session_replication_state->acquired_by. I have tried to
> think about the concurrency issues as multiple workers could now point
> to the same replication origin state. I think it is safe because the
> patch maintains the commit order by allowing only one process to
> commit at a time, so no two workers will be operating on the same
> origin at the same time. Even, though there is no case where the patch
> will try to advance the session's origin concurrently, it appears safe
> to do so as we change/advance the session_origin LSNs under
> replicate_state LWLock.

Right. That way, the cleanup is done only by the main apply worker.
Probably the bgworker can check if the origin is already acquired by
its (leader) main apply worker process for safety.

>
> Another idea could be that we allow multiple replication origins (one
> for each bgworker and one for the main apply worker) for the apply
> workers corresponding to a subscription. Then on restart, we can find
> the highest LSN among all the origins for a subscription. This should
> work primarily because we will maintain the commit order. Now, for
> this to work we need to somehow map all the origins for a subscription
> and one possibility is that we have a subscription id in each of the
> origin names. Currently we use ("pg_%u", MySubscription->oid) as
> origin_name. We can probably append some unique identifier number for
> each worker to allow each origin to have a subscription id. We need to
> drop all origins for a particular subscription on DROP SUBSCRIPTION. I
> think having multiple origins for the same subscription will have some
> additional work when we try to filter changes based on origin.

It also seems to work but need additional work and resource.

> The advantage of the first idea is that it won't increase the need to
> have more origins per subscription but it is quite possible that I am
> missing something and there are problems due to which we can't use
> that approach.

I prefer the first idea as it's simpler than the second one. I don't
see any concurrency problem so far unless I'm not missing something.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2022-06-01 02:06:11 RE: Build-farm - intermittent error in 031_column_list.pl
Previous Message Tom Lane 2022-06-01 00:55:29 Re: PostgreSQL Limits: maximum number of columns in SELECT result