RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-11 07:48:36
Message-ID: OS0PR01MB57162DEDD0261425B64DFFFD94649@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, August 10, 2022 11:39 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>wrote:
>
> On Tue, Aug 9, 2022 at 5:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Aug 9, 2022 at 11:09 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> wrote:
> > >
> > > Some more comments
> > >
> > > + /*
> > > + * Exit if any relation is not in the READY state and if any worker is
> > > + * handling the streaming transaction at the same time. Because for
> > > + * streaming transactions that is being applied in apply background
> > > + * worker, we cannot decide whether to apply the change for a
> relation
> > > + * that is not in the READY state (see should_apply_changes_for_rel) as
> we
> > > + * won't know remote_final_lsn by that time.
> > > + */
> > > + if (list_length(ApplyBgworkersFreeList) !=
> > > list_length(ApplyBgworkersList) &&
> > > + !AllTablesyncsReady())
> > > + {
> > > + ereport(LOG,
> > > + (errmsg("logical replication apply workers for
> > > subscription \"%s\" will restart",
> > > + MySubscription->name),
> > > + errdetail("Cannot handle streamed replication
> > > transaction by apply "
> > > + "background workers until all tables are
> > > synchronized")));
> > > +
> > > + proc_exit(0);
> > > + }
> > >
> > > How this situation can occur? I mean while starting a background
> > > worker itself we can check whether all tables are sync ready or not
> > > right?
> > >
> >
> > We are already checking at the start in apply_bgworker_can_start() but
> > I think it is required to check at the later point of time as well
> > because the new rels can be added to pg_subscription_rel via Alter
> > Subscription ... Refresh. I feel if that reasoning is correct then we
> > can probably expand comments to make it clear.
> >
> > > + /* Check the status of apply background worker if any. */
> > > + apply_bgworker_check_status();
> > > +
> > >
> > > What is the need to checking each worker status on every commit? I
> > > mean if there are a lot of small transactions along with some
> > > steamiing transactions then it will affect the apply performance for
> > > those small transactions?
> > >
> >
> > I don't think performance will be a concern because this won't do any
> > costly operation unless invalidation happens in which case it will
> > access system catalogs. However, if my above understanding is correct
> > that new tables can be added during the apply process then not sure
> > doing it at commit time is sufficient/correct because it can change
> > even during the transaction.
> >
>
> One idea that may handle it cleanly is to check for SUBREL_STATE_SYNCDONE
> state in should_apply_changes_for_rel() and error out for apply_bg_worker().
> For the SUBREL_STATE_READY state, it should return true and for any other
> state, it can return false. The one advantage of this approach could be that the
> parallel apply worker will give an error only if the corresponding transaction
> has performed any operation on the relation that has reached the SYNCDONE
> state.
> OTOH, checking at each transaction end can also lead to erroring out of
> workers even if the parallel apply transaction doesn't perform any operation on
> the relation which is not in the READY state.

I agree that it would be better to check at should_apply_changes_for_rel().

In addition, I think we should report an error if the table is not in READY state,
otherwise return true. Currently(on HEAD), if the table state is NOT READY, we
will skip all the changes related to the relation in a transaction because we
invoke process_syncing_tables only at transaction end which means the state of
table won't be changed during applying a transaction.

But while the apply bgworker is applying the streaming transaction, the
main apply worker could have applied serval normal transaction which could
change the state of table serval times(FROM INIT -> READY). So, to prevent the
risk case that we skip part of changes before state comes to READY and then
start to apply the changes after READY during on transaction, we'd better error
out if the table is not in READY state and restart without apply background
worker.

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-08-11 07:49:18 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message houzj.fnst@fujitsu.com 2022-08-11 07:47:59 RE: Perform streaming logical transactions by background workers and parallel apply