RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-11-03 13:06:35
Message-ID: OS0PR01MB5716410E7551394B730D433294389@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, November 2, 2022 10:50 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Oct 24, 2022 at 8:42 PM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Oct 12, 2022 at 3:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > On Tue, Oct 11, 2022 at 5:52 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Oct 7, 2022 at 2:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > > > >
> > > > > About your point that having different partition structures for
> > > > > publisher and subscriber, I don't know how common it will be once we
> > > > > have DDL replication. Also, the default value of
> > > > > publish_via_partition_root is false which doesn't seem to indicate
> > > > > that this is a quite common case.
> > > >
> > > > So how can we consider these concurrent issues that could happen only
> > > > when streaming = 'parallel'? Can we restrict some use cases to avoid
> > > > the problem or can we have a safeguard against these conflicts?
> > > >
> > >
> > > Yeah, right now the strategy is to disallow parallel apply for such
> > > cases as you can see in *0003* patch.
> >
> > Tightening the restrictions could work in some cases but there might
> > still be coner cases and it could reduce the usability. I'm not really
> > sure that we can ensure such a deadlock won't happen with the current
> > restrictions. I think we need something safeguard just in case. For
> > example, if the leader apply worker is waiting for a lock acquired by
> > its parallel worker, it cancels the parallel worker's transaction,
> > commits its transaction, and restarts logical replication. Or the
> > leader can log the deadlock to let the user know.
> >
>
> As another direction, we could make the parallel apply feature robust
> if we can detect deadlocks that happen among the leader worker and
> parallel workers. I'd like to summarize the idea discussed off-list
> (with Amit, Hou-San, and Kuroda-San) for discussion. The basic idea is
> that when the leader worker or parallel worker needs to wait for
> something (eg. transaction completion, messages) we use lmgr
> functionality so that we can create wait-for edges and detect
> deadlocks in lmgr.
>
> For example, a scenario where a deadlock occurs is the following:
>
> [Publisher]
> create table tab1(a int);
> create publication pub for table tab1;
>
> [Subcriber]
> creat table tab1(a int primary key);
> create subscription sub connection 'port=10000 dbname=postgres'
> publication pub with (streaming = parallel);
>
> TX1:
> BEGIN;
> INSERT INTO tab1 SELECT i FROM generate_series(1, 5000) s(i); -- streamed
> Tx2:
> BEGIN;
> INSERT INTO tab1 SELECT i FROM generate_series(1, 5000) s(i); -- streamed
> COMMIT;
> COMMIT;
>
> Suppose a parallel apply worker (PA-1) is executing TX-1 and the
> leader apply worker (LA) is executing TX-2 concurrently on the
> subscriber. Now, LA is waiting for PA-1 because of the unique key of
> tab1 while PA-1 is waiting for LA to send further messages. There is a
> deadlock between PA-1 and LA but lmgr cannot detect it.
>
> One idea to resolve this issue is that we have LA acquire a session
> lock on a shared object (by LockSharedObjectForSession()) and have
> PA-1 wait on the lock before trying to receive messages. IOW, LA
> acquires the lock before sending STREAM_STOP and releases it if
> already acquired before sending STREAM_START, STREAM_PREPARE and
> STREAM_COMMIT. For PA-1, it always needs to acquire the lock after
> processing STREAM_STOP and then release immediately after acquiring
> it. That way, when PA-1 is waiting for LA, we can have a wait-edge
> from PA-1 to LA in lmgr, which will make a deadlock in lmgr like:
>
> LA (waiting to acquire lock) -> PA-1 (waiting to acquire the shared
> object) -> LA
>
> We would need the shared objects per parallel apply worker.
>
> After detecting a deadlock, we can restart logical replication with
> temporarily disabling the parallel apply, which is done by 0005 patch.
>
> Another scenario is similar to the previous case but TX-1 and TX-2 are
> executed by two parallel apply workers (PA-1 and PA-2 respectively).
> In this scenario, PA-2 is waiting for PA-1 to complete its transaction
> while PA-1 is waiting for subsequent input from LA. Also, LA is
> waiting for PA-2 to complete its transaction in order to preserve the
> commit order. There is a deadlock among three processes but it cannot
> be detected in lmgr because the fact that LA is waiting for PA-2 to
> complete its transaction doesn't appear in lmgr (see
> parallel_apply_wait_for_xact_finish()). To fix it, we can use
> XactLockTableWait() instead.
>
> However, since XactLockTableWait() considers PREPARED TRANSACTION as
> still in progress, probably we need a similar trick as above in case
> where a transaction is prepared. For example, suppose that TX-2 was
> prepared instead of committed in the above scenario, PA-2 acquires
> another shared lock at START_STREAM and releases it at
> STREAM_COMMIT/PREPARE. LA can wait on the lock.
>
> Yet another scenario where LA has to wait is the case where the shm_mq
> buffer is full. In the above scenario (ie. PA-1 and PA-2 are executing
> transactions concurrently), if the shm_mq buffer between LA and PA-2
> is full, LA has to wait to send messages, and this wait doesn't appear
> in lmgr. To fix it, probably we have to use non-blocking write and
> wait with a timeout. If timeout is exceeded, the LA will write to file
> and indicate PA-2 that it needs to read file for remaining messages.
> Then LA will start waiting for commit which will detect deadlock if
> any.
>
> If we can detect deadlocks by having such a functionality or some
> other way then we don't need to tighten the restrictions of subscribed
> tables' schemas etc.

Thanks for the analysis and summary !

I tried to implement the above idea and here is the patch set. I have done some
basic tests for the new codes and it work fine. But I am going to test some
conner cases to make sure all the codes work fine. I removed the old 0003 patch
which was used to check the parallel apply safety because now we can detect the
deadlock problem.

Besides, there are few tasks left which I will handle soon and update the patch set:

* Address previous comment from Amit[1], Shi-san[2] and Peter[3] (Already done but haven't merged them).
* Rebase the original 0005 patch which is "retry to apply streaming xact only in leader apply worker".
* Adjust some comments and documentation related to new codes.

[1] https://www.postgresql.org/message-id/CAA4eK1Lsn%3D_gz1-3LqZ-wEDQDmChUsOX8LvHS8WV39wC1iRR%3DQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/OSZPR01MB631042582805A8E8615BC413FD329%40OSZPR01MB6310.jpnprd01.prod.outlook.com
[3] https://www.postgresql.org/message-id/CAHut%2BPsJWHRoRzXtMrJ1RaxmkS2LkiMR_4S2pSionxXmYsyOww%40mail.gmail.com

Best regards,
Hou zj

Attachment Content-Type Size
v42-0001-Perform-streaming-logical-transactions-by-parall.patch application/octet-stream 161.3 KB
v42-0002-Test-streaming-parallel-option-in-tap-test.patch application/octet-stream 74.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-11-03 13:23:00 Re: real/float example for testlibpq3
Previous Message John Naylor 2022-11-03 12:19:07 Re: Incorrect include file order in guc-file.l