Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-12-09 07:14:05
Message-ID: CAA4eK1LJMpBYbgDVz=g4qig8C2bM10PK=DgW7o1zn8X426vJ+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 8, 2022 at 12:37 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>

Review comments
==============
1. Currently, we don't release the stream lock in LA (leade apply
worker) for "rollback to savepoint" and the reason is mentioned in
comments of apply_handle_stream_abort() in the patch. But, today,
while testing, I found that can lead to deadlock which otherwise,
won't happen on the publisher. The key point is rollback to savepoint
releases the locks acquired by the particular subtransaction, so
parallel apply worker should also do the same. Consider the following
example where the transaction in session-1 is being performed by the
parallel apply worker and the transaction in session-2 is being
performed by the leader apply worker. I have simulated it by using GUC
force_stream_mode.

Publisher
==========
Session-1
postgres=# begin;
BEGIN
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# truncate t1;
TRUNCATE TABLE

Session-2
postgres=# begin;
BEGIN
postgres=*# insert into t1 values(4);

Session-1
postgres=*# rollback to savepoint s1;
ROLLBACK

Session-2
Commit;

With or without commit of Session-2, this scenario will lead to
deadlock on the subscriber because PA (parallel apply worker) is
waiting for LA to send the next command, and LA is blocked by
Exclusive of PA. There is no deadlock on the publisher because
rollback to savepoint will release the lock acquired by truncate.

To solve this, How about if we do three things before sending abort of
sub-transaction (a) unlock the stream lock, (b) increment
pending_stream_count, (c) take the stream lock again?

Now, if the PA is not already waiting on the stop, it will not wait at
stream_stop but will wait after applying abort of sub-transaction and
if it is already waiting at stream_stop, the wait will be released. If
this works then probably we should try to do (b) before (a) to match
the steps with stream_start.

2. There seems to be another general problem in the way the patch
waits for stream_stop in PA (parallel apply worker). Currently, PA
checks, if there are no more pending streams then it tries to wait for
the next stream by waiting on a stream lock. However, it is possible
after PA checks there is no pending stream and before it actually
starts waiting on a lock, the LA sends another stream for which even
stream_stop is sent, in this case, PA will start waiting for the next
stream whereas there is actually a pending stream available. In this
case, it won't lead to any problem apart from delay in applying the
changes in such cases but for the case mentioned in the previous point
(Pont 1), it can lead to deadlock even after we implement the solution
proposed to solve it.

3. The other point to consider is that for
stream_commit/prepare/abort, in LA, we release the stream lock after
sending the message whereas for stream_start we release it before
sending the message. I think for the earlier cases
(stream_commit/prepare/abort), the patch has done like this because
pa_send_data() may need to require the lock again when it times out
and start serializing, so there will be no sense in first releasing
it, then re-acquiring it, and then again releasing it. Can't we also
release the lock for stream_start after pa_send_data() only if it is
not switched to serialize mode?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-12-09 07:46:58 static assert cleanup
Previous Message Amit Langote 2022-12-09 06:50:05 Re: on placeholder entries in view rule action query's range table