Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-10-07 05:00:43
Message-ID: CAA4eK1LJoA868HCMmrzPqfcFL=bcFtAi6WWSJ0NKKQp8gspYVQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 7, 2022 at 8:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Oct 6, 2022 at 9:04 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > I think the root reason for this kind of deadlock problems is the table
> > structure difference between publisher and subscriber(similar to the unique
> > difference reported earlier[1]). So, I think we'd better disallow this case. For
> > example to avoid the reported problem, we could only support parallel apply if
> > pubviaroot is false on publisher and replicated tables' types(relkind) are the
> > same between publisher and subscriber.
> >
> > Although it might restrict some use cases, but I think it only restrict the
> > cases when the partitioned table's structure is different between publisher and
> > subscriber. User can still use parallel apply for cases when the table
> > structure is the same between publisher and subscriber which seems acceptable
> > to me. And we can also document that the feature is expected to be used for the
> > case when tables' structure are the same. Thoughts ?
>
> I'm concerned that it could be a big restriction for users. Having
> different partitioned table's structures on the publisher and the
> subscriber is quite common use cases.
>
> From the feature perspective, the root cause seems to be the fact that
> the apply worker does both receiving and applying changes. Since it
> cannot receive the subsequent messages while waiting for a lock on a
> table, the parallel apply worker also cannot move forward. If we have
> a dedicated receiver process, it can off-load the messages to the
> worker while another process waiting for a lock. So I think that
> separating receiver and apply worker could be a building block for
> parallel-apply.
>

The disadvantage that comes to mind is the overhead of passing
messages between the receiver and applier processes even in
non-parallel cases, and I don't think it is advisable to have separate
handling for non-parallel cases. The other thing is that we need to
somehow deal with feedback messages, which help synchronous
replication make progress and update the subscriber's progress, which
in turn helps to keep the restart point updated. These messages also
act as heartbeat messages between the walsender and the walapply
process.

To deal with this, one idea is to have two connections to the
walsender process, one from walreceiver and the other from the
walapply process. In my opinion that could lead to a big increase in
resource consumption and would bring another set of complexities into
the system. Now, in this design, I think we have two possibilities.
(a) We pass all messages to the leader apply worker, which then
decides whether to execute them serially or pass them to a parallel
apply worker. However, that can again deadlock in the truncate
scenario we discussed, because the main apply worker won't be able to
receive new messages once it is blocked at the truncate command.
(b) The walreceiver process itself takes care of passing streaming
transactions to parallel apply workers. But then walreceiver needs to
wait at the transaction end to maintain commit order, which means it
can also deadlock if the truncate happens in a streaming xact.
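
As an illustration only, the cycle in (a) can be written down as a
small wait-for graph. This is a Python sketch, not code from the
patch: the process names, the edges, and the has_cycle helper are all
hypothetical. The second graph assumes a receiver that never waits on
anything, which is exactly the assumption that (b) calls into
question.

```python
def has_cycle(waits_for):
    """Return True if the wait-for graph contains a cycle (a deadlock)."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        for nxt in waits_for.get(node, ()):
            if visit(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(n) for n in list(waits_for))


# Case (a): the leader both receives and applies (hypothetical edges).
combined = {
    # blocked on a table lock held by the streamed transaction
    "leader_apply": ["parallel_apply"],
    # waiting for the leader to forward the rest of the streamed transaction
    "parallel_apply": ["leader_apply"],
}

# Assumed alternative: a dedicated receiver that never blocks.
separate = {
    "applier": ["parallel_apply"],
    "parallel_apply": ["receiver"],
    "receiver": [],
}

print(has_cycle(combined))
print(has_cycle(separate))
```

The first graph has a cycle because the only process that can unblock
the parallel apply worker is itself waiting on that worker. The second
has none only as long as the receiver has no outgoing edge.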

The other alternative is to let the walreceiver process wait for the
apply process to finish the transaction and then send the feedback,
but that again seems to be an overhead if we have to do it even for
small transactions; in particular, it can delay sync replication. Even
if we don't consider the overhead, it can still lead to a deadlock
because walreceiver won't be able to move forward in the scenario we
are discussing.

About your point that having different partition structures on the
publisher and subscriber is common, I don't know how common it will be
once we have DDL replication. Also, the default value of
publish_via_partition_root is false, which doesn't seem to indicate
that this is a very common case.

We have fixed quite a few issues in this area in the last release or
two which were found during development, so I am not sure these setups
are used very often in the field, though that could just be a
coincidence. Also, it will only matter if there are large transactions
that operate on such tables, and I don't think it is easy to predict
whether those are common.

--
With Regards,
Amit Kapila.
