RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-02 11:46:21
Message-ID: OS0PR01MB5716CE3888304E1369354914949D9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wednesday, July 27, 2022 4:22 PM houzj(dot)fnst(at)fujitsu(dot)com wrote:
>
> On Tuesday, July 26, 2022 5:34 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> wrote:
>
> > 3.
> > Why are we restricting parallel apply workers only for the streamed
> > transactions, because streaming depends upon the size of the logical
> > decoding work mem so making streaming and parallel apply tightly
> > coupled seems too restrictive to me. Do we see some obvious problems
> > in applying other transactions in parallel?
>
> We thought there could be conflict failures and deadlocks if we parallel
> apply normal transactions, which would need transaction dependency checks[1].
> But I will do some more research on this and share the result soon.

After thinking about this, I have confirmed that it would be easy to hit
deadlock errors if we parallel apply normal transactions without additional
dependency analysis and handling to preserve the COMMIT order.

The basic idea for parallel apply of normal transactions in the first version
is this: the main apply worker receives the data from the publisher and passes
it to an apply bgworker without applying it itself. Only before applying the
final COMMIT command does the apply bgworker need to wait for any previous
transactions to finish, so as to preserve the commit order. This means we could
pass the next transaction's data to another apply bgworker before the previous
transaction has been committed in the first apply bgworker.

In this approach, we have to do the dependency analysis, because it is easy to
hit deadlock errors when applying DMLs in parallel (see the attachment for
examples where the deadlock could happen). So it's a bit different from
streamed transactions.
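
To illustrate the kind of scenario I have in mind (a simplified sketch of my
own, not taken from the attached file; the table name and values are made up
for illustration), suppose a unique constraint exists only on the subscriber:

  publisher:  CREATE TABLE tbl (a int);
  subscriber: CREATE TABLE tbl (a int UNIQUE);

  -- both transactions succeed on the publisher, which has no unique
  -- constraint on the table
  TX1: INSERT INTO tbl VALUES (1);   -- commits first on the publisher
  TX2: INSERT INTO tbl VALUES (1);   -- commits second on the publisher

If TX1 is passed to apply worker A and TX2 to apply worker B, and worker B
happens to apply its INSERT first, then worker A's INSERT blocks on the
uncommitted unique-index entry held by worker B, while worker B waits for
worker A to commit first to preserve the commit order. Each worker then waits
for the other, so neither can make progress, i.e. a deadlock.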

Alternatively, we could apply the next transaction only after the first
transaction is committed, in which case we wouldn't need the dependency
analysis. But that would not bring a noticeable performance improvement even if
we started several apply workers, because the actual DMLs would not be
performed in parallel.

Based on the above, we plan to first introduce the patch to perform streaming
logical transactions by background workers, and then introduce parallel apply
of normal transactions, whose design is different and needs some additional
handling.

Best regards,
Hou zj

> [1]
> https://www.postgresql.org/message-id/CAA4eK1%2BwyN6zpaHUkCLorEW
> Nx75MG0xhMwcFhvjqm2KURZEAGw%40mail.gmail.com

Attachment: deadlock_example.txt (text/plain, 1.4 KB)
