RE: Perform streaming logical transactions by background workers and parallel apply

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-04-14 03:42:39
Message-ID: OS0PR01MB57166BC3CAA873364A2CC07A94EF9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, April 8, 2022 5:14 PM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> On Wednesday, April 6, 2022 1:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
> > In this email, I would like to discuss allowing streaming logical
> > transactions (large in-progress transactions) by background workers
> > and parallel apply in general. The goal of this work is to improve the
> > performance of the apply work in logical replication.
> >
> > Currently, for large transactions, the publisher sends the data in
> > multiple streams (changes divided into chunks depending upon
> > logical_decoding_work_mem), and then on the subscriber-side, the apply
> > worker writes the changes into temporary files and once it receives
> > the commit, it reads from the file and applies the entire transaction. To
> > improve the performance of such transactions, we can instead allow
> > them to be applied via background workers. There could be multiple
> > ways to achieve this:
> >
> > Approach-1: Assign a new bgworker (if available) as soon as the xact's
> > first stream arrives, and the main apply worker will send changes to
> > this new worker via shared memory. We keep this worker assigned until
> > the transaction's commit arrives and also wait for the worker to finish
> > at commit. This preserves commit ordering and avoids writing to and
> > reading from a file in most cases. We still need to spill if there is
> > no worker available. We also need to let the background worker finish
> > processing stream_stop, to avoid deadlocks, because T-1's current
> > stream of changes can update rows in conflicting order with T-2's next
> > stream of changes.
> >
>
> Attached is the POC patch for Approach-1 of "Perform streaming logical
> transactions by background workers". The patch is still WIP, as there
> are several TODO items left, including:
>
> * error handling for the bgworker
> * support for skipping (SKIP) the transaction in the bgworker
> * handling the case when there is no more worker available
> (might need to spill the data to a temp file in this case)
> * some potential bugs
>
> The original patch is borrowed from an old thread[1] and was rebased and
> extended/cleaned by me. Comments and suggestions are welcome.

Attached is a new version of the patch, which improves the error handling and handles
the case when no more workers are available (the data is spilled to a temp file in this case).
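
To make the "assign a worker if one is free, otherwise spill" behaviour easier to picture, here is a minimal toy model in Python. It is not code from the patch: the class LeaderApply, its method names, and the use of in-memory lists in place of shared memory queues and temp files are all assumptions made purely for illustration.

```python
from collections import deque


class LeaderApply:
    """Toy model of the main apply worker routing streamed chunks (hypothetical)."""

    def __init__(self, max_workers):
        self.free_workers = deque(range(max_workers))  # idle bgworker ids
        self.worker_of = {}   # xid -> bgworker id assigned to that transaction
        self.pending = {}     # xid -> changes handed to a bgworker, not yet committed
        self.spooled = {}     # xid -> changes spilled (stands in for a temp file)
        self.committed = []   # (xid, changes) in commit order

    def stream_chunk(self, xid, changes):
        # Decide the path once, when the first stream of the transaction arrives.
        if xid not in self.worker_of and xid not in self.spooled:
            if self.free_workers:
                self.worker_of[xid] = self.free_workers.popleft()
                self.pending[xid] = []
            else:
                self.spooled[xid] = []  # no worker available: fall back to spilling
        if xid in self.worker_of:
            self.pending[xid].extend(changes)
            return "worker"
        self.spooled[xid].extend(changes)
        return "spool"

    def stream_commit(self, xid):
        # The worker stays assigned until commit; spooled changes are replayed here.
        if xid in self.worker_of:
            changes = self.pending.pop(xid)
            self.free_workers.append(self.worker_of.pop(xid))
        else:
            changes = self.spooled.pop(xid)
        self.committed.append((xid, changes))
```

With max_workers=1, the first streamed transaction takes the worker path and a second concurrent one is spooled; once the first commits, the worker is returned to the pool and can be assigned to a later transaction.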

Currently, it still doesn't support skipping the streamed transaction in the bgworker, because
in this approach we don't know the last LSN of the streamed transaction while it is being
applied, so we cannot get the LSN to compare against for SKIP. I will think more about it and
keep testing the patch.
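
The SKIP limitation comes down to when the commit LSN becomes visible relative to the first applied change. The small Python sketch below illustrates only that ordering; the function name, the message tuples, and the sample LSN value are hypothetical and do not reflect the patch or the replication protocol format.

```python
def finish_lsn_known_at_first_apply(mode, messages):
    """Return the commit LSN known when the first change is applied, or None.

    mode is "spool" (changes are written to a file and applied only after the
    commit message arrives) or "bgworker" (changes are applied as they stream in).
    messages is a list of (kind, payload) tuples in arrival order.
    """
    known_lsn = None
    for kind, payload in messages:
        if kind == "commit":
            known_lsn = payload
        elif kind == "change" and mode == "bgworker":
            # The bgworker applies this change right away, before any commit
            # message has been seen, so there is no LSN to compare with yet.
            return known_lsn
    # In spool mode, applying starts only after the whole stream, including
    # the commit message, has been received.
    return known_lsn


stream = [("change", "INSERT 1"), ("change", "INSERT 2"), ("commit", "0/14C0378")]
```

In this model, the spool path sees the commit LSN before applying anything, while the bgworker path has nothing to compare against at the point where a skip decision would be needed.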

Best regards,
Hou zj

Attachment Content-Type Size
v2-0001-Perform-streaming-logical-transactions-by-background.patch application/octet-stream 62.4 KB
