RE: Perform streaming logical transactions by background workers and parallel apply

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-12-28 04:38:58
Message-ID: OS3PR01MB62756CC6FFBDD431F2878ACE9EF29@OS3PR01MB6275.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tue, Dec 27, 2022 19:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Dec 27, 2022 at 10:24 AM wangw(dot)fnst(at)fujitsu(dot)com
> <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Attach the new version patch which addressed all above comments and part
> of
> > comments from [1] except one comment that are being discussed.
> >

Thanks for your comments.

> 1.
> +# Test that the deadlock is detected among leader and parallel apply workers.
> +
> +$node_subscriber->append_conf('postgresql.conf', "deadlock_timeout =
> 1ms");
> +$node_subscriber->reload;
> +
>
> A. I see that the other existing tests have deadlock_timeout set as
> 10ms, 100ms, 100s, etc. Is there a reason to keep so low here? Shall
> we keep it as 10ms?

No, there is no particular reason, and I think you are right. Changed it to 10ms.

> B. /among leader/among the leader

Fixed.

> 2. Can we leave having tests in 022_twophase_cascade to be covered by
> parallel mode? The two-phase and parallel apply will be covered by
> 023_twophase_stream, so not sure if we get any extra coverage by
> 022_twophase_cascade.

Compared with 023_twophase_stream, 022_twophase_cascade additionally tests
rolling back a subtransaction. However, since that code path is already covered
by the tests in 018_stream_subxact_abort, I think we can remove the parallel
version of 022_twophase_cascade. So I reverted the parallel-mode changes in
022_twophase_cascade and added some comments at the top of that file.

> 3. Let's combine 0001 and 0002 as both have got reviewed independently.

Combined them into one patch.

I also checked and merged the diff patch in [1].

In addition, I fixed the following problem:
In previous versions, we didn't wait for STREAM_ABORT transactions to complete.
In extreme cases, this can cause problems if the STREAM_ABORT transaction
doesn't complete and xid wraparound occurs on the publisher side. I fixed this
by waiting for the STREAM_ABORT transaction to complete.

Attached is the new patch set.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2B5gTjHzWovkbUj%2BxsQ9yO9jVcKsS-3c5ZXLFy8JmfT%3DA%40mail.gmail.com

Regards,
Wang wei

Attachment                                                       Content-Type              Size
v69-0001-Perform-streaming-logical-transactions-by-parall.patch  application/octet-stream  261.7 KB
v69-0002-Add-GUC-stream_serialize_threshold-and-test-seri.patch  application/octet-stream  12.4 KB
v69-0003-Stop-extra-worker-if-GUC-was-changed.patch              application/octet-stream  4.5 KB
v69-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch  application/octet-stream  21.1 KB
v69-0005-Add-a-main_worker_pid-to-pg_stat_subscription.patch     application/octet-stream  9.5 KB
