Re: Perform streaming logical transactions by background workers and parallel apply

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-01-05 11:53:36
Message-ID: CAFiTN-srQ2qZsx2uK6egT_jz9weBGm-Ff+gE8ObMu3sWYNKpFQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 5, 2023 at 5:03 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, January 5, 2023 4:22 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >

> Thanks for reporting the problem.
>
> After analyzing the behavior, I think it's a bug on the publisher side that
> is not directly related to parallel apply.
>
> I think the root cause is that we don't send a stream end (stream abort)
> message to the subscriber for a crashed transaction that had already been
> streamed.
> The behavior is that, after restarting, the publisher starts to decode the
> transaction that was aborted by the crash. When it tries to stream the first
> change of that transaction, it sends a stream start message, but then it
> realizes that the transaction was aborted, so it enters the PG_CATCH block of
> ReorderBufferProcessTXN() and calls ReorderBufferResetTXN(), which sends the
> stream stop message. In this case, a parallel apply worker is started on the
> subscriber and waits for a stream end message that will never come.

I suspected this but hadn't analyzed it.

> I think the same behavior happens in non-parallel mode, where it leaves a
> stream file on the subscriber that is not cleaned up until the apply worker is
> restarted.
> To fix it, I think we need to send a stream abort message when we clean up a
> crashed transaction on the publisher (e.g., in ReorderBufferAbortOld()). Here
> is a tiny patch that does this. I have confirmed that the bug is fixed
> and all regression tests pass.
>
> What do you think?
> I will start a new thread and try to write a test case, if possible,
> after reaching a consensus.

I think your analysis looks correct and we can raise this in a new thread.
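
For anyone following along, the message sequence described above can be modeled
roughly as below. This is only a toy Python sketch, not PostgreSQL code: Msg,
publisher_messages(), subscriber_has_pending_xact() and the
send_abort_on_cleanup flag are names made up for illustration, and it assumes
the subscriber keeps per-transaction state (a parallel apply worker or a stream
file) from the first stream start until a stream abort or stream commit arrives.

```python
from enum import Enum, auto
from typing import List


class Msg(Enum):
    """Hypothetical stand-ins for the streaming protocol messages."""
    STREAM_START = auto()
    STREAM_STOP = auto()
    STREAM_ABORT = auto()
    STREAM_COMMIT = auto()


def publisher_messages(send_abort_on_cleanup: bool) -> List[Msg]:
    """Messages emitted after restart for a streamed transaction that was
    aborted by a crash (toy model of the behavior described in the thread)."""
    msgs = [Msg.STREAM_START]      # first change is about to be streamed
    msgs.append(Msg.STREAM_STOP)   # abort detected, reset path closes the chunk
    if send_abort_on_cleanup:
        # Proposed fix: also tell the subscriber the transaction is gone.
        msgs.append(Msg.STREAM_ABORT)
    return msgs


def subscriber_has_pending_xact(msgs: List[Msg]) -> bool:
    """True if the subscriber is still holding state for the transaction
    after processing all messages (assumed subscriber behavior)."""
    pending = False
    for m in msgs:
        if m is Msg.STREAM_START:
            pending = True         # worker started / stream file opened
        elif m in (Msg.STREAM_ABORT, Msg.STREAM_COMMIT):
            pending = False        # transaction finished, state released
    return pending


if __name__ == "__main__":
    print("without fix, pending:", subscriber_has_pending_xact(publisher_messages(False)))
    print("with fix, pending:", subscriber_has_pending_xact(publisher_messages(True)))
```

Without the extra stream abort the modeled subscriber never releases its state,
which matches the hang seen in parallel mode and the leftover stream file in
non-parallel mode.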

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
