Re: [HACKERS] logical decoding of two-phase transactions

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2021-03-09 06:10:33
Message-ID: CALDaNm0bOrGYAdH6dwPeM+2=pgLg1J05HmVm7m=VH02tKbpZFg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 9, 2021 at 11:01 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Mar 8, 2021 at 8:09 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 8, 2021 at 6:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > I think that with the two_phase option, replicatedPtr and sentPtr
> > never become the same, which causes this process to hang.
> >
>
> The reason is that on the subscriber you have created a situation
> (a PK violation) where the initial tablesync cannot proceed, and the
> apply worker is waiting for tablesync to complete, so it cannot
> process new messages. I think as soon as you remove the duplicate row
> from the table, it will be able to proceed.
>
> Now, we can see a similar situation even in HEAD without 2PC, though
> it is a bit tricky to reproduce. Basically, when the tablesync worker
> is in the SUBREL_STATE_CATCHUP state and has a lot of WAL to process,
> the apply worker just waits for it to finish applying all of that WAL
> and won't process any messages. If you try to stop the publisher at
> that time, you will see the same behavior. I simulated a lot of WAL
> processing by pausing the tablesync worker in a debugger and not
> letting it proceed for some time. You can also try adding a sleep
> after the tablesync worker has set its state to SUBREL_STATE_CATCHUP.
>
> So, I feel this is expected behavior, and users need to manually fix
> the situation where the tablesync worker cannot proceed due to a PK
> violation. Does this make sense?
>

Thanks for the detailed explanation. This behavior looks similar to
the issue you described, so we can ignore it, as it does not seem to
be caused by this patch. I also noticed that if we resolve the PK
violation by deleting the row that causes it, the server is able to
stop immediately without any issue.
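
A minimal model of the hang, under simplifying assumptions: the
walsender finishes shutting down only once the position the subscriber
has confirmed (replicatedPtr) equals what it has sent (sentPtr), and the
apply worker confirms nothing while it waits on a stuck tablesync
worker. The class and function names below are hypothetical, WAL
positions are plain integers, and other exit conditions the real
walsender checks are ignored; this is a sketch, not PostgreSQL code.

```python
from dataclasses import dataclass


@dataclass
class WalSenderState:
    """Hypothetical, simplified model of a walsender's shutdown bookkeeping.

    WAL positions are plain integers here; PostgreSQL uses 64-bit LSNs.
    """
    sent_ptr: int        # last WAL position sent to the subscriber
    replicated_ptr: int  # last position the subscriber has confirmed


def can_finish_shutdown(state: WalSenderState) -> bool:
    # Assumed rule: exit only after everything sent has been confirmed.
    return state.replicated_ptr == state.sent_ptr


def apply_worker_step(state: WalSenderState, blocked_on_tablesync: bool) -> None:
    # While the apply worker waits on a stuck tablesync worker (e.g. one
    # failing on a PK violation), it processes no messages and confirms nothing.
    if not blocked_on_tablesync:
        state.replicated_ptr = state.sent_ptr


state = WalSenderState(sent_ptr=0x3000, replicated_ptr=0x2000)

apply_worker_step(state, blocked_on_tablesync=True)
print("blocked:", can_finish_shutdown(state))    # blocked: False

# Conflicting row deleted, so tablesync and the apply worker can proceed.
apply_worker_step(state, blocked_on_tablesync=False)
print("unblocked:", can_finish_shutdown(state))  # unblocked: True
```

In this model, deleting the conflicting row corresponds to the second
call: once the apply worker is unblocked it confirms the sent position
and the shutdown condition is met, matching the immediate stop observed
above.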

Regards,
Vignesh
