Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-11-09 08:03:33
Message-ID: CAFiTN-s4C9+4nvhu12hxwtOi980N7+6jRJ8TP-5vPxtpXLvd0w@mail.gmail.com
Lists: pgsql-bugs

On Mon, Nov 9, 2020 at 11:50 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Nov 7, 2020 at 11:33 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup,
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot with streaming disabled,
> > > then copy the table (logical decoding isn't used here), and then
> > > let the apply worker receive any other data that was inserted in
> > > the meantime.
> >
> > I think this assumption is not completely correct: if the
> > tablesync worker is behind the apply worker, it will start
> > streaming by itself until it moves from the CATCHUP to the SYNCDONE
> > state. So during that time, if streaming is on, the tablesync worker
> > will also send the streaming option as on.
> >
>
> Yeah, this seems to be possible, which is why I mentioned above that we
> should dig more into this case. Did you try it with a test case? I
> think we can generate a test using a debugger: once the tablesync
> worker reaches the CATCHUP state, stop it in the debugger, then perform
> a large transaction on the same table, which the apply worker will skip;
> the tablesync worker will then try to apply those changes and should fail.

Yeah, we can test it like that. I haven't tested it yet.

> > I think to disable streaming in
> > the tablesync worker, we can do something like this.
> >
>
> Sure, but why do we want to prohibit streaming in tablesync worker
> unless there is some fundamental reason for the same? If we can write
> a test based on what I described above then we can probably know if
> there is any real issue with allowing streaming via tablesync worker.

I don't think there is a fundamental reason, but since this is just the
initial catch-up state, I wasn't sure it makes sense to stream the
transaction. If we do want to stream, then we need to add some
handling in apply_handle_stream_commit so that it avoids committing
when running in the tablesync worker.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
