Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-11-23 11:57:33
Message-ID: CAFiTN-tpeZ5PjYZ4KgyY_LU1KeJZo-o6=d-TVgyq5bsCRs9RgQ@mail.gmail.com
Lists: pgsql-bugs

On Mon, Nov 23, 2020 at 3:13 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Nov 23, 2020 at 10:51 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > 2.
> > > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s)
> > > {
> > > /* Cleanup the subxact info */
> > > cleanup_subxact_info();
> > > - CommitTransactionCommand();
> > > +
> > > + if (!am_tablesync_worker())
> > > +     CommitTransactionCommand();
> > >
> > > Here, also you can add a comment: "/* The synchronization worker runs
> > > in single transaction. */"
> > >
> >
> > Done
> >
>
> Okay, thanks. I have slightly changed the comments and moved the newly
> added function in the attached patch.

Okay, looks good to me.
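
For anyone following the thread, here is a toy sketch of why the guard
matters. It assumes only what is stated above, namely that the
synchronization worker runs in a single transaction. ToyWorker,
toy_begin, toy_commit and toy_stream_abort_cleanup are hypothetical
names made up for illustration and are not PostgreSQL code; the
is_tablesync flag merely plays the role of am_tablesync_worker().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a worker; not a PostgreSQL structure. */
typedef struct ToyWorker
{
    bool is_tablesync;   /* plays the role of am_tablesync_worker() */
    bool in_transaction; /* is a transaction currently open? */
    int  commits;        /* number of commits issued so far */
} ToyWorker;

/* Open a transaction if none is open yet. */
static void
toy_begin(ToyWorker *w)
{
    if (!w->in_transaction)
        w->in_transaction = true;
}

/* Close the open transaction and count the commit. */
static void
toy_commit(ToyWorker *w)
{
    assert(w->in_transaction);
    w->in_transaction = false;
    w->commits++;
}

/*
 * Same shape as the guarded call in the quoted hunk: only a worker
 * that is not a tablesync worker commits here, so the tablesync
 * worker's single transaction stays open.
 */
static void
toy_stream_abort_cleanup(ToyWorker *w)
{
    if (!w->is_tablesync)
        toy_commit(w);
}
```

In this model, calling toy_stream_abort_cleanup() on a worker with
is_tablesync set leaves its transaction open and its commit count
unchanged, while an apply-style worker commits once.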

> I have tested the reported
> scenario and additionally verified that the fix is good even if the
> tablesync worker processed the partial transaction due to streaming.
> This won't do any harm because later apply worker will replay the
> entire transaction. This could be a problem if the apply worker also
> tries to stream the transaction between the SUBREL_STATE_CATCHUP and
> SUBREL_STATE_SYNCDONE state because then apply worker might have
> skipped applying the partial transactions processed by tablesync
> worker. But, I have checked that the apply worker waits for sync
> worker to complete its processing between these two states.

Right

> See
> process_syncing_tables_for_apply. Does this make sense?

Yes, it makes sense to me.
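
To spell out the ordering argument, here is a minimal single-threaded
sketch. It assumes only what is described above: the apply worker does
not proceed while the table is between the CATCHUP and SYNCDONE states.
ToyRelState, ToyTable, toy_sync_step and toy_apply_wait_for_syncdone
are hypothetical names for illustration; the real logic lives in
process_syncing_tables_for_apply and is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical two-state model of the window discussed above. */
typedef enum ToyRelState
{
    TOY_CATCHUP,   /* tablesync worker is still catching up */
    TOY_SYNCDONE   /* tablesync worker has finished */
} ToyRelState;

typedef struct ToyTable
{
    ToyRelState state;
    int         sync_steps_left; /* remaining units of tablesync work */
} ToyTable;

/* One unit of tablesync work; flips to SYNCDONE once caught up. */
static void
toy_sync_step(ToyTable *t)
{
    if (t->state == TOY_CATCHUP && --t->sync_steps_left <= 0)
        t->state = TOY_SYNCDONE;
}

/*
 * Apply side: applies nothing for this table until SYNCDONE is
 * reached. Returns how many rounds it had to wait. In this toy the
 * tablesync work is driven from inside the wait loop, since there is
 * no second process.
 */
static int
toy_apply_wait_for_syncdone(ToyTable *t)
{
    int waited = 0;

    while (t->state != TOY_SYNCDONE)
    {
        toy_sync_step(t);
        waited++;
    }
    return waited;
}
```

Because the apply side only continues once the state is SYNCDONE, there
is no point in this model where both sides process changes for the
table during the CATCHUP window.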

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
