Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-11-23 09:44:23
Message-ID: CAA4eK1KhU1eCrOBCBMUXOEspzd0Jza7+jEuNyG-h28kzvvruHQ@mail.gmail.com
Lists: pgsql-bugs

On Mon, Nov 23, 2020 at 10:51 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > 2.
> > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s)
> > {
> > /* Cleanup the subxact info */
> > cleanup_subxact_info();
> > - CommitTransactionCommand();
> > +
> > + if (!am_tablesync_worker())
> > + CommitTransactionCommand();
> >
> > Here, also you can add a comment: "/* The synchronization worker runs
> > in single transaction. */"
> >
>
> Done
>
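
As an illustration of the guard in the quoted hunk, here is a minimal toy model in C. It is not PostgreSQL source: the Worker struct and the helpers is_tablesync (standing in for am_tablesync_worker), start_transaction and commit_transaction are hypothetical names made up for this sketch. It only shows the intended effect, which is that a tablesync worker keeps its single transaction open after a stream abort while an apply worker commits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or helpers are PostgreSQL APIs. */
typedef enum { WORKER_APPLY, WORKER_TABLESYNC } WorkerKind;

typedef struct {
    WorkerKind kind;
    bool in_transaction;   /* is this worker's transaction still open? */
    int commits;           /* number of commits issued so far */
} Worker;

/* Hypothetical stand-in for am_tablesync_worker(). */
static bool is_tablesync(const Worker *w)
{
    return w->kind == WORKER_TABLESYNC;
}

static void start_transaction(Worker *w)
{
    w->in_transaction = true;
}

/* Hypothetical stand-in for CommitTransactionCommand(). */
static void commit_transaction(Worker *w)
{
    assert(w->in_transaction);
    w->in_transaction = false;
    w->commits++;
}

/* Models the tail of stream-abort handling with the guard applied. */
static void handle_stream_abort(Worker *w)
{
    /* Subtransaction cleanup is omitted in this sketch. */

    /* The synchronization worker runs in single transaction. */
    if (!is_tablesync(w))
        commit_transaction(w);
}
```

Calling start_transaction and then handle_stream_abort on a WORKER_TABLESYNC worker leaves in_transaction set and commits at 0; the same calls on a WORKER_APPLY worker close the transaction and raise commits to 1.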

Okay, thanks. I have slightly changed the comments and moved the newly
added function in the attached patch. I have tested the reported
scenario and additionally verified that the fix works even when the
tablesync worker has processed only part of a transaction due to
streaming. This does no harm because the apply worker will later replay
the entire transaction. It could be a problem if the apply worker also
tried to stream the transaction between the SUBREL_STATE_CATCHUP and
SUBREL_STATE_SYNCDONE states, because the apply worker might then skip
the partial transactions already processed by the tablesync worker.
However, I have checked that the apply worker waits for the sync worker
to complete its processing between these two states; see
process_syncing_tables_for_apply. Does this make sense?
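
To make the ordering argument concrete, here is a small toy model in C. It is only a sketch under stated assumptions: SyncState, RelSync, tablesync_step and apply_wait_for_syncdone are hypothetical names, and the loop merely simulates the other process making progress. It is not the logic of process_syncing_tables_for_apply itself. The point it illustrates is that the apply side handles no changes for the relation between the two states.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two relation states named above. */
typedef enum { STATE_CATCHUP, STATE_SYNCDONE } SyncState;

typedef struct {
    SyncState state;
    int sync_steps_left;   /* work the tablesync worker still has to do */
} RelSync;

/* One unit of tablesync progress; flips to SYNCDONE once work is done. */
static void tablesync_step(RelSync *r)
{
    if (r->sync_steps_left > 0)
        r->sync_steps_left--;
    if (r->sync_steps_left == 0)
        r->state = STATE_SYNCDONE;
}

/*
 * Apply side: once the relation is in CATCHUP, do nothing except wait
 * until it reaches SYNCDONE. Returns how many changes the apply side
 * handled while waiting, which in this model is always zero.
 */
static int apply_wait_for_syncdone(RelSync *r)
{
    int applied_while_waiting = 0;

    while (r->state != STATE_SYNCDONE)
        tablesync_step(r);   /* simulates the sync worker running */

    return applied_while_waiting;
}
```

Starting from a RelSync in STATE_CATCHUP with three steps left, apply_wait_for_syncdone returns 0 and leaves the relation in STATE_SYNCDONE, so in this model no streamed change can be applied by both workers in that window.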

Peter, could you please also test the attached patch and check whether
it fixes the problem for you as well?

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v3-0001-Fix-replication-of-in-progress-transactions-in-ta.patch application/octet-stream 5.3 KB
