Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-10-15 14:50:19
Message-ID: 20201015145019.GA4779@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2020-Oct-14, Petr Jelinek wrote:

> It would be nice if the new sentences at the beginning of tablesync.c
> started with uppercase, but that's about as nitpicky as I can be :)

OK, fixed that :-) And pushed (to master only). There's one more
change I added at the last minute, which is to remove the 'missing_ok'
parameter of GetSubscriptionRelState. There are some other cosmetic
changes, but nothing of substance.

> > If I understand correcly, the early exit in tablesync.c is not saving *a
> > lot* of time (we don't actually skip replaying any WAL), even if it's
> > saving execution of a bunch of code. So I stand by my position that
> > removing the code is better because it's clearer about what is actually
> > happening.
>
> I don't really have any problems with the simplification you propose. The
> saved time is probably in order of hundreds of ms which for table sync is
> insignificant.

Great, thanks.

If you think this is done ... I have taken a few notes in the process:

* STREAM COMMIT bug?
In apply_handle_stream_commit, we do CommitTransactionCommand, but
apparently in a tablesync worker we shouldn't do it.
* process_syncing_tables_for_apply belongs not in tablesync.c (maybe
worker.c?)
* DropSubscription bug / logicalrep_find_workers
DropSubscription believes that it's valid to get a list of workers, then
kill them one by one. But this is racy since one worker could end for other
reasons and be replaced by a worker to another subscription. This code
should use a loop that restarts after killing each worker.
* subscription_change_cb too coarse
It seems that we should only reset "subscription valid" when *our*
subscription is changed, not other subscriptions. Otherwise we read our own
sub data far too often.
* PostgresNode uses print where it should use Test::More's note()

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2020-10-15 15:15:57 Re: Bugreport | Logical replication PostgreSQL 12 | Error after disable / enable replication
Previous Message Tom Lane 2020-10-15 14:09:31 Re: BUG #16673: Stack depth limit exceeded error while running sysbench TPC-C