Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date: 2020-10-14 05:04:24
Message-ID: 4a37c0e0-88a5-5d09-19c6-390b8412d3e6@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 14/10/2020 03:12, Alvaro Herrera wrote:
> On 2020-Oct-12, Petr Jelinek wrote:
>
>>> However, and this is one reason why I'd welcome Petr/Peter thoughts on
>>> this, I don't really understand what happens in LogicalRepApplyLoop
>>> afterwards with a tablesync worker; are we actually doing anything
>>> useful there, considering that the actual data copy seems to have
>>> occurred in the CopyFrom() call in copy_table? In other words, by the
>>> time we return control to ApplyWorkerMain with a slot name, isn't the
>>> work all done, and the only thing we need is to synchronize protocol and
>>> close the connection?
>>
>> There are 2 possible states at that point, either tablesync is ahead (when
>> main apply lags or nothing is happening on publication side) or it's behind
>> the main apply. When tablesync is ahead we are indeed done and just need to
>> update the state of the table (which is what the code you removed did, but
>> LogicalRepApplyLoop should do it as well, just a bit later). When it's
>> behind we need to do catchup for that table only which still happens in the
>> tablesync worker. See the explanation at the beginning of tablesync.c, it
>> probably needs some small adjustments after the changes in your first patch.
>
> ... Ooh, things start to make some sense now. So how about the
> attached? There are some not really related cleanups. (Changes to
> protocol.sgml are still pending.)
>

It would be nice if the new sentences at the beginning of tablesync.c
started with uppercase, but that's about as nitpicky as I can be :)

> If I understand correcly, the early exit in tablesync.c is not saving *a
> lot* of time (we don't actually skip replaying any WAL), even if it's
> saving execution of a bunch of code. So I stand by my position that
> removing the code is better because it's clearer about what is actually
> happening.
>

I don't really have any problems with the simplification you propose.
The saved time is probably in order of hundreds of ms which for table
sync is insignificant.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message denis.patron 2020-10-14 06:47:34 Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
Previous Message Andres Freund 2020-10-14 04:35:40 Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted