Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "smithpb2250(at)gmail(dot)com" <smithpb2250(at)gmail(dot)com>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-16 03:51:10
Message-ID: CAA4eK1LyetktcphdRrufHac4t5DGyhsS2xG2DSOGb7OaOVcDVg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 15, 2022 at 1:42 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Horiguchi-san, Amit,
>
> > > Yes, that would be ideal. But do you know why that is a must?
> >
> > I believe a replication set is expected to be in sync after a graceful
> > shutdown (fast or smart). Of course we can change the policy to allow the
> > walsender to stop before confirming that all WAL has been applied. However,
> > the walsender has no idea whether the peer is intentionally delaying or not.
>
> This mechanism was introduced by 985bd7[1], which was needed to support a
> "clean" switchover. I think it is needed for physical replication, but it is not
> clear for the logical case.
>
> When the postmaster is stopped in fast or smart mode, we expect that all
> modifications have been received by the secondary. This requirement seems
> not to have changed since the initial commit.
>
> Before 985bd7, the walsender exited just after sending the final WAL, which meant
> that sometimes the last packet did not reach the secondary. So there was a possibility
> of failing to reboot the old primary as a new secondary, because the new primary did
> not have the last WAL record. To avoid this, the walsender started waiting for the
> flush before exiting.
>
> But in the case of logical replication, I'm not sure whether this limitation is
> really needed. I think it may be OK for the walsender to exit without waiting
> when applies are being delayed, because we don't have to consider the above issue
> for logical replication.
>

I also don't see the need for this mechanism for logical replication,
and in fact, why do we need to even wait for sending the existing WAL?

I think the reason why we don't need to wait for logical replication
is that after the restart, we always start sending WAL from the
location requested by the subscriber, or from the subscriber's
confirmed flush location as known to the publisher.
Consider another case where, after the restart, the publisher (node-1)
wants to act as a subscriber of the previous subscriber (node-2). The
new subscriber (node-1) has no way to tell the new publisher (node-2)
to start from the location where node-1 went down, because WAL
locations on the publisher and the subscriber need not be the same.
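
A minimal sketch of the restart rule described above, in Python. It
assumes the publisher resumes from whichever is later: the location the
subscriber requested or the confirmed flush location recorded for that
subscriber. The helper names (parse_lsn, format_lsn, effective_start)
are hypothetical and are not PostgreSQL APIs.

```python
INVALID_LSN = 0  # stand-in for "subscriber did not request a location"


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'HI/LO' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN back into 'HI/LO' hex form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def effective_start(requested: int, confirmed_flush: int) -> int:
    """Pick where the publisher resumes sending after a restart.

    Assumption: a missing request, or one older than the confirmed
    flush location, falls back to the confirmed flush location.
    """
    if requested == INVALID_LSN or requested < confirmed_flush:
        return confirmed_flush
    return requested


if __name__ == "__main__":
    confirmed = parse_lsn("0/2000")
    for req in ("0/1000", "0/3000"):
        print(req, "->", format_lsn(effective_start(parse_lsn(req), confirmed)))
```

Both arguments must come from the same publisher's WAL. In the node-1 /
node-2 switchover case, node-1's last location means nothing in node-2's
WAL, so no such comparison is available.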

This brings us to the question of whether users can use logical
replication for the scenario where they want the old master to follow
the new master after the restart, as we typically do in physical
replication, and if so, how?

Another related point to consider is how synchronous replication
behaves when a shutdown is performed, in both the physical and the
logical case, especially when the time-delayed replication feature is
enabled.

--
With Regards,
Amit Kapila.
