Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-14 10:59:45
Message-ID: CAA4eK1KcJQCyX=sVLNDj=opU=8VbnxFdEiEvAV_OGEzBravUYw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 9, 2022 at 10:49 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Hi Vignesh,
>
> > In the case of physical replication with
> > recovery_min_apply_delay set, I noticed that both the primary and standby
> > nodes stopped successfully immediately after the stop
> > server command. In the case of logical replication, stopping the server fails:
> > pg_ctl -D publisher -l publisher.log stop -c
> > waiting for server to shut
> > down...............................................................
> > failed
> > pg_ctl: server does not shut down
> >
> > In the case of logical replication, the server does not stop
> > because the walsender process is unable to exit:
> > ps ux | grep walsender
> > vignesh 1950789 75.3 0.0 8695216 22284 ? Rs 11:51 1:08
> > postgres: walsender vignesh [local] START_REPLICATION
>
> Thanks for reporting the issue. I analyzed it.
>
>
> This issue has occurred because the apply worker cannot reply during the delay.
> I think we may have to modify the mechanism that delays applying transactions.
>
> When a walsender process is requested to shut down, it can exit only after
> all the WAL it has sent has been replicated on the subscriber. This check is
> done in WalSndDone(), and the replicated position is updated when the process
> handles reply messages from the subscriber in ProcessStandbyReplyMessage().
>
> In the case of physical replication, the walreceiver can receive WAL and reply
> even while applying it is delayed. This means the replicated position is
> reported to the publisher side immediately, so the walsender can exit.
>
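The feedback loop described above can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: the class `WalSender`, its fields, and the function `shutdown_after_delay` are hypothetical names chosen for illustration, and LSNs are modeled as plain integers. It only shows that the walsender's confirmed position advances when a reply arrives, so a receiver that keeps replying while apply is delayed (the physical case) lets the walsender exit, while one that sends nothing during the delay (the logical case as reported here) does not.

```python
from dataclasses import dataclass


@dataclass
class WalSender:
    """Hypothetical stand-in for the walsender's view of one connection."""
    sent_lsn: int = 0        # last WAL position sent to the receiver
    replicated_lsn: int = 0  # last position the receiver has confirmed

    def process_reply(self, lsn: int) -> None:
        # Loosely mirrors ProcessStandbyReplyMessage(): the confirmed position
        # only advances when a reply message actually arrives.
        self.replicated_lsn = max(self.replicated_lsn, lsn)

    def can_shut_down(self) -> bool:
        # Simplified: exit only once everything sent has been confirmed.
        return self.replicated_lsn == self.sent_lsn


def shutdown_after_delay(replies_while_delaying: bool, sent_lsn: int = 100) -> bool:
    """Return True if the walsender could exit while the receiver is still delaying apply."""
    ws = WalSender(sent_lsn=sent_lsn)
    received_lsn = sent_lsn  # the receiver already holds all the WAL that was sent
    if replies_while_delaying:
        # Physical case: the walreceiver acknowledges received/flushed WAL even
        # though recovery_min_apply_delay holds back *replaying* it.
        ws.process_reply(received_lsn)
    # Logical case (as described in this thread): the apply worker is sleeping
    # inside the delay and sends no reply, so replicated_lsn never moves.
    return ws.can_shut_down()
```

With these definitions, `shutdown_after_delay(True)` returns `True` and `shutdown_after_delay(False)` returns `False`, matching the behavior reported for physical and logical replication respectively.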

I think WalSndDone() checks not only the replicated position but also
whether there is any pending send. Why must all pending WAL be sent
and confirmed as flushed on the standby before shutdown in the
physical case? Is it because otherwise we may lose the required WAL?
I am asking because we should check whether those conditions also
apply to logical replication.
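As I understand walsender.c around the time of this thread, the exit test in WalSndDone() combines three things: the walsender is caught up (WalSndCaughtUp), the sent position (sentPtr) equals the replicated position (the reported flush location, falling back to the write location when flush is invalid), and nothing is left in the output buffer (pq_is_send_pending()). The sketch below is a simplified Python restatement of that condition under those assumptions; `walsnd_done_can_exit` and `INVALID_LSN` are hypothetical names, and the real function also flushes output and sends keepalives, which is omitted here.

```python
INVALID_LSN = 0  # stands in for InvalidXLogRecPtr in this sketch


def walsnd_done_can_exit(caught_up: bool, sent_ptr: int, write_ptr: int,
                         flush_ptr: int, send_pending: bool) -> bool:
    """Simplified model of the condition under which WalSndDone() lets the walsender exit."""
    # Use the flush position if the receiver reported one, else its write position.
    replicated = write_ptr if flush_ptr == INVALID_LSN else flush_ptr
    # All three must hold: caught up, everything sent is confirmed, and no
    # unsent data remains buffered on the connection.
    return caught_up and sent_ptr == replicated and not send_pending
```

In this model, a delaying logical apply worker that never reports a new flush or write position keeps `sent_ptr == replicated` false indefinitely, regardless of the pending-send check.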

--
With Regards,
Amit Kapila.
