RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'vignesh C' <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-11-24 15:21:58
Message-ID: TYCPR01MB837393BA666815647F35DDF8ED0F9@TYCPR01MB8373.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi,

On Tuesday, November 22, 2022 6:15 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> On Mon, 14 Nov 2022 at 12:14, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > The thread title doesn't really convey the topic under discussion, so
> > changed it. IIRC, this has been mentioned by others as well in the
> > thread.
> >
> > On Sat, Nov 12, 2022 at 7:21 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > Few comments:
> > > 1) I feel if the user has specified a long delay there is a chance
> > > that replication may not continue if the replication slot falls
> > > behind the current LSN by more than max_slot_wal_keep_size. I feel
> > > we should add this reference in the documentation of min_apply_delay
> > > as the replication will not continue in this case.
> > >
> >
> > This makes sense to me.
Modified accordingly. The updated patch is in [1].
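
As a rough illustration of the concern in 1), the sketch below estimates how much WAL the publisher would need to retain while the subscriber waits out min_apply_delay, and compares it with max_slot_wal_keep_size. It assumes a constant WAL generation rate and takes all sizes in bytes. The function names are hypothetical and are not part of the patch or of PostgreSQL.

```python
def wal_retained_bytes(wal_rate_bytes_per_sec: float, min_apply_delay_sec: float) -> float:
    """Estimate WAL the slot holds back while apply is delayed.

    Assumption: WAL is generated at a constant rate for the whole delay.
    """
    return wal_rate_bytes_per_sec * min_apply_delay_sec


def delay_risks_slot_invalidation(
    wal_rate_bytes_per_sec: float,
    min_apply_delay_sec: float,
    max_slot_wal_keep_size_bytes: int,
) -> bool:
    """Return True if the estimated retained WAL exceeds the keep-size limit."""
    # A negative limit mirrors max_slot_wal_keep_size = -1 (no limit).
    if max_slot_wal_keep_size_bytes < 0:
        return False
    retained = wal_retained_bytes(wal_rate_bytes_per_sec, min_apply_delay_sec)
    return retained > max_slot_wal_keep_size_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # 1 MB/s of WAL with a one-hour delay against a 1 GiB limit.
    print(delay_risks_slot_invalidation(1_000_000, 3600, one_gib))
```

With these example numbers the estimate is about 3.6 GB of retained WAL, which is above a 1 GiB limit, so replication could not continue once the slot is invalidated.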

> >
> > > 2) I also noticed that if we have to shut down the publisher server
> > > with a long min_apply_delay configuration, the publisher server
> > > cannot be stopped as the walsender waits for the data to be
> > > replicated. Is this behavior ok for the server to wait in this case?
> > > If this behavior is ok, we could add a log message as it is not very
> > > evident from the log files why the server could not be shut down.
> > >
> >
> > I think for this case, the behavior should be the same as for physical
> > replication. Can you please check what the behavior is in physical
> > replication for the case you are worried about? Note that we already have
> > a similar parameter, recovery_min_apply_delay, for physical
> > replication.
>
> With physical replication and recovery_min_apply_delay set, I noticed that
> both the primary and standby nodes stopped successfully immediately after
> the stop command. With logical replication, stopping the server fails:
> pg_ctl -D publisher -l publisher.log stop -c
> waiting for server to shut down............................................................... failed
> pg_ctl: server does not shut down
>
> In case of logical replication, the server does not get stopped because the
> walsender process is not able to exit:
> ps ux | grep walsender
> vignesh 1950789 75.3 0.0 8695216 22284 ? Rs 11:51 1:08 postgres: walsender vignesh [local] START_REPLICATION
Thanks. I could reproduce this, and I'll address this point in a subsequent version.

[1] - https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi
