Re: Exit walsender before confirming remote flush in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, kuroda(dot)hayato(at)fujitsu(dot)com, pgsql-hackers(at)postgresql(dot)org, ashutosh(dot)bapat(at)enterprisedb(dot)com
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2022-12-23 05:56:21
Message-ID: CAA4eK1LKHQKMQ=z-eB74jKxxnv3HYgwtmOutpEDvdU7Fxfet+g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 23, 2022 at 7:51 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 22 Dec 2022 17:29:34 +0530, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote in
> > On Thu, Dec 22, 2022 at 11:16 AM Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > In case of logical replication, however, we cannot support the use-case that
> > > switches the role publisher <-> subscriber. Suppose same case as above, additional
> ..
> > > Therefore, I think that we can ignore the condition for shutting down the
> > > walsender in logical replication.
> ...
> > > This change may be useful for time-delayed logical replication. The walsender
> > > delays the shutdown until all changes are applied on the subscriber, even if
> > > the apply is delayed. As a result, the publisher cannot be stopped if a large
> > > delay-time is specified.
> >
> > I think the current behaviour is an artifact of using the same WAL
> > sender code for both logical and physical replication.
>
> Yeah, I don't think we do that for the sake of switchover. On the
> other hand, I think the behavior was intentionally carried over because
> it was considered sensible on its own.
>

Do you know whether this was discussed somewhere? If so, can you please
point to that discussion?

> And I'm afraid that many people already
> rely on that behavior.
>

But OTOH, it can also be annoying for users to see a wait during
shutdown that is not actually required.

> > I agree with you that the logical WAL sender need not wait for all the
> > WAL to be replayed downstream.
>
> Thus I feel that it might be a bit outrageous to get rid of that
> behavior altogether because of a new feature stumbling on it. I'm
> fine doing that only in the apply_delay case, though. However, I have
> another concern that we are introducing the second exception for
> XLogSendLogical in the common path.
>
> I doubt that anyone wants to use synchronous logical replication with
> apply_delay, since the sender transaction is inevitably affected
> by that delay.
>
> Thus, how about having the logrep worker send a kind of crafted
> feedback before entering an apply_delay, one that reports
> commit_data.end_lsn as flushpos? A little tweak is needed in
> send_feedback(), but it seems to work.
>

How can we send commit_data.end_lsn before actually committing the
xact? I think this can lead to a problem, because the next time (say,
after a restart of the walsender) the server can skip sending the xact
even though it has not been committed by the client.
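
To illustrate the concern, here is a minimal toy model in Python. It is
not PostgreSQL code: ToyPublisher, receive_feedback(),
pending_after_restart() and run() are hypothetical names, and the only
assumption carried over from the real system is that the sender does not
resend transactions ending at or before the flush position the
subscriber has reported.

```python
from dataclasses import dataclass


@dataclass
class ToyPublisher:
    """Hypothetical stand-in for a walsender plus its replication slot."""
    wal: list                  # (commit end LSN, payload) pairs in commit order
    confirmed_flush: int = 0   # highest flush position reported by the subscriber

    def receive_feedback(self, flush_lsn: int) -> None:
        # The reported flush position only ever moves forward.
        self.confirmed_flush = max(self.confirmed_flush, flush_lsn)

    def pending_after_restart(self) -> list:
        # After a restart, only transactions that end past the confirmed
        # flush position are sent again.
        return [payload for end_lsn, payload in self.wal
                if end_lsn > self.confirmed_flush]


def run(report_before_commit: bool) -> list:
    """Return what the subscriber ends up applying after a restart."""
    pub = ToyPublisher(wal=[(100, "xact A")])
    end_lsn, _payload = pub.wal[0]

    if report_before_commit:
        # Crafted feedback: claim end_lsn is flushed before the delay starts.
        pub.receive_feedback(end_lsn)

    # The connection is lost during the apply delay, so the subscriber
    # never commits "xact A" locally. On reconnect it can only apply
    # whatever the publisher decides to send.
    return pub.pending_after_restart()


if __name__ == "__main__":
    print("normal feedback :", run(report_before_commit=False))
    print("crafted feedback:", run(report_before_commit=True))
```

With normal feedback the transaction is sent again after the restart.
With the crafted feedback the list is empty, which is the lost
transaction described above.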

--
With Regards,
Amit Kapila.
