Re: Exit walsender before confirming remote flush in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2022-12-27 09:25:07
Message-ID: CAA4eK1L+JVwRiy9Z=7DK5B_FGb8hJcsH=nz8BvPFVg82s7zH5g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 27, 2022 at 2:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Dec 27, 2022 at 1:44 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Thanks for checking my proposal!
> >
> > > - * Note that if we determine that there's still more data to send, this
> > > - * function will return control to the caller.
> > > + * Note that if we determine that there's still more data to send or we are in
> > > + * the physical replication more, this function will return control to the
> > > + * caller.
> > >
> > > I think in this comment you meant to say
> > >
> > > 1. "or we are in physical replication mode and all WALs are not yet replicated"
> > > 2. Typo /replication more/replication mode
> >
> > Firstly I considered 2, but I thought 1 seemed to be better.
> > PSA the updated patch.
> >
>
> I think even for logical replication we should check whether there is
> any pending WAL (via pq_is_send_pending()) to be sent. Otherwise, what
> is the point to send the done message? Also, the caller of
> WalSndDone() already has that check which is another reason why I
> can't see why you didn't have the same check in function WalSndDone().
>
> BTW, even after fixing this, I think logical replication will behave
> differently when due to some reason (like time-delayed replication)
> send buffer gets full and walsender is not able to send data. I think
> this will be less of an issue with physical replication because there
> is a separate walreceiver process to flush the WAL which doesn't wait
> but the same is not true for logical replication. Do you have any
> thoughts on this matter?
>

In logical replication, this can happen today as well, even without
time-delayed replication. For example, if the apply worker is waiting to
acquire a lock that is already held by some backend, we will see the
same behavior. I have not verified this, so you may want to check it
once.
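
To make the scenario concrete, here is a rough toy model of what I have
in mind. This is not PostgreSQL code: ToyChannel and the toy_* functions
are made-up names for illustration, and toy_send_pending() only loosely
plays the role that pq_is_send_pending() plays in the walsender. It
assumes a bounded send buffer that the receiver stops draining while it
is blocked (say, on a lock), so the sender never reaches the point where
nothing is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a sender/receiver pair; not PostgreSQL code. */
typedef struct ToyChannel
{
    size_t capacity;          /* bytes the send buffer can hold */
    size_t pending;           /* bytes queued but not yet consumed */
    bool   receiver_blocked;  /* e.g. apply worker waiting on a lock */
} ToyChannel;

/* Is anything still queued for the receiver? */
static bool
toy_send_pending(const ToyChannel *ch)
{
    assert(ch != NULL);
    return ch->pending > 0;
}

/* Queue n bytes; returns false if the buffer has no room for them. */
static bool
toy_queue(ToyChannel *ch, size_t n)
{
    assert(ch != NULL);
    if (n > ch->capacity - ch->pending)
        return false;
    ch->pending += n;
    return true;
}

/* Receiver consumes up to n bytes, or nothing at all while it is blocked. */
static size_t
toy_drain(ToyChannel *ch, size_t n)
{
    assert(ch != NULL);
    if (ch->receiver_blocked)
        return 0;
    if (n > ch->pending)
        n = ch->pending;
    ch->pending -= n;
    return n;
}

/* The sender may finish only once nothing is pending any more. */
static bool
toy_can_finish(const ToyChannel *ch)
{
    return !toy_send_pending(ch);
}
```

With capacity 8 and receiver_blocked set, toy_queue(&ch, 8) succeeds, a
further toy_queue(&ch, 1) fails, toy_drain() returns 0, and
toy_can_finish() stays false until the receiver is unblocked and drains
the buffer. Whether the real walsender behaves like this is exactly the
part that needs to be checked.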

--
With Regards,
Amit Kapila.
