Re: Logical replication timeout problem

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2023-01-27 11:54:54
Message-ID: CAA4eK1+FSpedbCpsOFws4XdQubjMcMxFDHFeup54He7xcCBxWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 27, 2023 at 5:18 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, January 25, 2023 7:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >
> > On Tue, Jan 24, 2023 at 8:15 AM wangw(dot)fnst(at)fujitsu(dot)com
> > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > Attach the new patch.
> > >
> >
> > I think the patch missed to handle the case of non-transactional messages which
> > was previously getting handled. I have tried to address that in the attached. Is
> > there a reason that shouldn't be handled?
>
> Thanks for updating the patch!
>
> I thought about the non-transactional message. I think it seems fine if we
> don’t handle it for timeout because such message is decoded via:
>
> WalSndLoop
> -XLogSendLogical
> --LogicalDecodingProcessRecord
> ---logicalmsg_decode
> ----ReorderBufferQueueMessage
> -----rb->message() -- //maybe send the message or do nothing here.
>
> After invoking rb->message(), we will directly return to the main
> loop(WalSndLoop) where we will get a chance to call
> WalSndKeepaliveIfNecessary() to avoid the timeout.
>

Valid point. But this means the previous handling of non-transactional
messages was also redundant.

> This is a bit different from transactional changes, because for transactional changes, we
> will buffer them and then send every buffered change one by one(via
> ReorderBufferProcessTXN) without going back to the WalSndLoop, so we don't get
> a chance to send keepalive message if necessary, which is more likely to cause the
> timeout problem.
>
> I will also test the non-transactional message for timeout in case I missed something.
>

Okay, thanks. Please see if we can test a mix of transactional and
non-transactional messages as well.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vigneshk Kvignesh 2023-01-27 11:57:40 Re: PG11 to PG14 Migration Slowness
Previous Message Jelte Fennema 2023-01-27 11:50:27 Re: [EXTERNAL] Re: Add non-blocking version of PQcancel