RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2023-01-29 07:41:07
Message-ID: OS3PR01MB62751097A6FA5F3187A0EEFA9ED29@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 27, 2023 at 19:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Jan 27, 2023 at 5:18 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, January 25, 2023 7:26 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com>
> > >
> > > On Tue, Jan 24, 2023 at 8:15 AM wangw(dot)fnst(at)fujitsu(dot)com
> > > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > Attach the new patch.
> > > >
> > >
> > > I think the patch missed to handle the case of non-transactional messages
> which
> > > was previously getting handled. I have tried to address that in the attached.
> Is
> > > there a reason that shouldn't be handled?
> >
> > Thanks for updating the patch!
> >
> > I thought about the non-transactional message. I think it seems fine if we
> > don’t handle it for timeout because such message is decoded via:
> >
> > WalSndLoop
> > -XLogSendLogical
> > --LogicalDecodingProcessRecord
> > ---logicalmsg_decode
> > ----ReorderBufferQueueMessage
> > -----rb->message() -- //maybe send the message or do nothing here.
> >
> > After invoking rb->message(), we will directly return to the main
> > loop(WalSndLoop) where we will get a chance to call
> > WalSndKeepaliveIfNecessary() to avoid the timeout.
> >
>
> Valid point. But this means the previous handling of non-transactional
> messages was also redundant.

Thanks for the analysis, I think it makes sense. So I removed the handling of
non-transactional messages.

> > This is a bit different from transactional changes, because for transactional
> changes, we
> > will buffer them and then send every buffered change one by one(via
> > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we
> don't get
> > a chance to send keepalive message if necessary, which is more likely to cause
> the
> > timeout problem.
> >
> > I will also test the non-transactional message for timeout in case I missed
> something.
> >
>
> Okay, thanks. Please see if we can test a mix of transactional and
> non-transactional messages as well.

I tested a mix transaction of transactional and non-transactional messages on
the current HEAD and reproduced the timeout problem. I think this result is OK.
Because when decoding a transaction, non-transactional changes are processed
directly and the function WalSndKeepaliveIfNecessary is called, while
transactional changes are cached and processed after decoding. After decoding,
only transactional changes will be processed (in the function
ReorderBufferProcessTXN), so the timeout problem will still be reproduced.

After applying the v8 patch, the test mentioned above didn't reproduce the
timeout problem (Attach this test script 'test_with_nontransactional.sh').

Attach the new patch.

Regards,
Wang Wei

Attachment Content-Type Size
v8-0001-Fix-the-logical-replication-timeout-during-proces.patch application/octet-stream 11.4 KB
test_with_nontransactional.sh application/octet-stream 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2023-01-29 09:57:39 Bug #17759: GENERATED columns not computed during MERGE
Previous Message Tom Lane 2023-01-29 04:22:19 Re: run pgindent on a regular basis / scripted manner