RE: Logical replication timeout problem

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2023-01-27 11:48:02
Message-ID: OS0PR01MB5716E55DEEB66EF5C37E127E94CC9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wednesday, January 25, 2023 7:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 24, 2023 at 8:15 AM wangw(dot)fnst(at)fujitsu(dot)com
> <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Attach the new patch.
> >
>
> I think the patch missed to handle the case of non-transactional messages which
> was previously getting handled. I have tried to address that in the attached. Is
> there a reason that shouldn't be handled?

Thanks for updating the patch!

I thought about the non-transactional message. I think it is fine not to handle
it for the timeout, because such a message is decoded via:

WalSndLoop
-XLogSendLogical
--LogicalDecodingProcessRecord
---logicalmsg_decode
----ReorderBufferQueueMessage
-----rb->message() -- //maybe send the message or do nothing here.

After invoking rb->message(), we return directly to the main loop
(WalSndLoop), where we get a chance to call WalSndKeepaliveIfNecessary() to
avoid the timeout.

This is a bit different from transactional changes: we buffer those and then
send every buffered change one by one (via ReorderBufferProcessTXN) without
going back to WalSndLoop, so we don't get a chance to send a keepalive message
if necessary, which makes the timeout problem more likely.
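
To make the contrast between the two paths concrete, here is a toy model in C.
It is only a sketch and not PostgreSQL source code: ToyClock, toy_process_one,
toy_keepalive_if_necessary and the two toy_max_gap_* functions are hypothetical
names, and the cost of one "tick" per record is an assumption. It measures the
longest stretch during which the sender has no opportunity to check whether a
keepalive is needed.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not PostgreSQL code. */
typedef struct ToyClock
{
    long now;        /* simulated time, in ticks */
    long last_check; /* time of the most recent keepalive opportunity */
    long max_gap;    /* longest stretch without such an opportunity */
} ToyClock;

/* Stand-in for the point where the main loop may send a keepalive. */
static void
toy_keepalive_if_necessary(ToyClock *c)
{
    long gap = c->now - c->last_check;

    if (gap > c->max_gap)
        c->max_gap = gap;
    c->last_check = c->now;
}

/* Stand-in for handling one message or one change; assumed to cost 1 tick. */
static void
toy_process_one(ToyClock *c)
{
    c->now += 1;
}

/*
 * Non-transactional path: each message is handled and control returns to
 * the main loop, so a keepalive opportunity follows every record.
 */
long
toy_max_gap_per_record(int n_records)
{
    ToyClock c = {0, 0, 0};

    assert(n_records >= 0);
    for (int i = 0; i < n_records; i++)
    {
        toy_process_one(&c);            /* like rb->message() */
        toy_keepalive_if_necessary(&c); /* back in the main loop */
    }
    return c.max_gap;
}

/*
 * Transactional path: all buffered changes are replayed in a single call,
 * and the main loop is reached only after the whole transaction.
 */
long
toy_max_gap_batched(int n_changes)
{
    ToyClock c = {0, 0, 0};

    assert(n_changes >= 0);
    for (int i = 0; i < n_changes; i++)
        toy_process_one(&c);            /* replay without returning */
    toy_keepalive_if_necessary(&c);     /* first opportunity is here */
    return c.max_gap;
}
```

In this model, 1000 records give a longest gap of 1 tick on the per-record
path and 1000 ticks on the batched path, which is why only the latter is
likely to exceed a timeout as the transaction grows.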

I will also test non-transactional messages for the timeout in case I missed something.

Best Regards,
Hou zj
