RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2023-01-30 09:50:08
Message-ID: OS3PR01MB62752F68953CA2598DC62D4A9ED39@OS3PR01MB6275.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Mon, Jan 30, 2023 at 14:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Jan 30, 2023 at 10:36 AM wangw(dot)fnst(at)fujitsu(dot)com
> <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨 <shiy(dot)fnst(at)cn(dot)fujitsu(dot)com>
> wrote:
> > > On Sun, Jan 29, 2023 3:41 PM wangw(dot)fnst(at)fujitsu(dot)com
> > > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Yes, I think you are right.
> > Fixed this problem.
> >
>
> + /*
> + * Trying to send keepalive message after every change has some
> + * overhead, but testing showed there is no noticeable overhead if
> + * we do it after every ~100 changes.
> + */
> +#define CHANGES_THRESHOLD 100
> +
> + if (++changes_count < CHANGES_THRESHOLD)
> + return;
> ...
> + changes_count = 0;
>
> I think it is better to have this threshold-related code in that
> caller as we have in the previous version. Also, let's modify the
> comment as follows:"
> It is possible that the data is not sent to downstream for a long time
> either because the output plugin filtered it or there is a DDL that
> generates a lot of data that is not processed by the plugin. So, in
> such cases, the downstream can timeout. To avoid that we try to send a
> keepalive message if required. Trying to send a keepalive message
> after every change has some overhead, but testing showed there is no
> noticeable overhead if we do it after every ~100 changes."

Changed as suggested.
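
For anyone following the thread without the patch at hand, the caller-side throttling described in the comment above can be sketched roughly as below. This is only a standalone illustration, not the code in the attached patch: try_send_keepalive(), process_change() and simulate_changes() are hypothetical names, and the keepalive is replaced by a simple counter so the behavior can be observed.

```c
#include <stdio.h>

/* Same threshold as in the quoted hunk: act once per ~100 changes. */
#define CHANGES_THRESHOLD 100

static int changes_count = 0;    /* changes seen since the last attempt */
static int keepalives_sent = 0;  /* stand-in for real keepalive traffic */

/*
 * Hypothetical stand-in for the routine that would try to send a
 * keepalive message to the downstream.  Here it only counts calls.
 */
static void
try_send_keepalive(void)
{
    keepalives_sent++;
}

/*
 * Hypothetical caller.  The threshold check lives here, in the caller,
 * so the (relatively expensive) keepalive attempt is skipped for most
 * changes and only made once every CHANGES_THRESHOLD changes.
 */
static void
process_change(void)
{
    if (++changes_count < CHANGES_THRESHOLD)
        return;

    try_send_keepalive();
    changes_count = 0;
}

/* Feed n changes through the caller and report the keepalive attempts. */
int
simulate_changes(int n)
{
    int i;

    changes_count = 0;
    keepalives_sent = 0;

    for (i = 0; i < n; i++)
        process_change();

    return keepalives_sent;
}
```

In this sketch, simulate_changes(250) makes two keepalive attempts (after the 100th and 200th change), while 99 changes make none. Keeping the counter in the caller leaves the keepalive routine itself free of throttling logic.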

I also removed the comment atop the function update_progress_txn_cb_wrapper to
be consistent with the nearby *_cb_wrapper functions.

Attached is the new patch.

Regards,
Wang Wei

Attachment: v10-0001-Fix-the-logical-replication-timeout-during-proce.patch (application/octet-stream, 11.5 KB)
