RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2022-04-18 06:19:15
Message-ID: OS3PR01MB62754D7C91CE80B3A68FFC1A9EF39@OS3PR01MB6275.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Thu, Apr 14, 2022 at 8:21 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
Thanks for your comments.

> + * For a large transaction, if we don't send any change to the downstream for a
> + * long time then it can timeout. This can happen when all or most of the
> + * changes are either not published or got filtered out.
>
> We should probably mention that "long time" is wal_receiver_timeout on
> subscriber.
Improved as suggested.
I added "(exceeds the wal_receiver_timeout of standby)" to explain what "long
time" means.

> + * change as that can have overhead. Testing reveals that there is no
> + * noticeable overhead in doing it after continuously processing 100 or so
> + * changes.
>
> Tests revealed that ...
Improved as suggested.

> + * We don't have a mechanism to get the ack for any LSN other than end xact
> + * lsn from the downstream. So, we track lag only for end xact lsn's.
>
> s/lsn/LSN/ and s/lsn's/LSNs/
>
> I would say "end of transaction LSN".
Improved as suggested.

> + * If too many changes are processed then try to send a keepalive message to
> + * receiver to avoid timeouts.
>
> In logical replication, if too many changes are processed then try to send a
> keepalive message. It might avoid a timeout in the subscriber.
Improved as suggested.

Please have a look at the new patch shared in [1].

[1] - https://www.postgresql.org/message-id/OS3PR01MB627561344A2C7ECF68E41D7E9EF39%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei
