RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2022-02-28 07:42:34
Message-ID: OS3PR01MB6275D7FBDEBC143B792F93169E019@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thur, Feb 24, 2022 at 4:06 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> Dear Wang,
Thanks for your review.

> > According to our discussion, we need to send keepalive messages to
> > subscriber when skipping changes.
> > One approach is that **for each skipped change**, we try to send
> > keepalive message by calculating whether a timeout will occur based on
> > the current time and the last time the keepalive was sent. But this will brings
> slight overhead.
> > So I want to try another approach: after **constantly skipping some
> > changes**, we try to send keepalive message by calculating whether a
> > timeout will occur based on the current time and the last time the keepalive
> was sent.
>
> You meant that calling system calls like GetCurrentTimestamp() should be
> reduced, right? I'm not sure how it affects but it seems reasonable.
Yes. There is no need to invoke frequently, and it will bring overhead.

> > IMO, we should send keepalive message after skipping a certain number
> > of changes constantly.
> > And I want to calculate the threshold dynamically by using a fixed
> > value to avoid adding too much code.
> > In addition, different users have different machine performance, and
> > users can modify wal_sender_timeout, so the threshold should be
> > dynamically calculated according to wal_sender_timeout.
>
> Your experiment seems not bad, but the background cannot be understand
> from code comments. I prefer a static threshold because it's more simple, which
> as Amit said in the following too:
>
> https://www.postgresql.org/message-id/CAA4eK1%2B-
> p_K_j%3DNiGGD6tCYXiJH0ypT4REX5PBKJ4AcUoF3gZQ%40mail.gmail.com
Yes, you are right. Fixed.
And I set the threshold to 10000.

> BTW, this patch cannot be applied to current master.
Thanks for reminder. Rebase it.
Kindly have a look at new patch shared in [1].

[1] https://www.postgresql.org/message-id/OS3PR01MB6275FEB9F83081F1C87539B99E019%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message tanghy.fnst@fujitsu.com 2022-02-28 07:45:37 RE: Design of pg_stat_subscription_workers vs pgstats
Previous Message wangw.fnst@fujitsu.com 2022-02-28 07:40:51 RE: Logical replication timeout problem