Re: Logical replication timeout problem

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication timeout problem
Date: 2022-02-23 08:55:34
Message-ID: CAA4eK1+-p_K_j=NiGGD6tCYXiJH0ypT4REX5PBKJ4AcUoF3gZQ@mail.gmail.com

On Tue, Feb 22, 2022 at 9:17 AM wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Fri, Feb 18, 2022 at 10:51 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > Some comments:
> Thanks for your review.
>
> > I see you only track skipped Inserts/Updates and Deletes. What about
> > DDL operations that are skipped, what about truncate.
> > What about changes made to unpublished tables? I wonder if you could
> > create a test script that only did DDL operations
> > and truncates, would this timeout happen?
> According to your suggestion, I tested with DDL and truncate.
> While testing, I ran only 20,000 DDLs and 10,000 truncations in one
> transaction.
> If I set wal_sender_timeout and wal_receiver_timeout to 30s, it will time out.
> And if I use the default values, it will not time out.
> IMHO there should not be long transactions that only contain DDL and
> truncation. I'm not quite sure, do we need to handle this kind of use case?
>

I think it is better to handle such cases too, along with changes to
unpublished tables. BTW, Kuroda-San has also given some comments [1],
and I am not sure whether they have been addressed.

Instead of deriving the skip threshold from wal_sender_timeout, I think
we can use a conservative fixed number, such as 10000, that we are sure
will neither hurt performance nor lead to timeouts.

*
+ /*
+ * skipped_changes_count is reset when processing changes that do not need to
+ * be skipped.
+ */
+ skipped_changes_count = 0;

When the skipped_changes_count is reset, the sendTime should also be
reset. Can we reset it whenever the UpdateProgress function is called
with send_keep_alive as false?
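
To make the two suggestions above concrete, here is a rough sketch of what
I have in mind. It is not code from the patch: the names
SKIPPED_CHANGES_THRESHOLD, update_progress_sketch, handle_change_sketch,
last_send_time and keepalives_sent are made up for illustration, the
timestamp is a plain logical clock rather than a TimestampTz, and the real
UpdateProgress takes different arguments.

```c
#include <stdbool.h>

/* Assumed fixed threshold; the mail only says "10000 or so". */
#define SKIPPED_CHANGES_THRESHOLD 10000

static int  skipped_changes_count = 0;  /* changes skipped since last reset */
static long last_send_time = 0;         /* stand-in for sendTime */
static int  keepalives_sent = 0;        /* counts simulated keepalives */

/*
 * Hypothetical stand-in for the progress update. Whether or not a
 * keepalive is sent, both the counter and the send time are reset,
 * so the two values never get out of step.
 */
static void
update_progress_sketch(long now, bool send_keep_alive)
{
    if (send_keep_alive)
        keepalives_sent++;

    skipped_changes_count = 0;
    last_send_time = now;
}

/*
 * Called once per decoded change. A change that is not skipped goes
 * through the normal path (send_keep_alive = false). A skipped change
 * only bumps the counter until the threshold is reached, at which
 * point a keepalive is requested.
 */
static void
handle_change_sketch(long now, bool skipped)
{
    if (!skipped)
    {
        update_progress_sketch(now, false);
        return;
    }

    skipped_changes_count++;
    if (skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD)
        update_progress_sketch(now, true);
}
```

With a fixed count, the check costs one increment and one comparison per
skipped change and does not depend on how wal_sender_timeout is configured.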

[1] - https://www.postgresql.org/message-id/TYAPR01MB5866BD2248EF82FF432FE599F52D9%40TYAPR01MB5866.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.
