Re: Logical replication timeout problem

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2022-03-29 05:07:17
Message-ID: CAD21AoAo6x3rAQ7VuzPT9paA4Y7uuWPqdQn_XYk1bWpCMF_N5g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 25, 2022 at 5:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Mar 25, 2022 at 11:49 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Mar 25, 2022 at 2:23 PM wangw(dot)fnst(at)fujitsu(dot)com
> > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Since commit 75b1521 added decoding of sequence to logical
> > replication, the patch needs to have pgoutput_sequence() call
> > update_progress().
> >
>
> Yeah, I also think this needs to be addressed. But apart from this, I
> want to know your and others' opinions on the following two points:
> a. Both this and the patch discussed in the nearby thread [1] add an
> additional parameter to
> WalSndUpdateProgress/OutputPluginUpdateProgress and it seems to me
> that both are required. The additional parameter 'last_write' added by
> this patch indicates: "If the last write is skipped then try (if we
> are close to wal_sender_timeout) to send a keepalive message to the
> receiver to avoid timeouts.". This means it can be used after any
> 'write' message. OTOH, the parameter 'skipped_xact' added by another
> patch [1] indicates if we have skipped sending anything for a
> transaction then send a keepalive for synchronous replication to avoid
> any delays in such a transaction. Does this sound reasonable or can
> you think of a better way to deal with it?

The current approaches look good to me.
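
As an illustration of point (a), the "last write was skipped" rule can be
sketched in C as below. This is only a minimal model, not code from the
patch: the function name, the parameter names, and the choice of half the
timeout as the meaning of "close to wal_sender_timeout" are all assumptions
made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of PostgreSQL or of the patch.
 *
 * now_ms        - current time in milliseconds
 * last_reply_ms - time the receiver was last heard from
 * timeout_ms    - models wal_sender_timeout; 0 or less means "disabled"
 * write_skipped - true if the last change was filtered out, so nothing
 *                 was actually written to the receiver
 *
 * Returns true when a keepalive should be sent to avoid a timeout.
 */
static bool
keepalive_needed_sketch(int64_t now_ms, int64_t last_reply_ms,
                        int64_t timeout_ms, bool write_skipped)
{
    /* Data was written, so the receiver already has something to answer. */
    if (!write_skipped)
        return false;

    /* With the timeout disabled there is nothing to guard against. */
    if (timeout_ms <= 0)
        return false;

    /* Assumed threshold: "close to timeout" means half of it has elapsed. */
    return (now_ms - last_reply_ms) >= timeout_ms / 2;
}
```

With a 60000 ms timeout, this model asks for a keepalive once 30000 ms have
passed without a reply and the last write was skipped, and never asks for
one when the last write actually went out.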

> b. Do we want to backpatch the patch in this thread? I am reluctant to
> backpatch because it changes the exposed API which can have an impact
> and second there exists a workaround (user can increase
> wal_sender_timeout/wal_receiver_timeout).

Yeah, we should avoid API changes between minor versions. I feel it's
better to fix this in the back-branches as well, but we probably need a
different fix for them. The issue reported on this thread is quite
confusing: it looks like a network problem, but it is not. Also, a
user who hits this issue has to increase wal_sender_timeout because of
the decoded data size, which also delays the detection of real network
problems. That seems like a trade-off between two unrelated things.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
