Re: Logical replication timeout problem

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2023-01-31 11:42:37
Message-ID: CAA4eK1+CetnxzLQo26=G4pXUoU8wiZuPP2U-81PP4FOfTEtxDg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2023 at 5:03 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, Jan 31, 2023 at 4:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > Thanks, the patch looks good to me. I have slightly adjusted one of
> > the comments and ran pgindent. See attached. As mentioned in the
> > commit message, we shouldn't backpatch this as this requires a new
> > callback and moreover, users can increase the wal_sender_timeout and
> > wal_receiver_timeout to avoid this problem. What do you think?
>
> The callback and the implementation are all in core. What's the risk
> you see in backpatching it?
>

Because we are changing an exposed structure, which can break
existing extensions that use it.

> Customers can adjust the timeouts, but only after the receiver has
> timed out a few times. Replication remains broken till they notice it
> and adjust timeouts. By that time WAL has piled up. It also takes a
> few attempts to increase timeouts since the time taken to decode a
> transaction cannot be estimated beforehand. All that makes
> it worth back-patching if it's possible. We had a customer who piled
> up GBs of WAL before realising that this is the problem. Their system
> almost came to a halt due to that.
>

Which version are they using? If they are at >=14, using "streaming =
on" for a subscription should also avoid this problem.

--
With Regards,
Amit Kapila.
