Re: Logical replication timeout problem

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2023-02-08 08:06:02
Message-ID: CAA4eK1+yuVhciTrUhjZy1oOQU_7HEq-Q8nysY6zAC8WsEpZxjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 8, 2023 at 10:57 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2023-02-03 10:13:54 +0530, Amit Kapila wrote:
> > I am planning to push this to HEAD sometime next week (by Wednesday).
> > To backpatch this, we need to fix it in some non-standard way, like
> > without introducing a callback which I am not sure is a good idea. If
> > some other committers vote to get this in back branches with that or
> > some different idea that can be backpatched then we can do that
> > separately as well. I don't see this as a must-fix in back branches
> > because we have a workaround (increase timeout) or users can use the
> > streaming option (for >=14).
>
> I just saw the commit go in, and a quick scan over it makes me think neither
> this commit, nor f95d53eded, which unfortunately was already backpatched, is
> the right direction. The wrong direction likely started quite a bit earlier,
> with 024711bb544.
>
> It feels quite fundamentally wrong that bascially every output plugin needs to
> call a special function in nearly every callback.
>
> In 024711bb544 there was just one call to OutputPluginUpdateProgress() in
> pgoutput.c. Quite tellingly, it just updated pgoutput, without touching
> test_decoding.
>
> Then a8fd13cab0b added to more calls. 63cf61cdeb7 yet another.
>

I think the original commit 024711bb544 forgets to call it in
test_decoding and the other commits followed the same and missed to
update test_decoding.

>
> This makes no sense. There's lots of output plugins out there. There's an
> increasing number of callbacks. This isn't a maintainable path forward.
>
>
> If we want to call something to maintain state, it has to be happening from
> central infrastructure.
>
>
> It feels quite odd architecturally that WalSndUpdateProgress() ends up
> flushing out writes - that's far far from obvious.
>
> I don't think:
> /*
> * Wait until there is no pending write. Also process replies from the other
> * side and check timeouts during that.
> */
> static void
> ProcessPendingWrites(void)
>
> Is really a good name. What are we processing?
>

It is for sending the keep_alive message (if required). That is
normally done when we skipped processing a transaction to ensure sync
replication is not delayed. It has been discussed previously [1][2] to
extend the WalSndUpdateProgress() interface. Basically, as explained
by Craig [2], this has to be done from plugin as it can do filtering
or there could be other reasons why the output plugin skips all
changes. We used the same interface for sending keep-alive message
when we processed a lot of (DDL) changes without sending anything to
plugin.

[1] - https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de
[2] - https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2023-02-08 08:07:27 Re: possible proposal plpgsql GET DIAGNOSTICS oid = PG_ROUTINE_OID
Previous Message Hayato Kuroda (Fujitsu) 2023-02-08 08:01:24 RE: Exit walsender before confirming remote flush in logical replication