Re: Logical replication timeout problem

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Peter Smith" <smithpb2250(at)gmail(dot)com>, "Fabrice Chapuis" <fabrice636861(at)gmail(dot)com>, "Simon Riggs" <simon(dot)riggs(at)enterprisedb(dot)com>, "Petr Jelinek" <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Ajin Cherian" <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2022-04-14 12:21:27
Message-ID: ef68bb18-37f9-4303-998e-a15d2fdb2563@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
> On Mon, Apr 11, 2022 at 12:09 PM wangw(dot)fnst(at)fujitsu(dot)com
> <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > So I skip tracking lag during a transaction just like the current HEAD.
> > Attach the new patch.
> >
>
> Thanks, please find the updated patch where I have slightly modified
> the comments.
>
> Sawada-San, Euler, do you have any opinion on this approach? I
> personally still prefer the approach implemented in v10 [1] especially
> due to the latest finding by Wang-San that we can't update the
> lag-tracker apart from when it is invoked at the transaction end.
> However, I am fine if we like this approach more.
It seems v15 is simpler and less error prone than v10. v10 has a mix of
OutputPluginUpdateProgress() and the new function update_progress(). The v10
also calls update_progress() for every change action in pgoutput_change(). It
is not a good approach for maintainability -- new changes like sequences need
extra calls. However, as you mentioned there should handle the track lag case.

Both patches change the OutputPluginUpdateProgress() so it cannot be
backpatched. Are you planning to backpatch it? If so, the boolean variable
(last_write or end_xacts depending of which version you are considering) could
be added to LogicalDecodingContext. (You should probably consider this approach
for skipped_xact too)

+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time then it can timeout. This can happen when all or most of the
+ * changes are either not published or got filtered out.

We should probable mention that "long time" is wal_receiver_timeout on
subscriber.

+ * change as that can have overhead. Testing reveals that there is no
+ * noticeable overhead in doing it after continuously processing 100 or so
+ * changes.

Tests revealed that ...

+ * We don't have a mechanism to get the ack for any LSN other than end xact
+ * lsn from the downstream. So, we track lag only for end xact lsn's.

s/lsn/LSN/ and s/lsn's/LSNs/

I would say "end of transaction LSN".

+ * If too many changes are processed then try to send a keepalive message to
+ * receiver to avoid timeouts.

In logical replication, if too many changes are processed then try to send a
keepalive message. It might avoid a timeout in the subscriber.

Does this same issue occur for long transactions? I mean keep a long
transaction open and execute thousands of transactions.

BEGIN;
INSERT INTO foo (a) VALUES(1);
-- wait a few hours while executing 10^x transactions
INSERT INTO foo (a) VALUES(2);
COMMIT;

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Frédéric Yhuel 2022-04-14 12:25:13 Re: Allow parallel plan for referential integrity checks?
Previous Message Masahiko Sawada 2022-04-14 12:20:13 Re: Logical replication timeout problem