Re: Logical replication timeout problem

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2023-01-06 07:05:31
Message-ID: CAExHW5uALmiA4ZHSAdVwjsqiUCRzy+BjKr4RS_MpDomHuvJcLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Wang,
Thanks for working on this. One of our customer faced a similar
situation when running BDR with PostgreSQL.

I tested your patch and it solves the problem.

Please find some review comments below

On Tue, Nov 8, 2022 at 8:34 AM wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
>
> Attach the patch.
>

+/*
+ * Helper function for ReorderBufferProcessTXN for updating progress.
+ */
+static inline void
+ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ ReorderBufferChange *change)
+{
+ LogicalDecodingContext *ctx = rb->private_data;
+ static int changes_count = 0;

It's not easy to know that a variable is static when reading the code which
uses it. So it's easy to interpret code wrong. I would probably track it
through logical decoding context itself OR through a global variable like other
places where we track the last timestamps. But there's more below on this.

+
+ if (!ctx->update_progress)
+ return;
+
+ Assert(!ctx->fast_forward);
+
+ /* set output state */
+ ctx->accept_writes = false;
+ ctx->write_xid = txn->xid;
+ ctx->write_location = change->lsn;
+ ctx->end_xact = false;

This patch reverts many of the changes of the previous commit which tried to
fix this issue i.e. 55558df2374. end_xact was introduced by the same commit but
without much explanation of that in the commit message. Its only user,
WalSndUpdateProgress(), is probably making a wrong assumption as well.

* We don't have a mechanism to get the ack for any LSN other than end
* xact LSN from the downstream. So, we track lag only for end of
* transaction LSN.

IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which is
sent downstream through a keep alive message. Downstream may acknowledge this
LSN. So we do get ack for any LSN, not just commit LSN.

So I propose removing end_xact as well.

+
+ /*
+ * We don't want to try sending a keepalive message after processing each
+ * change as that can have overhead. Tests revealed that there is no
+ * noticeable overhead in doing it after continuously processing 100 or so
+ * changes.
+ */
+#define CHANGES_THRESHOLD 100

I think a time based threashold makes more sense. What if the timeout was
nearing and those 100 changes just took little more time causing a timeout? We
already have a time based threashold in WalSndKeepaliveIfNecessary(). And that
function is invoked after reading every WAL record in WalSndLoop(). So it does
not look like it's an expensive function. If it is expensive we might want to
worry about WalSndLoop as well. Does it make more sense to remove this
threashold?

+
+ /*
+ * After continuously processing CHANGES_THRESHOLD changes, we
+ * try to send a keepalive message if required.
+ */
+ if (++changes_count >= CHANGES_THRESHOLD)
+ {
+ ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
+ changes_count = 0;
+ }
+}
+

On the other thread, I mentioned that we don't have a TAP test for it.
I agree with
Amit's opinion there that it's hard to create a test which will timeout
everywhere. I think what we need is a way to control the time required for
decoding a transaction.

A rough idea is to induce a small sleep after decoding every change. The amount
of sleep * number of changes will help us estimate and control the amount of
time taken to decode a transaction. Then we create a transaction which will
take longer than the timeout threashold to decode. But that's a
significant code. I
don't think PostgreSQL has a facility to induce a delay at a particular place
in the code.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2023-01-06 07:28:42 Add progress reporting to pg_verifybackup
Previous Message Jeff Davis 2023-01-06 07:04:32 Re: Rework of collation code, extensibility