RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2023-01-09 10:38:06
Message-ID: OS3PR01MB62753A7728A688B7349D92209EFE9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> Hi Wang,
> Thanks for working on this. One of our customer faced a similar
> situation when running BDR with PostgreSQL.
>
> I tested your patch and it solves the problem.
>
> Please find some review comments below

Thanks for your testing and comments.

> +/*
> + * Helper function for ReorderBufferProcessTXN for updating progress.
> + */
> +static inline void
> +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn,
> + ReorderBufferChange *change)
> +{
> + LogicalDecodingContext *ctx = rb->private_data;
> + static int changes_count = 0;
>
> It's not easy to know that a variable is static when reading the code which
> uses it. So it's easy to interpret code wrong. I would probably track it
> through logical decoding context itself OR through a global variable like other
> places where we track the last timestamps. But there's more below on this.

I'm not sure if we need to add global variables or member variables for a
cumulative count that is only used here. How would you feel if I add some
comments when declaring this static variable?

> +
> + if (!ctx->update_progress)
> + return;
> +
> + Assert(!ctx->fast_forward);
> +
> + /* set output state */
> + ctx->accept_writes = false;
> + ctx->write_xid = txn->xid;
> + ctx->write_location = change->lsn;
> + ctx->end_xact = false;
>
> This patch reverts many of the changes of the previous commit which tried to
> fix this issue i.e. 55558df2374. end_xact was introduced by the same commit but
> without much explanation of that in the commit message. Its only user,
> WalSndUpdateProgress(), is probably making a wrong assumption as well.
>
> * We don't have a mechanism to get the ack for any LSN other than end
> * xact LSN from the downstream. So, we track lag only for end of
> * transaction LSN.
>
> IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which is
> sent downstream through a keep alive message. Downstream may
> acknowledge this
> LSN. So we do get ack for any LSN, not just commit LSN.
>
> So I propose removing end_xact as well.

We didn't track the lag during a transaction because it could make the
calculations of lag functionality inaccurate. If we track every lsn, it could
fail to record important lsn information because of
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
Please see details in [1] and [2].

Regards,
Wang Wei

[1] - https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/OS3PR01MB627514AE0B3040D8F55A68B99EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2023-01-09 10:40:51 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Previous Message Alexander Korotkov 2023-01-09 10:29:18 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()