RE: Logical replication timeout problem

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: RE: Logical replication timeout problem
Date: 2022-04-11 06:38:59
Message-ID: OS3PR01MB62755D216245199554DDC8DB9EEA9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2022 at 6:30 PM wangw(dot)fnst(at)fujitsu(dot)com
> <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wed, Apr 6, 2022 at 1:58 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > Also, let's try to evaluate how it impacts lag functionality for large
> transactions?
> > I think this patch will not affect lag functionality. It will updates the lag
> > field of view pg_stat_replication more frequently.
> > IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of
> > change and invoking time in lag_tracker. Then when invoking function
> > ProcessStandbyReplyMessage, it will calculate the lag field according to the
> > message from subscriber and the information in lag_tracker. This patch does
> > not modify this logic, but only increases the frequency of invoking.
> > Please let me know if I understand wrong.
> >
>
> No, your understanding seems correct to me. But what I want to check
> is if calling the progress function more often has any impact on
> lag-related fields in pg_stat_replication? I think you need to check
> the impact of large transaction replay.
Thanks for the explanation.

After doing some checks, I found that the v13 patch makes the calculations of
lag functionality inaccurate.

In short, v13 patch lets us try to track lag more frequently and try to send a
keepalive message to subscribers. But in order to prevent flooding the lag
tracker, we could not track lag more than once within
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
This means we may lose informations that needs to be tracked.
For example, suppose there is a large transaction with lsn from lsn1 to lsn3.
In HEAD, when we calculate the lag time for lsn3, the lag time of lsn3 is
(now - lsn3.time).
But with v13 patch, when we calculate the lag time for lsn3, because there
maybe no informations of lsn3 but has informations of lsn2 in lag_tracker, the
lag time of lsn3 is (now - t2.time). (see function LagTrackerRead)
Therefore, if we lose the informations that need to be tracked, the lag time
becomes large and inaccurate.

So I skip tracking lag during a transaction just like the current HEAD.
Attach the new patch.

Regards,
Wang wei

Attachment Content-Type Size
v14-0001-Fix-the-logical-replication-timeout-during-large.patch application/octet-stream 10.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Qiongwen Liu 2022-04-11 06:43:29 PostgreSQL Program Application
Previous Message Filip Janus 2022-04-11 06:36:17 Re: Practical Timing Side Channel Attacks on Memory Compression