Re: Logical replication timeout problem

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2022-03-08 07:52:27
Message-ID: CAD21AoDe0Cr1T4V6ReUP_Nua3MG++xO4wdiY4+TUDUoLGibA5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Mar 8, 2022 at 10:25 AM wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Fri, Mar 4, 2022 at 4:26 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> Thanks for your test and comments.
>
> > Some codes were added in ReorderBufferProcessTXN() for treating DDL,
> > but I doubted updating accept_writes is needed.
> > IMU, the parameter is read by OutputPluginPrepareWrite() in order align
> > messages.
> > They should have a header - like 'w' - before their body. But here only a
> > keepalive message is sent,
> > no meaningful changes, so I think it might be not needed.
> > I commented out the line and tested like you did [1], and no timeout and errors
> > were found.
> > Do you have any reasons for that?
> >
> > https://www.postgresql.org/message-
> > id/OS3PR01MB6275A95FD44DC6C46058EA389E3B9%40OS3PR01MB6275.jpnprd0
> > 1.prod.outlook.com
> Yes, you are right. We should not set accept_writes to true here.
> And I looked into the function WalSndUpdateProgress. I found function
> WalSndUpdateProgress try to record the time of some message(by function
> LagTrackerWrite) sent to subscriber, such as in function pgoutput_commit_txn.
> Then, when publisher receives the reply message from the subscriber(function
> ProcessStandbyReplyMessage), publisher invokes LagTrackerRead to calculate the
> delay time(refer to view pg_stat_replication).
> Referring to the purpose of LagTrackerWrite, I think it is no need to log time
> when sending keepalive messages here.
> So when the parameter send_keep_alive of function WalSndUpdateProgress is true,
> skip the recording time.
>
> > I'm also happy if you give the version number :-).
> Introduce version information, starting from version 1.
>
> Attach the new patch.
> 1. Fix wrong variable setting and skip unnecessary time records.[suggestion by Kuroda-San and me.]
> 2. Introduce version information.[suggestion by Peter, Kuroda-San]

I've looked at the patch and have a question:

+void
+SendKeepaliveIfNecessary(LogicalDecodingContext *ctx, bool skipped)
+{
+ static int skipped_changes_count = 0;
+
+ /*
+ * skipped_changes_count is reset when processing changes that do not
+ * need to be skipped.
+ */
+ if (!skipped)
+ {
+ skipped_changes_count = 0;
+ return;
+ }
+
+ /*
+ * After continuously skipping SKIPPED_CHANGES_THRESHOLD
changes, try to send a
+ * keepalive message.
+ */
+ #define SKIPPED_CHANGES_THRESHOLD 10000
+
+ if (++skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD)
+ {
+ /* Try to send a keepalive message. */
+ OutputPluginUpdateProgress(ctx, true);
+
+ /* After trying to send a keepalive message, reset the flag. */
+ skipped_changes_count = 0;
+ }
+}

Since we send a keepalive after continuously skipping 10000 changes,
the originally reported issue can still occur if skipping 10000
changes took more than the timeout and the walsender didn't send any
change while that, is that right?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-03-08 07:54:47 Re: Handle infinite recursion in logical replication setup
Previous Message Aleksander Alekseev 2022-03-08 07:49:43 Re: Add 64-bit XIDs into PostgreSQL 15