Re: Logical replication timeout problem

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2022-04-19 01:32:07
Message-ID: CAD21AoCLaC-Dj=dcz5hQqcxpzi7h_eDsV5uc2156LkrKa6mLQw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 18, 2022 at 3:16 PM wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Mon, Apr 18, 2022 at 00:35 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Mon, Apr 18, 2022 at 1:01 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Apr 14, 2022 at 5:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Apr 13, 2022 at 7:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, Apr 11, 2022 at 12:09 PM wangw(dot)fnst(at)fujitsu(dot)com
> > > > > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > > >
> > > > > > So I skip tracking lag during a transaction, just like the current HEAD.
> > > > > > Attached is the new patch.
> > > > > >
> > > > >
> > > > > Thanks, please find the updated patch where I have slightly
> > > > > modified the comments.
> > > > >
> > > > > Sawada-San, Euler, do you have any opinion on this approach? I
> > > > > personally still prefer the approach implemented in v10 [1]
> > > > > especially due to the latest finding by Wang-San that we can't
> > > > > update the lag-tracker apart from when it is invoked at the transaction end.
> > > > > However, I am fine if we like this approach more.
> > > >
> > > > Thank you for updating the patch.
> > > >
> > > > The current patch looks much better than v10, which requires calling
> > > > update_progress() in every path.
> > > >
> > > > Regarding v15 patch, I'm concerned a bit that the new function name,
> > > > update_progress(), is too generic. How about
> > > > update_replation_progress() or something more specific name?
> > > >
> > >
> > > Do you intend to say update_replication_progress()? The word
> > > 'replation' doesn't make sense to me. I am fine with this suggestion.
> >
> > Yeah, that was a typo. I meant update_replication_progress().
> Thanks for your comments.
>
> > > > Regarding v15 patch, I'm concerned a bit that the new function name,
> > > > update_progress(), is too generic. How about
> > > > update_replation_progress() or something more specific name?
> Improved as suggested: renamed update_progress to
> update_replication_progress.
>
> > > > ---
> > > > + if (end_xact)
> > > > + {
> > > > + /* Update progress tracking at xact end. */
> > > > + OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
> > > > + changes_count = 0;
> > > > + return;
> > > > + }
> > > > +
> > > > + /*
> > > > + * After continuously processing CHANGES_THRESHOLD changes, we try to send
> > > > + * a keepalive message if required.
> > > > + *
> > > > + * We don't want to try sending a keepalive message after processing each
> > > > + * change as that can have overhead. Testing reveals that there is no
> > > > + * noticeable overhead in doing it after continuously processing 100 or so
> > > > + * changes.
> > > > + */
> > > > +#define CHANGES_THRESHOLD 100
> > > > + if (++changes_count >= CHANGES_THRESHOLD)
> > > > + {
> > > > + OutputPluginUpdateProgress(ctx, skipped_xact, end_xact);
> > > > + changes_count = 0;
> > > > + }
> > > >
> > > > Can we merge two if branches since we do the same things? Or did you
> > > > separate them for better readability?
> Improved as suggested: merged the two if-branches.
>
> Attached is the new patch:
> 1. Renamed the new function (update_progress) to update_replication_progress. [suggestion by Sawada-San]
> 2. Merged the two if-branches in the new function update_replication_progress. [suggestion by Sawada-San]
> 3. Improved comments to make them clearer. [suggestions by Euler-San]

Thank you for updating the patch.

+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time(exceeds the wal_receiver_timeout of standby) then it can timeout.
+ * This can happen when all or most of the changes are either not published or
+ * got filtered out.

+ */
+ if(end_xact || ++changes_count >= CHANGES_THRESHOLD)
+ {

We need a space before '(' in the two places above. The rest looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
