Re: Logical replication timeout problem

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2022-05-02 02:36:32
Message-ID: CAD21AoB4km=SgT0DAJOoKYicVNqvWZMtuQ7D9d0RFq20NK_EEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 2, 2022 at 11:32 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 2, 2022 at 7:33 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Thu, Apr 28, 2022 at 7:01 PM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > Hi Sawada-san, Wang
> > >
> > > I was looking at the patch and noticed that we moved some logic from
> > > update_replication_progress() to OutputPluginUpdateProgress() like
> > > your suggestion.
> > >
> > > I have a question about this change. In the patch we added some
> > > restriction in this function OutputPluginUpdateProgress() like below:
> > >
> > > + /*
> > > + * If we are at the end of transaction LSN, update progress tracking.
> > > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we
> > > + * try to send a keepalive message if required.
> > > + */
> > > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
> > > + {
> > > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid,
> > > + skipped_xact);
> > > + changes_count = 0;
> > > + }
> > >
> > > After the patch, we won't be able to always invoke the update_progress() if the
> > > caller are at the middle of transaction(e.g. end_xact = false). The bebavior of the
> > > public function OutputPluginUpdateProgress() is changed. I am thinking is it ok to
> > > change this at back-branches ?
> > >
> > > Because OutputPluginUpdateProgress() is a public function for the extension
> > > developer, just a little concerned about the behavior change here.
> >
> > Good point.
> >
> > As you pointed out, it would not be good if we change the behavior of
> > OutputPluginUpdateProgress() in back branches. Also, after more
> > thought, it is not a good idea even for HEAD since there might be
> > background workers that use logical decoding and the timeout checking
> > might not be relevant at all with them.
> >
>
> So, shall we go back to the previous approach of using a separate
> function update_replication_progress?

Ok, agreed.

>
> > BTW, I think you're concerned about the plugins that call
> > OutputPluginUpdateProgress() at the middle of the transaction (i.e.,
> > end_xact = false). We have the following change that makes the
> > walsender not update the progress at the middle of the transaction. Do
> > you think it is okay since it's not common usage to call
> > OutputPluginUpdateProgress() at the middle of the transaction by the
> > plugin that is used by the walsender?
> >
>
> We have done that purposefully as otherwise, the lag tracker shows
> incorrect information. See email [1]. The reason is that we always get
> ack from subscribers for transaction end. Also, prior to this patch we
> never call the lag tracker recording apart from the transaction end,
> so as a bug fix we shouldn't try to change it.

Make sense.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-05-02 03:30:12 Re: Add index scan progress to pg_stat_progress_vacuum
Previous Message Amit Kapila 2022-05-02 02:32:31 Re: Logical replication timeout problem