Re: Replication slot stats misgivings

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication slot stats misgivings
Date: 2021-05-06 00:45:12
Message-ID: CAD21AoAEtcEdR+J5nn6RpCA0aiHYzQPG-HynCThrEsVj+dYeig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 4, 2021 at 2:34 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, May 4, 2021 at 9:48 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, May 3, 2021 at 10:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, May 3, 2021 at 5:48 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, May 3, 2021 at 2:27 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > > > >
> > > > > > >
> > > > > > > > I am not sure if any of these alternatives are a good idea. What do
> > > > > > > > you think? Do you have any other ideas for this?
> > > > > > >
> > > > > > > I've been considering some ideas but don't come up with a good one
> > > > > > > yet. It’s just an idea and not tested but how about having
> > > > > > > CreateDecodingContext() register before_shmem_exit() callback with the
> > > > > > > decoding context to ensure that we send slot stats even on
> > > > > > > interruption. And FreeDecodingContext() cancels the callback.
> > > > > > >
> > > > > >
> > > > > > Is it a good idea to send stats while exiting and rely on the same? I
> > > > > > think before_shmem_exit is mostly used for the cleanup purpose so not
> > > > > > sure if we can rely on it for this purpose. I think we can't be sure
> > > > > > that in all cases we will send all stats, so maybe Vignesh's patch is
> > > > > > sufficient to cover the cases where we avoid losing it in cases where
> > > > > > we would have sent a large amount of data.
> > > > > >
> > > > >
> > > > > Sawada-San, any thoughts on this point?
> > > >
> > > > before_shmem_exit is mostly used to the cleanup purpose but how about
> > > > on_shmem_exit()? pgstats relies on that to send stats at the
> > > > interruption. See pgstat_shutdown_hook().
> > > >
> > >
> > > Yeah, that is worth trying. Would you like to give it a try?
> >
> > Yes.
> >
> > In this approach, I think we will need to have a static pointer in
> > logical.c pointing to LogicalDecodingContext that we’re using. At
> > StartupDecodingContext(), we set the pointer to the just created
> > LogicalDecodingContext and register the callback so that we can refer
> > to the LogicalDecodingContext on that callback. And at
> > FreeDecodingContext(), we reset the pointer to NULL (however, since
> > FreeDecodingContext() is not called when an error happens we would
> > need to ensure resetting it somehow). But, after more thought, if we
> > have the static pointer in logical.c it would rather be better to have
> > a global function that sends slot stats based on the
> > LogicalDecodingContext pointed by the static pointer and can be called
> > by ReplicationSlotRelease(). That way, we don’t need to worry about
> > erroring out cases as well as interruption cases, although we need to
> > have a new static pointer.
> >
> > I've attached a quick-hacked patch. I also incorporated the change
> > that calls UpdateDecodingStats() at FreeDecodingContext() so that we
> > can send slot stats also in the case where we spilled/streamed changes
> > but finished without commit/abort/prepare record.
> >
> > > I think
> > > it still might not cover the cases where we error out in the backend
> > > while decoding via APIs because at that time we won't exit, maybe for
> > > that we can consider Vignesh's patch.
> >
> > Agreed. It seems to me that the approach of the attached patch is
> > better than the approach using on_shmem_exit(). So if we want to avoid
> > having the new static pointer and function for this purpose we can
> > consider Vignesh’s patch.
> >
>
> I'm ok with using either my patch or Sawada san's patch, Even I had
> the same thought of whether we should have a static variable thought
> pointed out by Sawada san. Apart from that I had one minor comment:
> This comment needs to be corrected "andu sed to sent"
> +/*
> + * Pointing to the currently-used logical decoding context andu sed to sent
> + * slot statistics on releasing slots.
> + */
> +static LogicalDecodingContext *MyLogicalDecodingContext = NULL;
> +

Right, that needs to be fixed.

After more thought, I'm concerned that my patch's approach might be
invasive for PG14. Given that Vignesh’s patch would cover most cases,
I think we can live with a small downside that could miss some slot
stats. If we want to ensure sending slot stats at releasing slot, we
can develop it as an improvement. My patch would be better to get
reviewed by more peoples including the design during PG15 development.
Thoughts?

Regards,

---
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-05-06 01:07:12 Re: MaxOffsetNumber for Table AMs
Previous Message Robert Haas 2021-05-06 00:35:31 Re: Dubious assertion in RegisterDynamicBackgroundWorker