Re: Replication slot stats misgivings

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-03-20 21:26:53
Message-ID: 20210320212653.flbcpczmlmug5g5d@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-03-20 09:25:40 +0530, Amit Kapila wrote:
> On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > And then more generally about the feature:
> > - If a slot was used to stream out a large amount of changes (say an
> > initial data load), but then replication is interrupted before the
> > transaction is committed/aborted, stream_bytes will not reflect the
> > many gigabytes of data we may have sent.
> >
>
> We can probably update the stats each time we spilled or streamed the
> transaction data but it was not clear at that stage whether or how
> much it will be useful.

It seems like the obvious answer here is to sync stats when releasing
the slot?

> > - I seems weird that we went to the trouble of inventing replication
> > slot stats, but then limit them to logical slots, and even there don't
> > record the obvious things like the total amount of data sent.
> >
>
> Won't spill_bytes and stream_bytes will give you the amount of data sent?

I don't think either tracks changes that were neither spilled nor
streamed? And if they are, they're terribly misnamed?

> >
> > I think the best way to address the more fundamental "pgstat related"
> > complaints is to change how replication slot stats are
> > "addressed". Instead of using the slots name, report stats using the
> > index in ReplicationSlotCtl->replication_slots.
> >
> > That removes the risk of running out of "replication slot stat slots":
> > If we loose a drop message, the index eventually will be reused and we
> > likely can detect that the stats were for a different slot by comparing
> > the slot name.
> >
>
> This idea is worth exploring to address the complaints but what do we
> do when we detect that the stats are from the different slot?

I think it's pretty easy to make that bulletproof. Add a
pgstat_report_replslot_create(), and use that in
ReplicationSlotCreate(). That is called with
ReplicationSlotAllocationLock held, so it can just safely zero out stats.

I don't think:

> It has mixed of stats from the old and new slot.

Can happen in that scenario.

> > It also makes it easy to handle the issue of max_replication_slots being
> > lowered and there still being stats for a slot - we simply can skip
> > restoring that slots data, because we know the relevant slot can't exist
> > anymore. And we can make the initial pgstat_report_replslot() during
> > slot creation use a
> >
>
> Here, your last sentence seems to be incomplete.

Oops, I was planning to suggest adding pgstat_report_replslot_create()
that zeroes out the pre-existing stats (or a parameter to
pgstat_report_replslot(), but I don't think that's better).

> > I'm wondering if we should just remove the slot name entirely from the
> > pgstat.c side of things, and have pg_stat_get_replication_slots()
> > inquire about slots by index as well and get the list of slots to report
> > stats for from slot.c infrastructure.
> >
>
> But how will you detect in your idea that some of the stats from the
> already dropped slot?

I don't think that is possible with my sketch?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-20 21:27:44 Re: Replication slot stats misgivings
Previous Message Justin Pryzby 2021-03-20 21:23:02 Re: [HACKERS] Custom compression methods (mac+lz4.h)