Re: Replication slot stats misgivings

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-03-20 03:55:40
Message-ID: CAA4eK1JzNY99Td3rQnpKnRgs8pk=v=+fddN5h6wz_BaiTK57HA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> And then more generally about the feature:
> - If a slot was used to stream out a large amount of changes (say an
> initial data load), but then replication is interrupted before the
> transaction is committed/aborted, stream_bytes will not reflect the
> many gigabytes of data we may have sent.
>

We could probably update the stats each time we spill or stream
transaction data, but it was not clear at that stage whether, or how
much, that would be useful.

> - It seems weird that we went to the trouble of inventing replication
> slot stats, but then limit them to logical slots, and even there don't
> record the obvious things like the total amount of data sent.
>

Won't spill_bytes and stream_bytes give you the amount of data sent?

>
> I think the best way to address the more fundamental "pgstat related"
> complaints is to change how replication slot stats are
> "addressed". Instead of using the slot's name, report stats using the
> index in ReplicationSlotCtl->replication_slots.
>
> That removes the risk of running out of "replication slot stat slots":
> If we lose a drop message, the index eventually will be reused and we
> likely can detect that the stats were for a different slot by comparing
> the slot name.
>

This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot? The entry
will hold a mix of stats from the old and new slots, so we probably
need to reset it once we detect that. What if, at some interval (say,
whenever we run out of indexes), we check whether the slots we are
maintaining in pgstat.c include any stale entries (the entry exists but
the actual slot has been dropped)?
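
To make the two options concrete, here is a minimal standalone sketch of
index-addressed stats with a reset when the name differs, plus a
periodic sweep for stale entries. Everything in it (SlotStatsEntry,
report_slot_stats, sweep_stale_stats, MAX_SLOTS) is a hypothetical
stand-in for illustration, not the actual pgstat.c code or a proposed
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SLOTS     4   /* hypothetical stand-in for max_replication_slots */
#define SLOT_NAME_LEN 64  /* hypothetical stand-in for NAMEDATALEN */

/* One stats entry per slot index (all names here are illustrative). */
typedef struct SlotStatsEntry
{
    bool in_use;
    char slotname[SLOT_NAME_LEN];
    long spill_bytes;
    long stream_bytes;
} SlotStatsEntry;

static SlotStatsEntry slot_stats[MAX_SLOTS];

/*
 * Add counters for the slot at index idx. If the entry belongs to a
 * differently named slot (e.g. a drop message was lost and the index was
 * reused), reset it first so old and new stats are never mixed.
 * Returns true if a stale entry had to be reset.
 */
static bool
report_slot_stats(int idx, const char *name, long spill_bytes, long stream_bytes)
{
    SlotStatsEntry *e;
    bool reset = false;

    assert(idx >= 0 && idx < MAX_SLOTS);
    e = &slot_stats[idx];

    if (!e->in_use || strncmp(e->slotname, name, SLOT_NAME_LEN - 1) != 0)
    {
        reset = e->in_use;      /* only counts as a reset if it held old data */
        memset(e, 0, sizeof(*e));
        strncpy(e->slotname, name, SLOT_NAME_LEN - 1);
        e->in_use = true;
    }

    e->spill_bytes += spill_bytes;
    e->stream_bytes += stream_bytes;
    return reset;
}

/*
 * Periodic sweep: live_names[i] is the name of the slot currently at
 * index i, or NULL if that index is free. Entries whose slot is gone or
 * renamed are cleared. Returns the number of entries cleared.
 */
static int
sweep_stale_stats(const char *const live_names[MAX_SLOTS])
{
    int cleared = 0;

    for (int i = 0; i < MAX_SLOTS; i++)
    {
        SlotStatsEntry *e = &slot_stats[i];

        if (e->in_use &&
            (live_names[i] == NULL ||
             strncmp(e->slotname, live_names[i], SLOT_NAME_LEN - 1) != 0))
        {
            memset(e, 0, sizeof(*e));
            cleared++;
        }
    }
    return cleared;
}
```

In this sketch the reset-on-mismatch path discards whatever the old slot
had accumulated, which is the behaviour being questioned above; the
sweep only helps if something triggers it before the index is reused.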

> It also makes it easy to handle the issue of max_replication_slots being
> lowered and there still being stats for a slot - we simply can skip
> restoring that slot's data, because we know the relevant slot can't exist
> anymore. And we can make the initial pgstat_report_replslot() during
> slot creation use a
>

Here, your last sentence seems to be incomplete.

> I'm wondering if we should just remove the slot name entirely from the
> pgstat.c side of things, and have pg_stat_get_replication_slots()
> inquire about slots by index as well and get the list of slots to report
> stats for from slot.c infrastructure.
>

But with your idea, how will you detect that some of the stats are from
an already dropped slot?

I'll create an entry for this in PG14 Open items wiki.

--
With Regards,
Amit Kapila.
