Re: Resetting spilled txn statistics in pg_stat_replication

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-07-16 08:15:01
Message-ID: CA+fd4k4avmK06Z_edA6SPSDiThdOfOaRrdfkdaWaWLm1-2xHQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 16 Jul 2020 at 15:53, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada
> > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > >
> > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada
> > > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > > >
> > > > >
> > > > > I think that using oids has another benefit that we don't need to send
> > > > > slot name to the stats collector along with the stats. Since the
> > > > > maximum size of slot name is NAMEDATALEN and we don't support the
> > > > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the
> > > > > user wants to increase NAMEDATALEN they might not be able to build.
> > > > >
> > > >
> > > > I think NAMEDATALEN is used for many other objects as well and I don't
> > > > think we want to change it in foreseeable future, so that doesn't
> > > > sound to be a good reason to invent OIDs for slots. OTOH, I do
> > > > understand it would be better to send OIDs than names for slots but I
> > > > am just not sure if it is a good idea to invent a new way to generate
> > > > OIDs (which is different from how we do it for other objects in the
> > > > system) for this purpose.
> > >
> > > I'm concerned that there might be users who are using custom
> > > PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I
> > > also agree with your concerns. So perhaps we can go with the current
> > > PoC patch approach as the first version (i.g., sending slot drop
> > > message to stats collector). When we need such a unique identifier
> > > also for other purposes, we will be able to change this feature so
> > > that it uses that identifier for this statistics reporting purpose.
> > >
> >
> > Okay, feel to submit the version atop my revert patch.
> >
>
> Attached, please find the rebased version. I have kept prorows as 10
> instead of 100 for pg_stat_get_replication_slots because I don't see
> much reason for keeping the value more than the default value of
> max_replication_slots.
>

Thank you for rebasing the patch! Agreed.

> As we are targeting this patch for PG14, so I think we can now add the
> functionality to reset the stats as well. What do you think?
>

Yeah, I was also updating the patch while adding the reset functions.

However, I'm concerned about the following part:

+static int
+pgstat_replslot_index(const char *name)
+{
+ int i;
+
+ Assert(nReplSlotStats <= max_replication_slots);
+ for (i = 0; i < nReplSlotStats; i++)
+ {
+ if (strcmp(replSlotStats[i].slotname, name) == 0)
+ return i; /* found */
+ }
+
+ /* not found, register new slot */
+ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
+ memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+ return nReplSlotStats++;
+}

+static void
+pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
+{
+ int idx;
+
+ idx = pgstat_replslot_index(msg->m_slotname);
+ Assert(idx >= 0 && idx < max_replication_slots);

As long as we cannot rely on message ordering, the above assertion
could be false. For example, suppose that there is no unused
replication slots and the user:

1. drops the existing slot.
2. creates a new slot.

If the stats messages arrive in order of 2 and 1, the above assertion
is false or leads to memory corruption when assertions are not
enabled.

A possible solution would be to add an in-use flag to
PgStat_ReplSlotStats indicating whether the stats for slot is used or
not. When receiving a drop message for a slot, the stats collector
just marks the corresponding stats as unused. When receiving the stats
report for a new slot but there is no unused stats slot, ignore it.
What do you think?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-07-16 08:58:58 Re: OpenSSL 3.0.0 compatibility
Previous Message Amit Kapila 2020-07-16 06:53:35 Re: Resetting spilled txn statistics in pg_stat_replication