Re: Resetting spilled txn statistics in pg_stat_replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-07-16 09:16:25
Message-ID: CAA4eK1K2xHOZziGpQKTO_8uFJo2NdwBZTYw3bJpEaBcJOXVWuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 16, 2020 at 1:45 PM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Thu, 16 Jul 2020 at 15:53, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada
> > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada
> > > > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > I think that using oids has another benefit that we don't need to send
> > > > > > slot name to the stats collector along with the stats. Since the
> > > > > > maximum size of slot name is NAMEDATALEN and we don't support the
> > > > > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the
> > > > > > user wants to increase NAMEDATALEN they might not be able to build.
> > > > > >
> > > > >
> > > > > I think NAMEDATALEN is used for many other objects as well and I don't
> > > > > think we want to change it in foreseeable future, so that doesn't
> > > > > sound to be a good reason to invent OIDs for slots. OTOH, I do
> > > > > understand it would be better to send OIDs than names for slots but I
> > > > > am just not sure if it is a good idea to invent a new way to generate
> > > > > OIDs (which is different from how we do it for other objects in the
> > > > > system) for this purpose.
> > > >
> > > > I'm concerned that there might be users who are using custom
> > > > PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I
> > > > also agree with your concerns. So perhaps we can go with the current
> > > > PoC patch approach as the first version (i.g., sending slot drop
> > > > message to stats collector). When we need such a unique identifier
> > > > also for other purposes, we will be able to change this feature so
> > > > that it uses that identifier for this statistics reporting purpose.
> > > >
> > >
> > > Okay, feel to submit the version atop my revert patch.
> > >
> >
> > Attached, please find the rebased version. I have kept prorows as 10
> > instead of 100 for pg_stat_get_replication_slots because I don't see
> > much reason for keeping the value more than the default value of
> > max_replication_slots.
> >
>
> Thank you for rebasing the patch! Agreed.
>
> > As we are targeting this patch for PG14, so I think we can now add the
> > functionality to reset the stats as well. What do you think?
> >
>
> Yeah, I was also updating the patch while adding the reset functions.
>
> However, I'm concerned about the following part:
>
> +static int
> +pgstat_replslot_index(const char *name)
> +{
> + int i;
> +
> + Assert(nReplSlotStats <= max_replication_slots);
> + for (i = 0; i < nReplSlotStats; i++)
> + {
> + if (strcmp(replSlotStats[i].slotname, name) == 0)
> + return i; /* found */
> + }
> +
> + /* not found, register new slot */
> + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
> + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
> + return nReplSlotStats++;
> +}
>
> +static void
> +pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
> +{
> + int idx;
> +
> + idx = pgstat_replslot_index(msg->m_slotname);
> + Assert(idx >= 0 && idx < max_replication_slots);
>
> As long as we cannot rely on message ordering, the above assertion
> could be false. For example, suppose that there is no unused
> replication slots and the user:
>
> 1. drops the existing slot.
> 2. creates a new slot.
>
> If the stats messages arrive in order of 2 and 1, the above assertion
> is false or leads to memory corruption when assertions are not
> enabled.
>
> A possible solution would be to add an in-use flag to
> PgStat_ReplSlotStats indicating whether the stats for slot is used or
> not. When receiving a drop message for a slot, the stats collector
> just marks the corresponding stats as unused. When receiving the stats
> report for a new slot but there is no unused stats slot, ignore it.
> What do you think?
>

As of now, you have a boolean flag msg.m_drop to distinguish the drop
message but we don't have a similar way to distinguish the 'create'
message. What if have a way to distinguish 'create' message (we can
probably keep some sort of flag to indicate the type of message
(create, drop, update)) and then if the slot with the same name
already exists, we ignore such a message. Now, we also need a way to
create the entry for a slot for a normal stats update message as well
to accommodate for the lost 'create' message. Does that make sense?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-07-16 09:32:17 Re: Which SET TYPE don't actually require a rewrite
Previous Message Amit Langote 2020-07-16 09:14:04 Re: [POC] Fast COPY FROM command for the table with foreign partitions