Re: Resetting spilled txn statistics in pg_stat_replication

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-07-07 08:50:26
Message-ID: CABUevEz7jXQLSAzgFAb2AGBNJhL=ishkJOsfEFghpOy_12kRhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > > Second, as long as the unique identifier is the slot name there is no
> > > > convenient way to distinguish between the same name old and new
> > > > replication slots, so the backend process or wal sender process sends
> > > > a message to the stats collector to drop the replication slot at
> > > > ReplicationSlotDropPtr(). This strategy differs from what we do for
> > > > table, index, and function statistics. It might not be a problem but
> > > > I’m thinking a better way.
> > > >
> > >
> > > Can we rely on message ordering in the transmission mechanism (UDP)
> > > for stats? The wiki suggests [1] we can't. If so, then this might
> > > not work.
> >
> > Yeah, I'm also concerned about this. Another idea would be to have
> > another unique identifier to distinguish old and new replication slots
> > with the same name. For example, creation timestamp. And then we
> > reclaim the stats of unused slots later like table and function
> > statistics.
> >
>
> So, we need to have 'creation timestamp' as persistent data for slots
> to achieve this? I am not sure of adding creation_time as a parameter
> to identify for this case because users can change timings on systems
> so it might not be a bullet-proof method but I agree that it can work
> in general.
>

If we need them to be persistent across time like that, perhaps we simply
need to assign oids to replication slots? That might simplify this problem
quite a bit?

> On the other hand, if the ordering were to be reversed, we would miss
> > that stats but the next stat reporting would create the new entry. If
> > the problem is unlikely to happen in common case we can live with
> > that.
> >
>
> Yeah, that is a valid point and I think otherwise also some UDP
> packets can be lost so maybe we don't need to worry too much about
> this. I guess we can add a comment in the code for such a case.
>

The fact that we may in theory lose some packages over UDP is the main
reason we're using UDP in the first place, I believe :) But it's highly
unlikely to happen in the real world I believe (and I think on some
platforms impossible).

> > > Aside from the above, this patch will change the most of the changes
> > > > introduced by commit 9290ad198b1 and introduce new code much. I’m
> > > > concerned whether such changes are acceptable at the time of beta 2.
> > > >
> > >
> > > I think it depends on the final patch. My initial thought was that we
> > > should do this for PG14 but if you are suggesting removing the changes
> > > done by commit 9290ad198b1 then we need to think over it. I could
> > > think of below options:
> > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We
> > > were anyway having spilling without any work in PG13 but didn’t have
> > > stats.
> > > b. Try to get your patch in PG13 if we can, otherwise, revert the
> > > feature 9290ad198b1.
> > > c. Get whatever we have in commit 9290ad198b1 for PG13 and
> > > additionally have what we are discussing here for PG14. This means
> > > that spilled stats at slot level will be available in PG14 via
> > > pg_stat_replication_slots and for individual WAL senders it will be
> > > available via pg_stat_replication both in PG13 and PG14. Even if we
> > > can get your patch in PG13, we can still keep those in
> > > pg_stat_replication.
> > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it
> > > for PG14. I don't think this will be a popular approach.
> >
> > I was thinking option (a) or (b). I'm inclined to option (a) since the
> > PoC patch added a certain amount of new codes. I agree with you that
> > it depends on the final patch.
> >
>
> Magnus, Tomas, others, do you have any suggestions on the above
> options or let us know if you have any other option in mind?
>
>
I have a feeling it's far too late for (b) at this time. Regardless of the
size of the patch, it feels that this can end up being a rushed and not
thought-through-all-the-way one, in which case we may end up in an even
worse position.

Much as I would like to have these stats earlier, I'm also
leaning towards (a).

//Magnus

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message matsumura.ryo@fujitsu.com 2020-07-07 09:02:56 RE: archive status ".ready" files may be created too early
Previous Message Amit Langote 2020-07-07 08:16:12 Re: [PATCH] Performance Improvement For Copy From Binary Files