Re: Resetting spilled txn statistics in pg_stat_replication

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Date: 2020-06-26 09:08:08
Message-ID: CABUevEzYHwsTu-19Kz81YDg2W6jVOZ4AYt4GzD6D_1awdN4-SA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2020 at 12:18 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
> >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
> ><masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >>
> >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <
> tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> >
> >> > >
> >> > >What if the decoding has been performed by multiple backends using
> the
> >> > >same slot? In that case, it will be difficult to make the judgment
> >> > >for the value of logical_decoding_work_mem based on stats. It would
> >> > >make sense if we provide a way to set logical_decoding_work_mem for a
> >> > >slot but not sure if that is better than what we have now.
> >> > >
> >>
> >> I thought that the stats are relevant to what
> >> logical_decoding_work_mem value was but not with who performed logical
> >> decoding. So even if multiple backends perform logical decoding using
> >> the same slot, the user can directly use stats as long as
> >> logical_decoding_work_mem value doesn’t change.
> >>
> >
> >I think if you maintain these stats at the slot level, you probably
> >need to use spinlock or atomic ops in order to update those as slots
> >can be used from multiple backends whereas currently, we don't need
> >that.
>
> IMHO storing the stats in the slot itself is a bad idea. We have the
> statistics collector for exactly this purpose, and it's receiving data
> over UDP without any extra locking etc.
>

Yeah, that seems much more appropriate. Of course, where they are exposed
is a different question.

>> > >What problems do we see in displaying these for each process? I think
> >> > >users might want to see the stats for the exited processes or after
> >> > >server restart but I think both of those are not even possible today.
> >> > >I think the stats are available till the corresponding WALSender
> >> > >process is active.
> >>
> >> I might want to see the stats for the exited processes or after server
> >> restart. But I'm inclined to agree with displaying the stats per
> >> process if the stats are displayed on a separate view (e.g.
> >> pg_stat_replication_slots).
> >>
> >
> >Yeah, as told previously, this makes more sense to me.
> >
> >Do you think we should try to write a POC patch using a per-process
> >entry approach and see what difficulties we are facing and does it
> >give the stats in a way we are imagining but OTOH, we can wait for
> >some more to see if there is clear winner approach here?
> >
>
> I may be missing something obvious, but I still see no point in tracking
> per-process stats. We don't have that for other stats, and I'm not sure
> how common is the scenario when a given slot is decoded by many
> backends. I'd say vast majority of cases are simply running decoding
> from a walsender, which may occasionally restart, but I doubt the users
> are interested in per-pid data - they probably want aggregated data.
>

Well, technically we do -- we have the pg_stat_xact_* views. However, those
are only viewable from *inside* the session itself (which can sometimes be
quite annoying).

This does somewhat apply in that normal transactions send their stats
batches at transaction end. If this is data we'd be interested in viewing
inside of that, a more direct exposure would be needed -- such as the way
we do with LSNs in pg_stat_replication or whatever.

For long-term monitoring, people definitely want aggregate data I'd say.
The "realtime data" if we call it that is in my experience mostly
interesting if you want to define alerts etc ("replication standby is too
far behind" is alertable through that, whereas things like "total amount of
replication traffic over the past hour" is something that's more
trend-alertable which is typically handled in a separate system pulling the
aggregate stats)

Can someone explain a plausible scenario for which tracking per-process
> stats would be needed, and simply computing deltas would not work? How
> will you know which old PID is which, what will you do when a PID is
> reused, and so on?
>

I fail to see that one as well, in a real-world scenario. Maybe if you want
to do a one-off point-tuning of one tiny piece of a system? But you will
then also need to long term statistics to follow-up if what you did was
correct anyway...

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-06-26 09:09:22 Re: Remove a redundant condition check
Previous Message Bharath Rupireddy 2020-06-26 09:04:23 Re: Parallel copy