Re: Add session statistics to pg_stat_database

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add session statistics to pg_stat_database
Date: 2020-12-03 12:22:35
Message-ID: 27530e647d14ef49199694ea3c376935fa7285e8.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2020-12-01 at 17:32 +0100, Magnus Hagander wrote:
> > I have changed "connections" to "sessions" and renamed the new
> > column "connections" to "session_count".
> >
> > I think that most people will understand a session as started after a successful
> > connection.
>
> Yeah, I agree, and as long as it's consistent we don't need more explanations than that.
>
> Further int he views, it's a bit strange to have session_count and aborted_session, but I'm not
> sure what to suggest. "aborted_session_count" seems too long. Maybe just "sessions" instead
> of "session_count" -- no other counters actually have the "_count" suffix.

"sessions" is fine, I think; I changed the name.

> > > I wonder if there would also be a way to count "sessions that crashed" as well.
> > > That is,the ones that failed in a way that caused the postmaster to restart the system.
> >
> > Sure, a crash count would be useful. I don't know if it is easy for the stats collector
> > to tell the difference between a start after a backend crash and - say - starting from
> > a base backup.
> >
> > I think that that would be material for another patch, and I don't think it should go
> > to "pg_stat_database", because a) it might be hard to tell to which database the crashed
> > backend was attached, b) it might be a background process that doesn't belong to a database
> > and c) if the crash were caused by - say - corruption in a shared catalog, it would be
> > misleading
>
> I'm not sure it is outside the scope of this patch, because I think it might be easier to
> do than I (and I think you) first thought. We don't need to track which database crashed --
> if we track all *other* ways a database exits, then crashes are all that remains.
>
> So in fact, we *almost* have all the data we need already. We have the number of sessions
> started. We have the number of sessions "aborted". if we also had the number of sessions
> that were closed normally, then whatever is "left" would be the number of sessions crashed.
> And we do already, in your patch, send the message in the case of both aborted and
> non-aborted sessions. So we just need to keep track of both in the statsfile
> (which we don't now), and we'd more or less have it, wouldn't we?

There is one problem with that: the statistics collector is not guaranteed to get all
messages, right? If a disconnection statistics UDP datagram doesn't reach the statistics
collector, that connection
would end up being reported as crashed.
That would alarm people unnecessarily and make the crash statistics misleading.

> However, some thinking around that also leads me to another question which is very much
> in scope for this patch regardless, which is what about shutdown and admin termination.
> Right now, when you do a "pg_ctl stop" on the database, all sessions count as aborted.
> Same thing for a pg_terminate_backend(). I wonder if this is also a case that would be
> useful to track as a separate thing? One could argue that the docs in your patch say
> aborted means "terminated by something else than a regular client disconnection".
> But that's true for a "shutdown", but not for a crash, so whichever way we go with crashes
> it's slightly incorrect.

> But thinking from a usability perspective, wouldn't what we want more be something
> like <closed by correct disconnect>, <closed by abnormal disconnect>, <closed by admin>,
> <crash>?
>
> What do you think of adapting it to that?
>
> Basically, that would change pgStatSessionDisconnectedNormally into instead being an
> enum of reasons, which could be normal disconnect, abnormal disconnect and admin.
> And we'd track all those three as separate numbers in the stats file, meaning we could
> then calculate the crash by subtracting all three from the total number of sessions?

I think at least "closed by admin" might be interesting; I'll have a look.
I don't think we have to specifically count "closed by normal disconnect", because
that should be the rule and could be more or less deduced from the other numbers
(with the uncertainty mentioned above).

> (Let me know if you think the idea could work and would prefer it if I worked up a
> complete suggestion based on it rather than just spitting ideas)

Thanks for the offer, and I'll get back to it if I get stuck.
But I'm ready to do the grunt work, so that you can spend your precious
committer cycles elsewhere :^)

I'll have a go at "closed by admin", meanwhile here is patch v7 with the renaming
"session_count -> sessions".

Yours,
Laurenz Albe

Attachment Content-Type Size
0001-Add-session-statistics-to-pg_stat_database.v7.patch text/x-patch 17.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-12-03 12:29:08 Re: Commitfest 2020-11 is closed
Previous Message Peter Eisentraut 2020-12-03 12:16:32 Re: SELECT INTO deprecation