Re: shared memory stats: high level design decisions: consistency, dropping

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: shared memory stats: high level design decisions: consistency, dropping
Date: 2021-03-25 22:52:58
Message-ID: CABUevExXEh2yzsSZN1gXJK7puZ5K6AQ4rFirrMcqBMJr67+Anw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 25, 2021 at 6:20 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-03-24 14:42:11 +0100, Magnus Hagander wrote:
> > On Sun, Mar 21, 2021 at 11:34 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > If I understand what you are proposing, all stats views would become
> > > > completely volatile, without even within-query consistency. That really
> > > > is not gonna work. As an example, you could get not-even-self-consistent
> > > > results from a join to a stats view if the planner decides to implement
> > > > it as a nestloop with the view on the inside.
> > >
> > > I don't really think it's a problem that's worth incurring that much cost to
> > > prevent. We already have that behaviour for a number of the pg_stat_* views,
> > > e.g. pg_stat_xact_all_tables, pg_stat_replication.
> >
> > Aren't those both pretty bad examples though?
> >
> > pg_stat_xact_all_tables surely is within-query consistent, and would
> > be pretty useless if it was within-transaction consistent?
>
> It's not within-query consistent:
>
> postgres[1182102][1]=# SELECT pg_stat_get_xact_numscans('pg_class'::regclass) UNION ALL SELECT count(*) FROM pg_class UNION ALL SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
> ┌───────────────────────────┐
> │ pg_stat_get_xact_numscans │
> ├───────────────────────────┤
> │                         0 │
> │                       397 │
> │                         1 │
> └───────────────────────────┘
> (3 rows)

Heh. OK, I admit I didn't consider a UNION query like that -- I only
considered it being present *once* in a query :)

That said, if wanted, that can be dealt with using WITH MATERIALIZED, as
long as it's in the same query, no?
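
Something like this (untested sketch, reusing your example):

WITH s AS MATERIALIZED (
    SELECT pg_stat_get_xact_numscans('pg_class'::regclass) AS numscans
)
SELECT numscans FROM s
UNION ALL SELECT count(*) FROM pg_class
UNION ALL SELECT numscans FROM s;

Since the CTE is materialized exactly once, both references to it
should return the same value.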

> > pg_stat_replication is a snapshot of what things are right now (like
> > pg_stat_activity), and not collected statistics.
>
> However, pg_stat_activity does have snapshot semantics...

Yeah, yay consistency.

> > Maybe there's inconsistency in that they should've had a different
> > name to separate it out, but fundamentally having xact consistent
> > views there would be a bad thing, no?
>
> True. One weird thing around the _xact_ versions that we, at best,
> *hint* at in the docs, but also contradict, is that _xact_ views are
> actually not tied to the transaction. It's really about unsubmitted
> stats. E.g. executing the following via copy-paste
>
> SELECT count(*) FROM pg_class;
> SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
> SELECT count(*) FROM pg_class;
> SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
>
> will most of the time return
> <count>
> 0
> <count>
> 1
>
> because after the transaction for the first count(*) commits, we'll not
> have submitted stats for more than PGSTAT_STAT_INTERVAL. But after the
> second count(*) it'll be shorter, therefore the stats won't be
> submitted...

That's... cute. I hadn't realized that part, but then I've never
actually had use for the _xact_ views.

> > > There's just a huge difference between being able to access a table's stats in
> > > O(1) time, or having a single stats access be O(database-objects).
> > >
> > > And that includes accesses to things like pg_stat_bgwriter, pg_stat_database
> > > (for IO over time stats etc) that often are monitored at a somewhat high
> > > frequency - they also pay the price of reading in all object stats. On my
> > > example database with 1M tables it takes 0.4s to read pg_stat_database.
> >
> > IMV, singling things out into "larger groups" would be one perfectly
> > acceptable compromise. That is, say that pg_stat_user_tables can be
> > inconsistent with pg_stat_bgwriter, but it cannot be inconsistent
> > with itself.
>
> I don't think that buys us all that much though. It still is a huge
> issue that we need to cache the stats for all relations even though we
> only access the stats for one table.

Well, you yourself just mentioned that bgwriter and db stats are often
sampled at a higher frequency.

That said, such sampling can often include *individual* tables as well,
though maybe not all of them at once.
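
(Think of a sampler polling a handful of hot tables, along the lines of
-- table name just a placeholder:

SELECT seq_scan, idx_scan, n_tup_ins, n_tup_upd
FROM pg_stat_user_tables
WHERE relname = 'orders';

rather than reading all of pg_stat_user_tables every few seconds.)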

> > > > I also believe that the snapshotting behavior has advantages in terms
> > > > of being able to perform multiple successive queries and get consistent
> > > > results from them. Only the most trivial sorts of analysis don't need
> > > > that.
> > >
> > > In most cases you'd not do that in a transaction tho, and you'd need to create
> > > temporary tables with a snapshot of the stats anyway.
> >
> > I'd say in most cases this analysis happens in snapshots anyway, and
> > those are snapshots unrelated to what we do in pg_stat. It's either
> > snapshotted to tables, or to storage in a completely separate
> > database.
>
> Agreed. I wonder if some of that work would be made easier if we added a
> function to export all the data in the current snapshot as a json
> document or such? If we add configurable caching (see below) that'd
> really not be a lot of additional work.

I'd assume that if you want the snapshot in the database, you'd want it
in a database format -- that is, you actually *want* it in a table or
similar. I'm not sure a json format would really help that much?
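
After all, the table form is what people effectively build today,
something like:

CREATE TABLE stats_snap AS
SELECT now() AS captured_at, * FROM pg_stat_user_tables;

A json document would just have to be unpacked back into that shape
before it's useful.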

What would probably be more interesting in that case is if we could
build ourselves, either builtin or as an extension, a background worker
that would listen for scrapes and export the openmetrics format, talking
directly to the stats and bypassing the need for an exporter that
connects as a regular user and ends up converting things between many
different formats on the way.
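
I.e. serving the usual exposition format -- metric name here invented
just for illustration:

# HELP pg_stat_database_xact_commit Transactions committed per database
# TYPE pg_stat_database_xact_commit counter
pg_stat_database_xact_commit{datname="postgres"} 12345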

> > I would *like* to have query-level consistent views. But I may be able
> > to compromise on that one for the sake of performance as well.
> >
> > I definitely need there to be object-level consistent views.
>
> That'd be free if we didn't use all those separate function calls for
> each row in pg_stat_all_tables etc...

So... we should fix that?

We could still keep those separate functions for backwards
compatibility if we wanted of course, but move the view to use an SRF
and make that SRF also directly callable with useful parameters.

I mean, it's been over 10 years since we did that for pg_stat_activity
in 9.1, so it's perhaps time to do another view? :)
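
Roughly along these lines (function name hypothetical):

CREATE VIEW pg_stat_all_tables AS
    SELECT * FROM pg_stat_get_all_tables();

the same way pg_stat_activity is built on top of
pg_stat_get_activity(NULL) today.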

> > > I wonder if a reasonable way out could be to have pg_stat_make_snapshot()
> > > (accompanying the existing pg_stat_clear_snapshot()) that'd do the full eager
> > > data load. But not use any snapshot / caching behaviour without that?
> >
> > I think that's a pretty good idea.
>
> It's what I am leaning towards right now.
>
>
> > Another idea could be a per-user GUC of "stats_snapshots" or so, and
> > then if it's on force the snapshots all times. That way a DBA who
> > wants the snapshots could set it on their own user but keep it off for
> > the automated jobs for example. (It'd basically be the same except
> > automatically calling pg_stat_make_snapshot() the first time stats are
> > queried)
>
> Yea, I was thinking similar. We could e.g. have
> stats_snapshot_consistency, an enum of 'none', 'query', 'snapshot'.
>
> With 'none' there's no consistency beyond that a single pg_stat_get_*
> function call will give consistent results.
>
> With 'query' we cache each object on first access (i.e. there can be
> inconsistency between different objects if their accesses are further
> apart, but not within a stats object).
>
> With 'snapshot' we cache all the stats at the first access, for the
> duration of the transaction.
>
>
> However: I think 'query' is surprisingly hard to implement. There can be
> multiple overlapping queries ongoing at the same time. And even
> if there aren't multiple ongoing queries, there's really nothing great
> to hook a reset into.
>
> So I'm inclined to instead have 'access', which caches individual stats
> objects on first access. But it lasts longer than the query.

I guess the naive way to do 'query' would be to just, ahem, lock the
stats while a query is ongoing. But that would probably be *really*
terrible in the case of big stats, yes :)

> Tom, if we defaulted to 'access' would that satisfy your concerns? Or
> would even 'none' as a default do? I'd rather not have SELECT * FROM
> pg_stat_all_tables; accumulate an unnecessary copy of most of the
> cluster's stats in memory just to immediately throw it away again.

But in your "cache individual stats object on first access", it would
copy most of the stats over when you did a "SELECT *", no? It would
only be able to avoid that if you had a WHERE clause on it limiting
what you got?
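
(I.e. something like

SELECT * FROM pg_stat_user_tables WHERE relname = 'orders';

-- table name again a placeholder -- would touch and cache only that one
object, while a bare SELECT * still ends up caching everything?)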

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
