Re: shared memory stats: high level design decisions: consistency, dropping

From: Andres Freund <andres(at)anarazel(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: shared memory stats: high level design decisions: consistency, dropping
Date: 2021-03-25 17:20:01
Message-ID: 20210325172001.irkjx5rmmvhx2f4a@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-03-24 14:42:11 +0100, Magnus Hagander wrote:
> On Sun, Mar 21, 2021 at 11:34 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > If I understand what you are proposing, all stats views would become
> > > completely volatile, without even within-query consistency. That really
> > > is not gonna work. As an example, you could get not-even-self-consistent
> > > results from a join to a stats view if the planner decides to implement
> > > it as a nestloop with the view on the inside.
> >
> > I don't really think it's a problem that's worth incuring that much cost to
> > prevent. We already have that behaviour for a number of of the pg_stat_* views,
> > e.g. pg_stat_xact_all_tables, pg_stat_replication.
>
> Aren't those both pretty bad examples though?
>
> pg_stat_xact_all_tables surely is within-query consistent, and would
> be pretty useless if it wwas within-transaction consistent?

It's not within-query consistent:

postgres[1182102][1]=# SELECT pg_stat_get_xact_numscans('pg_class'::regclass) UNION ALL SELECT count(*) FROM pg_class UNION ALL SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
┌───────────────────────────┐
│ pg_stat_get_xact_numscans │
├───────────────────────────┤
│ 0 │
│ 397 │
│ 1 │
└───────────────────────────┘
(3 rows)

> pg_stat_replication is a snapshot of what things are right now (like
> pg_stat_activity), and not collected statistics.

However, pg_stat_activity does have snapshot semantics...

> Maybe there's inconsistency in that they should've had a different
> name to separate it out, but fundamentally having xact consistent
> views there would be a bad thing, no?

True. One weird thing around the _xact_ versions that we, at best,
*hint* at in the docs, but also contradict, is that _xact_ views are
actually not tied to the transaction. It's really about unsubmitted
stats. E.g. if executing the following via copy-paste

SELECT count(*) FROM pg_class;
SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
SELECT count(*) FROM pg_class;
SELECT pg_stat_get_xact_numscans('pg_class'::regclass);

will most of the time return
<count>
0
<count>
1

because after the transaction for the first count(*) commits, we'll not
have submitted stats for more than PGSTAT_STAT_INTERVAL. But after the
second count(*) it'll be shorter, therefore the stats won't be
submitted...

> > There's just a huge difference between being able to access a table's stats in
> > O(1) time, or having a single stats access be O(database-objects).
> >
> > And that includes accesses to things like pg_stat_bgwriter, pg_stat_database
> > (for IO over time stats etc) that often are monitored at a somewhat high
> > frequency - they also pay the price of reading in all object stats. On my
> > example database with 1M tables it takes 0.4s to read pg_stat_database.
>
> IMV, singeling things out into "larger groups" would be one perfectly
> acceptable compromise. That is, say that pg_stat_user_tables can be
> inconsistent vs with pg_stat_bgwriter, but it cannot be inconsistent
> with itself.

I don't think that buys us all that much though. It still is a huge
issue that we need to cache the stats for all relations even though we
only access the stats for table.

> > We currently also fetch the full stats in places like autovacuum.c. Where we
> > don't need repeated access to be consistent - we even explicitly force the
> > stats to be re-read for every single table that's getting vacuumed.
> >
> > Even if we to just cache already accessed stats, places like do_autovacuum()
> > would end up with a completely unnecessary cache of all tables, blowing up
> > memory usage by a large amount on systems with lots of relations.
>
> autovacuum is already dealing with things being pretty fuzzy though,
> so it shouldn't matter much there?
>
> But autovacuum might also deserve it's own interface to access the
> data directly and doesn't have to follow the same one as the stats
> views in this new scheme, perhaps?

Yes, we can do that now.

> > > I also believe that the snapshotting behavior has advantages in terms
> > > of being able to perform multiple successive queries and get consistent
> > > results from them. Only the most trivial sorts of analysis don't need
> > > that.
> >
> > In most cases you'd not do that in a transaction tho, and you'd need to create
> > temporary tables with a snapshot of the stats anyway.
>
> I'd say in most cases this analysis happens in snapshots anyway, and
> those are snapshots unrelated to what we do in pg_stat. It's either
> snapshotted to tables, or to storage in a completely separate
> database.

Agreed. I wonder some of that work would be made easier if we added a
function to export all the data in the current snapshot as a json
document or such? If we add configurable caching (see below) that'd
really not be a lot of additional work.

> I would *like* to have query-level consistent views. But I may be able
> to compromise on that one for the sake of performance as well.
>
> I definitely need there to be object-level consistent views.

That'd be free if we didn't use all those separate function calls for
each row in pg_stat_all_tables etc...

> > I wonder if a reasonable way out could be to have pg_stat_make_snapshot()
> > (accompanying the existing pg_stat_clear_snapshot()) that'd do the full eager
> > data load. But not use any snapshot / caching behaviour without that?
>
> I think that's a pretty good idea.

It's what I am leaning towards right now.

> Another idea could be a per-user GUC of "stats_snapshots" or so, and
> then if it's on force the snapshots all times. That way a DBA who
> wants the snapshots could set it on their own user but keep it off for
> the automated jobs for example. (It'd basically be the same except
> automatically calling pg_stat_make_snapshot() the first time stats are
> queried)

Yea, I was thinking similar. We could e.g. have
stats_snapshot_consistency, an enum of 'none', 'query', 'snapshot'.

With 'none' there's no consistency beyond that a single pg_stat_get_*
function call will give consistent results.

With 'query' we cache each object on first access (i.e. there can be
inconsistency between different objects if their accesses are further
appart, but not within a stats object).

With 'snapshot' we cache all the stats at the first access, for the
duration of the transaction.

However: I think 'query' is surprisingly hard to implement. There can be
multiple ongoing overlapping queries ongoing at the same time. And even
if there aren't multiple ongoing queries, there's really nothing great
to hook a reset into.

So I'm inclined to instead have 'access', which caches individual stats
object on first access. But lasts longer than the query.

Tom, if we defaulted to 'access' would that satisfy your concerns? Or
would even 'none' as a default do? I'd rather not have SELECT * FROM
pg_stat_all_tables; accumulate an unnecessary copy of most of the
cluster's stats in memory just to immediately throw it away again.

> I bet the vast majority of all queries against the pg_stat views are
> done by automated tools, and they don't care about the snapshot
> behaviour, and thus wouldn't have to pay the overhead. In the more
> rare cases when you do the live-analysis, you can explicitly request
> it.

I don't think I'd often want it in that case either, but ymmv.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-25 17:49:55 Re: shared memory stats: high level design decisions: consistency, dropping
Previous Message Tom Lane 2021-03-25 16:51:55 Re: [PATCH] pg_permissions