Re: shared memory stats: high level design decisions: consistency, dropping

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: shared memory stats: high level design decisions: consistency, dropping
Date: 2021-03-24 13:42:11
Message-ID: CABUevEzRBMMXOnHTDtamLnGpm=fmYvvoJ2QuvqXHa+uqRrTPqA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 21, 2021 at 11:34 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-03-21 12:14:35 -0400, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > 1) What kind of consistency do we want from the pg_stats_* views?
> >
> > That's a hard choice to make. But let me set the record straight:
> > when we did the initial implementation, the stats snapshotting behavior
> > was considered a FEATURE, not an "efficiency hack required by the old
> > storage model".
>
> Oh - sorry for misstating that then. I did try to look for the origins of the
> approach, and all that I found was that it'd be too expensive to do multiple
> stats file reads.
>
>
> > If I understand what you are proposing, all stats views would become
> > completely volatile, without even within-query consistency. That really
> > is not gonna work. As an example, you could get not-even-self-consistent
> > results from a join to a stats view if the planner decides to implement
> > it as a nestloop with the view on the inside.
>
> I don't really think it's a problem that's worth incurring that much cost to
> prevent. We already have that behaviour for a number of the pg_stat_* views,
> e.g. pg_stat_xact_all_tables, pg_stat_replication.

Aren't those both pretty bad examples though?

pg_stat_xact_all_tables surely is within-query consistent, and would
be pretty useless if it was within-transaction consistent?

pg_stat_replication is a snapshot of what things are right now (like
pg_stat_activity), and not collected statistics.

Maybe the naming is inconsistent, in that they should've had different
names to set them apart, but fundamentally having xact-consistent
views there would be a bad thing, no?

> If the cost were low - or we can find a reasonable way to get to low costs - I
> think it'd be worth preserving for backward compatibility's sake alone. From
> an application perspective, I actually rarely want that behaviour for stats
> views - I'm querying them to get the most recent information, not an older
> snapshot. And in the cases I do want snapshots, I'd want them for longer than a
> transaction.

I agree in general, but I'd want them to be *query-consistent*, not
*transaction-consistent*. But the question is, as you say, whether I'm
willing to pay for that. I'm less certain of that.

> There's just a huge difference between being able to access a table's stats in
> O(1) time, or having a single stats access be O(database-objects).
>
> And that includes accesses to things like pg_stat_bgwriter, pg_stat_database
> (for IO over time stats etc) that often are monitored at a somewhat high
> frequency - they also pay the price of reading in all object stats. On my
> example database with 1M tables it takes 0.4s to read pg_stat_database.

IMV, singling things out into "larger groups" would be one perfectly
acceptable compromise. That is, say that pg_stat_user_tables can be
inconsistent with pg_stat_bgwriter, but it cannot be inconsistent
with itself.

Basically anything that's "global" seems like it could be treated that
way, independent of each other.

For relations and such, having a way to get the stats for just a
single relation, or for a set of relations that will be consistent
with each other, without fetching all of them, could also be a
reasonable optimization. Maybe an SRF that takes an array of oids as a
parameter and returns consistent data across those, without having to
copy/mess with the rest?
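
Such an SRF might look roughly like this (a sketch only; the function
name and signature are hypothetical, not an existing interface):

```sql
-- Hypothetical interface: fetch a mutually consistent snapshot of stats
-- for just the listed relations, without materializing stats for every
-- object in the database.
SELECT *
  FROM pg_stat_get_relations(ARRAY['t1'::regclass::oid,
                                   't2'::regclass::oid]);
```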

> We currently also fetch the full stats in places like autovacuum.c. Where we
> don't need repeated access to be consistent - we even explicitly force the
> stats to be re-read for every single table that's getting vacuumed.
>
> Even if we just cache already accessed stats, places like do_autovacuum()
> would end up with a completely unnecessary cache of all tables, blowing up
> memory usage by a large amount on systems with lots of relations.

autovacuum is already dealing with things being pretty fuzzy though,
so it shouldn't matter much there?

But autovacuum might also deserve its own interface to access the
data directly, and wouldn't have to follow the same one as the stats
views in this new scheme, perhaps?

> > I also believe that the snapshotting behavior has advantages in terms
> > of being able to perform multiple successive queries and get consistent
> > results from them. Only the most trivial sorts of analysis don't need
> > that.
>
> In most cases you'd not do that in a transaction tho, and you'd need to create
> temporary tables with a snapshot of the stats anyway.

I'd say in most cases this analysis happens in snapshots anyway, and
those are snapshots unrelated to what we do in pg_stat. It's either
snapshotted to tables, or to storage in a completely separate
database.

> > In short, what you are proposing sounds absolutely disastrous for
> > usability of the stats views, and I for one will not sign off on it
> > being acceptable.
>
> :(
>
> That's why I thought it'd be important to bring this up to a wider
> audience. This has been discussed several times in the thread, and nobody
> really chimed up wanting the "snapshot" behaviour...

I can chime in with the ones saying I don't think I need that kind of
snapshot behaviour.

I would *like* to have query-level consistent views. But I may be able
to compromise on that one for the sake of performance as well.

I definitely need there to be object-level consistent views.

> > I do think we could relax the consistency guarantees a little bit,
> > perhaps along the lines of only caching view rows that have already
> > been read, rather than grabbing everything up front. But we can't
> > just toss the snapshot concept out the window. It'd be like deciding
> > that nobody needs MVCC, or even any sort of repeatable read.
>
> I think that'd still be a huge win - caching only what's been accessed rather than
> everything will save a lot of memory in very common cases. I did bring it up as
> one approach for that reason.
>
> I do think it has a few usability quirks though. The time-skew between stats
> objects accessed at different times seems like it could be quite confusing?
> E.g. imagine looking at table stats and then later join to index stats and see
> table / index stats not matching up at all.
>
>
> I wonder if a reasonable way out could be to have pg_stat_make_snapshot()
> (accompanying the existing pg_stat_clear_snapshot()) that'd do the full eager
> data load. But not use any snapshot / caching behaviour without that?

I think that's a pretty good idea.

I bet the vast majority of all queries against the pg_stat views are
done by automated tools, which don't care about the snapshot
behaviour and thus wouldn't have to pay the overhead. In the rarer
cases where you do live analysis, you can explicitly request it.

Another idea could be a per-user GUC of "stats_snapshots" or so, and
then, if it's on, force snapshots at all times. That way a DBA who
wants the snapshots could set it on their own user but keep it off for
the automated jobs, for example. (It'd basically be the same, except
automatically calling pg_stat_make_snapshot() the first time stats are
queried.)
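
Under such a scheme the two modes might be selected like this (the GUC
name and pg_stat_make_snapshot() are sketches of the proposal, not
existing features; pg_stat_clear_snapshot() does already exist):

```sql
-- Interactive analysis: explicitly request an eager, consistent snapshot.
SELECT pg_stat_make_snapshot();
SELECT * FROM pg_stat_user_tables;   -- served from the snapshot
SELECT pg_stat_clear_snapshot();     -- back to live data

-- Or make snapshot behaviour the default for one role via the
-- hypothetical GUC, leaving automated monitoring roles on live data:
ALTER ROLE dba_user SET stats_snapshots = on;
```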

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
