Re: shared memory stats: high level design decisions: consistency, dropping

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: shared memory stats: high level design decisions: consistency, dropping
Date: 2021-03-24 13:26:12
Message-ID: CABUevEwkXVxW65f-rWUg56zrH_kAgrfT1byag-=3wJsX-VECqQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 23, 2021 at 4:21 AM Greg Stark <stark(at)mit(dot)edu> wrote:
>
> On Sun, 21 Mar 2021 at 18:16, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> >
> > Greetings,
> >
> > * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > > I also believe that the snapshotting behavior has advantages in terms
> > > of being able to perform multiple successive queries and get consistent
> > > results from them. Only the most trivial sorts of analysis don't need
> > > that.
> > >
> > > In short, what you are proposing sounds absolutely disastrous for
> > > usability of the stats views, and I for one will not sign off on it
> > > being acceptable.
> > >
> > > I do think we could relax the consistency guarantees a little bit,
> > > perhaps along the lines of only caching view rows that have already
> > > been read, rather than grabbing everything up front. But we can't
> > > just toss the snapshot concept out the window. It'd be like deciding
> > > that nobody needs MVCC, or even any sort of repeatable read.
> >
> > This isn't the same use-case as traditional tables or relational
> > concepts in general- there aren't any foreign keys for the fields that
> > would actually be changing across these accesses to the shared memory
> > stats- we're talking about gross stats numbers like the number of
> > inserts into a table, not an employee_id column. In short, I don't
> > agree that this is a fair comparison.
>
> I use these stats quite a bit and do lots of slicing and dicing with
> them. I don't think it's as bad as Tom says but I also don't think we
> can be quite as loosey-goosey as I think Andres or Stephen might be
> proposing either (though I note that they haven't said they don't want
> any consistency at all).
>
> The cases where consistency really matters for me are when I'm doing
> math involving more than one statistic.
>
> Typically that's ratios. E.g. with pg_stat_*_tables I routinely divide
> seq_tup_read by seq_scan or idx_tup_* by idx_scan. I also often look
> at the ratio between n_tup_upd and n_tup_hot_upd.
>
> And no, it doesn't help that these are often large numbers after a
> long time because I'm actually working with the first derivative of
> these numbers using snapshots or a time series database. So if you
> have the seq_tup_read incremented but not seq_scan incremented you
> could get a wildly incorrect calculation of "tup read per seq scan"
> which actually matters.
>
> I don't think I've ever done math across stats for different objects.
> I mean, I've plotted them together and looked at which was higher but
> I don't think that's affected by some plots having peaks slightly out
> of sync with the other. I suppose you could look at the ratio of
> access patterns between two tables and know that they're only ever
> accessed by a single code path at the same time and therefore the
> ratios would be meaningful. But I don't think users would be surprised
> to find they're not consistent that way either.

Yeah, it's important to distinguish whether things can be inconsistent
within a single object, or only between objects. And I agree that in a
lot of cases, having per-object consistent data is probably enough.
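
To make the within-object case concrete, here is a minimal sketch of the
"tup read per seq scan" calculation Greg describes, done on the difference
between two samples. The class and function names and all the numbers are
made up for illustration; only the counter names seq_scan and seq_tup_read
come from the stats views. The "torn" sample assumes a reader that sees
seq_tup_read already bumped by a third scan while seq_scan still shows two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableSample:
    """One reading of two counters for a single table (hypothetical type)."""
    seq_scan: int
    seq_tup_read: int


def tup_read_per_seq_scan(prev: TableSample, curr: TableSample) -> Optional[float]:
    """Ratio of the counter deltas between two samples.

    Returns None when no sequential scan was recorded in the interval.
    """
    d_scan = curr.seq_scan - prev.seq_scan
    d_read = curr.seq_tup_read - prev.seq_tup_read
    if d_scan == 0:
        return None
    return d_read / d_scan


# Illustrative numbers only: each scan in the interval reads 5000 tuples.
prev = TableSample(seq_scan=1000, seq_tup_read=5_000_000)

# Both counters reflect the same two scans.
consistent = TableSample(seq_scan=1002, seq_tup_read=5_010_000)

# Assumed torn read: seq_tup_read includes a third scan, seq_scan does not.
torn = TableSample(seq_scan=1002, seq_tup_read=5_015_000)

consistent_ratio = tup_read_per_seq_scan(prev, consistent)
torn_ratio = tup_read_per_seq_scan(prev, torn)
print(consistent_ratio, torn_ratio)
```

Because the calculation works on deltas, the absolute size of the counters
does not dampen the error: one unmatched scan out of three in the interval
shifts the result by half.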

Normally when you graph things, for example, your peaks will span more
than one sample point anyway, and in that case it doesn't much matter,
does it?

But if we say we offer only per-object consistency, then for example
the idx_scan value in the tables view may reflect changes to some but
not all of the indexes on that table. Would that be acceptable?
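
A small sketch of that last question, assuming the table-level index scan
counter is derived by summing per-index counters and that each index is
consistent only with itself. The index names and values are hypothetical,
and this is not meant to describe how any proposed patch behaves.

```python
from typing import Dict


def table_idx_scan(per_index_scans: Dict[str, int]) -> int:
    """Table-level index scan count as the sum of the per-index counters."""
    return sum(per_index_scans.values())


# Hypothetical per-index counters at two points in time.
before = {"t_pkey": 100, "t_name_idx": 40}
after = {"t_pkey": 110, "t_name_idx": 45}

# Assumed per-object-only consistency: the reader sees the new value for
# one index and the old value for the other.
mixed = {"t_pkey": after["t_pkey"], "t_name_idx": before["t_name_idx"]}

total_before = table_idx_scan(before)
total_after = table_idx_scan(after)
total_mixed = table_idx_scan(mixed)
print(total_before, total_mixed, total_after)
```

The mixed total lies between the two consistent totals, so it stays
monotonic, but it combines an old value for one index with a new value for
another.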

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
