Re: shared memory based stat collector (was: Sharing record typmods between backends)

From: David Fetter <david(at)fetter(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, David Gould <daveg(at)sonic(dot)net>
Subject: Re: shared memory based stat collector (was: Sharing record typmods between backends)
Date: 2017-08-14 18:17:05
Message-ID: 20170814181705.GB3471@fetter.org
Lists: pgsql-hackers

On Sun, Aug 13, 2017 at 05:56:56PM -0700, Andres Freund wrote:
> Hi,
>
> Since we're getting a bit into the weeds of a different topic, and since
> I think it's an interesting feature, I'm detaching this into a separate
> thread.
>
> On 2017-08-12 23:37:27 -0400, Tom Lane wrote:
> > >> On 2017-08-12 22:52:57 -0400, Robert Haas wrote:
> > >>> I think it'd be pretty interesting to look at replacing parts of the
> > >>> stats collector machinery with something DHT-based.
> > > On Sat, Aug 12, 2017 at 11:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >> That seems to involve a lot more than this though, given that currently
> > >> the stats collector data doesn't entirely have to be in memory. I've
> > >> seen sites with a lot of databases with quite some per-database stats
> > >> data. Don't think we can just require that to be in memory :(
> >
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > Hmm. I'm not sure it wouldn't end up being *less* memory. Don't we
> > > end up caching 1 copy of it per backend, at least for the database to
> > > which that backend is connected? Accessing a shared copy would avoid
> > > that sort of thing.
> >
> > Yeah ... the collector itself has got all that in memory anyway.
> > We do need to think about synchronization issues if we make that
> > memory globally available, but I find it hard to see how that would
> > lead to more memory consumption overall than what happens now.
>
> You both are obviously right. Another point of potential concern could
> be that we'd pretty much fully rely on dsm/dht being available for
> the server to function correctly. Are we ok with that? Right now
> postgres still works perfectly well, leaving parallelism aside, with
> dynamic_shared_memory_type = none.
>
>
> What are your thoughts about how to actually implement this? It seems
> we'd have to do something like:
>
> 1) Keep the current per-backend & per-transaction state in each
> backend. That allows both to throw away the information and avoids
> increasing contention quite noticeably.
>
> 2) Some plain shared memory with metadata. A set of shared hashtables
> for per database, per relation contents.
>
> 3) Individual database/relation entries are either individual atomics
> (we don't rely on consistency anyway), or seqcount (like
> st_changecount) based.
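
[The seqcount idea in (3) can be sketched roughly as follows. This is a minimal illustration of the st_changecount-style protocol, not PostgreSQL code: the writer bumps a counter before and after its update (so the count is odd while a write is in flight), and a reader retries until it sees the same even value on both sides of its copy. All struct and function names here are made up for the example, and a real implementation needs the memory-barrier discipline the backend-status code documents.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-relation stats entry protected by a seqcount. */
typedef struct StatsEntry
{
    atomic_uint changecount;    /* odd while a write is in progress */
    uint64_t    tuples_inserted;
    uint64_t    tuples_deleted;
} StatsEntry;

static void
stats_write(StatsEntry *e, uint64_t ins, uint64_t del)
{
    atomic_fetch_add(&e->changecount, 1);   /* now odd: write begins */
    e->tuples_inserted += ins;
    e->tuples_deleted += del;
    atomic_fetch_add(&e->changecount, 1);   /* even again: write done */
}

static void
stats_read(const StatsEntry *e, uint64_t *ins, uint64_t *del)
{
    unsigned    before, after;

    do
    {
        before = atomic_load(&e->changecount);
        *ins = e->tuples_inserted;
        *del = e->tuples_deleted;
        after = atomic_load(&e->changecount);
    } while (before != after || (before & 1));  /* retry torn reads */
}
```

No reader ever blocks a writer here, which matches the "we don't rely on consistency anyway" point above. -Ed.]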
>
> 4) Instead of sending stats at transaction end, copy them into a
> "pending" entry. Nontransactional contents can be moved to
> the pending entry more frequently.
>
> 5) Occasionally, try to flush the pending array into the global hash.
> The lookup in the table would be protected by something
> LWLockConditionalAcquire() based, to avoid blocking - don't want to
> introduce chokepoints due to commonly used tables and such. Updating
> the actual stats can happen without the partition locks being held.
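
[Step (5) amounts to a try-lock flush: a backend accumulates counts locally and periodically attempts to fold them into the shared table, skipping the attempt entirely when the lock is contended rather than blocking. The sketch below uses pthread_mutex_trylock as a stand-in for LWLockConditionalAcquire(); the types and names are illustrative only.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared-memory stats partition for one table. */
typedef struct SharedTableStats
{
    pthread_mutex_t lock;       /* stands in for a partition LWLock */
    uint64_t        tuples_inserted;
} SharedTableStats;

/* Hypothetical per-backend pending counts, step (4) above. */
typedef struct PendingStats
{
    uint64_t    tuples_inserted;
} PendingStats;

/*
 * Try to fold pending counts into the shared entry. Returns false
 * if the lock was busy, in which case the counts simply stay
 * pending for the next flush attempt -- no chokepoint on hot tables.
 */
static bool
flush_pending(PendingStats *pending, SharedTableStats *shared)
{
    if (pthread_mutex_trylock(&shared->lock) != 0)
        return false;           /* contended: don't block, try later */

    shared->tuples_inserted += pending->tuples_inserted;
    pending->tuples_inserted = 0;

    pthread_mutex_unlock(&shared->lock);
    return true;
}
```

As the text notes, a refinement would hold the conditional lock only for the hash lookup and update the entry itself without the partition lock (e.g. via the seqcount scheme in step 3). -Ed.]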
>
> I think there's two other relevant points here:
>
> a) It'd be quite useful to avoid needing a whole cluster's stats in
> memory. Even if $subject would save memory, I'm hesitant committing
> to something requiring all stats to be in memory forever. As a first
> step it seems reasonable to e.g. not require state for all databases
> to be in memory. The first time per-database stats are required, it
> could be "paged in". Could even be more aggressive and do that on a
> per-table level and just have smaller placeholder entries for
> non-accessed tables, but that seems more work.
>
> On the other hand, autovacuum is likely going to make that approach
> useless anyway, given it's probably going to access otherwise unneeded
> stats regularly.
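
[The "paged in on first access" idea in (a) could look something like this: the shared hash holds a small placeholder per database until someone actually asks for its stats, at which point the full entry is read in from the stats file. Everything here (entry layout, the disk-read helper, the counter chosen) is a hypothetical stand-in for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical per-database entry: a placeholder until first use. */
typedef struct DbStatsEntry
{
    Oid         dboid;
    bool        loaded;         /* false while still a placeholder */
    uint64_t    xact_commit;    /* valid only once loaded */
} DbStatsEntry;

/* Stand-in for reading one database's stats from the on-disk file. */
static uint64_t
read_dbstats_from_disk(Oid dboid)
{
    return (uint64_t) dboid * 100;      /* dummy payload for the sketch */
}

/* Return a counter, paging the entry in on first access. */
static uint64_t
get_db_xact_commit(DbStatsEntry *entry)
{
    if (!entry->loaded)
    {
        entry->xact_commit = read_dbstats_from_disk(entry->dboid);
        entry->loaded = true;
    }
    return entry->xact_commit;
}
```

The caveat in the text applies directly: if autovacuum touches every database on a regular cadence, the placeholders get promoted anyway and the memory saving evaporates. -Ed.]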
>
> b) I think our tendency to dump all stats whenever we crash isn't really
> tenable, given how autovacuum etc are tied to them. We should think
> about ways to avoid that if we're going to do a major rewrite of the
> stats stuff, which this certainly sounds like.
>
>
> If there weren't HS to worry about, these two points kinda sound like
> the data should be persisted into an actual table, rather than some
> weird other storage format. But HS seems to make that untenable.
>
> Greetings,
>
> Andres Freund

As I recall, David Gould (hi!) had run across a case where there were
thousands of tables and the stats file became a pretty serious
bottleneck. He might even have a design, or even code, to address it.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
