shared memory based stat collector (was: Sharing record typmods between backends)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: shared memory based stat collector (was: Sharing record typmods between backends)
Date: 2017-08-14 00:56:56
Message-ID: 20170814005656.d5tvz464qkmz66tq@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

Since we're getting a bit into the weeds of a different topic, and since
I think it's an interesting feature, I'm detaching this into a separate
thread.

On 2017-08-12 23:37:27 -0400, Tom Lane wrote:
> >> On 2017-08-12 22:52:57 -0400, Robert Haas wrote:
> >>> I think it'd be pretty interesting to look at replacing parts of the
> >>> stats collector machinery with something DHT-based.
> > On Sat, Aug 12, 2017 at 11:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> That seems to involve a lot more than this though, given that currently
> >> the stats collector data doesn't entirely have to be in memory. I've
> >> seen sites with a lot of databases with quite some per-database stats
> >> data. Don't think we can just require that to be in memory :(
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Hmm. I'm not sure it wouldn't end up being *less* memory. Don't we
> > end up caching 1 copy of it per backend, at least for the database to
> > which that backend is connected? Accessing a shared copy would avoid
> > that sort of thing.
>
> Yeah ... the collector itself has got all that in memory anyway.
> We do need to think about synchronization issues if we make that
> memory globally available, but I find it hard to see how that would
> lead to more memory consumption overall than what happens now.

You both are obviously right. Another point of potential concern could
be that we'd pretty much fully rely on dsm/dht being available for the
server to function correctly. Are we ok with that? Right now
postgres still works perfectly well, leaving parallelism aside, with
dynamic_shared_memory_type = none.

What are your thoughts about how to actually implement this? It seems
we'd have to do something like:

1) Keep the current per-backend & per-transaction state in each
backend. That allows us both to throw the information away (e.g. on
rollback) and to avoid noticeably increasing contention.

2) Some plain shared memory with metadata, plus a set of shared
hashtables for the per-database and per-relation contents.

3) Individual database/relation entries are either individual atomics
(we don't rely on consistency anyway), or seqcount (like
st_changecount) based.

4) Instead of sending stats at transaction end, copy them into a
"pending" entry. Nontransactional contents can be moved to
the pending entry more frequently.

5) Occasionally, try to flush the pending array into the global hash
(rough sketch below). The lookup in the table would be protected by
something LWLockConditionalAcquire() based, to avoid blocking - we
don't want to introduce chokepoints due to commonly used tables and
such. Updating the actual stats can happen without the partition
locks being held.
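
To make 2), 3) and 5) a bit more concrete, here's a very rough sketch
of what I have in mind. To be clear, all of the types and the
shared_stat_*() helpers below are invented for illustration - a real
version would sit on top of whatever DHT implementation we end up
with, and entry lifetime (pinning etc.) is ignored entirely:

#include "postgres.h"

#include "port/atomics.h"

typedef struct PgStatTableKey
{
	Oid			databaseid;
	Oid			tableid;
} PgStatTableKey;

/* 2) + 3): entry in the shared hashtable, counters as plain atomics */
typedef struct PgStatTableEntry
{
	PgStatTableKey key;
	pg_atomic_uint64 tuples_inserted;
	pg_atomic_uint64 tuples_updated;
	pg_atomic_uint64 tuples_deleted;
} PgStatTableEntry;

/* 1) + 4): backend-local pending counts, accumulated until flushed */
typedef struct PgStatPendingEntry
{
	PgStatTableKey key;
	uint64		tuples_inserted;
	uint64		tuples_updated;
	uint64		tuples_deleted;
} PgStatPendingEntry;

/* hypothetical: lookup using LWLockConditionalAcquire() on the
 * partition lock internally, returning false instead of sleeping */
extern bool shared_stat_find_conditional(const PgStatTableKey *key,
										 PgStatTableEntry **entry);
/* hypothetical: release the partition lock taken by the above */
extern void shared_stat_release_partition(const PgStatTableKey *key);

/*
 * 5): try to flush one pending entry into the shared hashtable. If the
 * relevant partition is contended we just give up; the pending entry
 * is kept around and we retry at the next flush opportunity.
 */
static bool
pgstat_flush_pending(PgStatPendingEntry *pending)
{
	PgStatTableEntry *shent;

	if (!shared_stat_find_conditional(&pending->key, &shent))
		return false;			/* don't block, retry later */

	/* per 5), don't hold the partition lock while updating */
	shared_stat_release_partition(&pending->key);

	/* per 3), individual atomics, no consistency across counters */
	pg_atomic_fetch_add_u64(&shent->tuples_inserted,
							(int64) pending->tuples_inserted);
	pg_atomic_fetch_add_u64(&shent->tuples_updated,
							(int64) pending->tuples_updated);
	pg_atomic_fetch_add_u64(&shent->tuples_deleted,
							(int64) pending->tuples_deleted);

	memset(pending, 0, sizeof(*pending));
	return true;
}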

I think there are two other relevant points here:

a) It'd be quite useful to avoid needing a whole cluster's stats in
memory. Even if $subject would save memory, I'm hesitant to commit
to something requiring all stats to be in memory forever. As a first
step it seems reasonable to e.g. not require the state for all
databases to be in memory. The first time per-database stats are
required, they could be "paged in" (roughly sketched below). We
could even be more aggressive and do that on a per-table level, with
smaller placeholder entries for non-accessed tables, but that seems
like more work.

On the other hand, autovacuum is likely going to make that approach
useless anyway, given it's probably going to access otherwise unneeded
stats regularly.
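
Very roughly, and again with entirely invented names - how and where
the per-database stats get persisted is left open here:

#include "postgres.h"

typedef struct PgStatDatabaseEntry PgStatDatabaseEntry;

/* hypothetical: find-or-insert into the shared per-database table */
extern PgStatDatabaseEntry *shared_dbstat_find_or_insert(Oid databaseid,
														 bool *found);
/* hypothetical: populate a fresh entry from the persisted stats */
extern void shared_dbstat_load(Oid databaseid, PgStatDatabaseEntry *dbent);

static PgStatDatabaseEntry *
pgstat_get_db_entry(Oid databaseid)
{
	PgStatDatabaseEntry *dbent;
	bool		found;

	dbent = shared_dbstat_find_or_insert(databaseid, &found);

	if (!found)
	{
		/*
		 * First access to this database's stats: "page them in" from
		 * whatever on-disk representation they were saved in.
		 */
		shared_dbstat_load(databaseid, dbent);
	}

	return dbent;
}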

b) I think our tendency to dump all stats whenever we crash isn't really
tenable, given how autovacuum etc. are tied to them. We should think
about ways to avoid that if we're going to do a major rewrite of the
stats stuff, which this certainly sounds like.

If there weren't hot standby (HS) to worry about, these two points
kinda sound like the data should be persisted into an actual table,
rather than some weird other storage format. But HS seems to make that
untenable.

Greetings,

Andres Freund
