From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Peter Geoghegan <pg(at)bowt(dot)ie>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: shared memory stats ideas
Date: 2022-10-19 18:19:30
Message-ID: 20221019181930.bx73kul4nbiftr65@awork3.anarazel.de

Hi,

Shortly after shared memory stats went in, I had a conversation with Lukas
about what it'd enable us to do going forward. I also chatted with Peter about
autovacuum related stats. I started to write an email, but then somehow lost
the draft and couldn't bring myself to start from scratch.

Here's a largely unordered list of ideas. I'm not planning to work on them
myself, but thought it'd nevertheless be useful to have them memorialized
somewhere.

1) Track some statistics based on relfilenodes rather than oids

We currently track IO related statistics as part of the normal relation
stats. The problem is that this prevents us from collecting stats whenever we
operate on a relfilenode rather than a Relation.

We e.g. currently can't track the number of blocks written out for a
relation, because we don't have a Relation at that point. Nor can we really
get hold of one, as the writeback can happen in a different database without
access to pg_class. Which is also the reason why the per-relation IO stats
aren't populated by the startup process, even though it'd obviously sometimes
be helpful to know where the most IO time is spent on a standby.

There are also quite a few contortions of the bufmgr interface related to
this.

I think the solution to this is actually fairly simple: We split the IO
related statistics out from the relation statistics, and track them on a
relfilenode basis instead. That'd allow us to track IO stats in all the
relevant places, rather than doing the partial job we do right now.
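
To illustrate, here's a rough sketch of what a relfilenode-keyed IO stats
entry could look like (all names invented, types as in the PostgreSQL
headers; this is not actual code):

    /*
     * Sketch only: IO stats keyed by the relfilenode triple instead of
     * the pg_class oid, so e.g. the checkpointer or the startup process
     * can update them without having a Relation or catalog access.
     */
    typedef struct PgStat_RelFileNodeIO
    {
        Oid         spcOid;         /* tablespace */
        Oid         dbOid;          /* database, InvalidOid for shared rels */
        Oid         relNode;        /* relfilenode */

        PgStat_Counter blocks_read;
        PgStat_Counter blocks_written;  /* incl. writeback, now trackable */
        PgStat_Counter blocks_extended;
        PgStat_Counter read_time;       /* microseconds */
        PgStat_Counter write_time;      /* microseconds */
    } PgStat_RelFileNodeIO;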

2) Split index and table statistics into different types of stats

We track both types of statistics in the same format and rename columns in
views etc. to make them somewhat sensible. A number of the "columns" in index
stats are currently unused.

If we split the stats for indexes and tables, we can have reasonable names
for the fields, shrink the current memory usage by halving the set of fields
we keep for indexes, and extend the stats in a more targeted fashion.

This would e.g. allow us to keep track of the number of index entries killed
via the killtuples mechanism, which in turn would allow us to decide more
intelligently whether we should vacuum indexes (often the most expensive part
of vacuum). In a lot of workloads killtuples takes care of most of the
cleanup, but in others it doesn't do much.
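
As a sketch (invented names, not a patch), the index-only struct could
shrink to something like this, with room for a killtuples counter:

    /* Sketch only: a slimmer, index-specific stats entry. */
    typedef struct PgStat_IndexEntry
    {
        PgStat_Counter numscans;
        PgStat_Counter tuples_returned;
        PgStat_Counter tuples_fetched;

        /* newly trackable once indexes have their own struct: */
        PgStat_Counter entries_killed;  /* via the killtuples mechanism */
    } PgStat_IndexEntry;

Autovacuum could then weigh entries_killed against the table's dead tuple
counters when deciding whether an index pass is worth the cost.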

3) Maintain more historical statistics about vacuuming

We currently track the last time a table was vacuumed, the number of times it
was vacuumed and a bunch of counters for the number of modified tuples since
the last vacuum.

However, none of that allows the user to identify which relations are causing
autovacuum to not keep up. Even just keeping track of the total time
autovacuum has spent on certain relations would be a significant improvement,
and more metrics are easily imaginable (total IO [time], autovacuum delay
time, xid age).
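
For instance (again just a sketch, none of these fields exist today), the
per-table stats entry could grow counters like:

    /* Hypothetical additions to the per-table stats entry. */
    PgStat_Counter total_autovacuum_time;       /* time spent (auto)vacuuming */
    PgStat_Counter total_autovacuum_io_time;    /* IO time during vacuums */
    PgStat_Counter total_autovacuum_delay;      /* time slept in cost delays */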

4) Make the stats mechanism extensible

Most of the work towards this has already been done, but a bit more work is
necessary. The hardest part is likely figuring out how to identify stats
belonging to an extension across restarts.

There's a bunch of extensions with their own stats mechanisms, but it's hard
to get this stuff right from the outside.
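
To make that concrete, here's one way the registration side could look. This
is purely invented, no such API exists; the point is that extensions would
register under a stable name, and that name, rather than a transient numeric
kind id, is what identifies their stats across restarts:

    /*
     * Invented sketch, not an existing interface.  The stable name is
     * what gets written to the stats file, so entries can be matched
     * back to the extension after a restart even if kind ids were
     * assigned in a different order.
     */
    typedef struct ExtensionStatsKind
    {
        const char *name;       /* stable identifier, e.g. "my_extension" */
        Size        entry_size; /* size of one shared stats entry */
        /* plus callbacks for (de)serializing and resetting entries */
    } ExtensionStatsKind;

    /* returns a kind id valid for the current server lifetime */
    extern int pgstat_register_extension_kind(const ExtensionStatsKind *kind);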

5) Use extensible shared memory stats to store pg_stat_statements data

pg_stat_statements' current mechanism has a few issues. The top ones I know of
are:

- Contention on individual stats entries when the same queryid is executed
concurrently. pgstats deals with this by allowing stats to be collected in
backend local memory and to be flushed into shared stats at a lower
frequency.

- The query text file can get huge (I've seen > 100GB) and cause massive
slowdowns. It's better than the old fixed-length, fixed-shared-memory
mechanism, don't get me wrong. But we could do better by storing the data in
dynamic shared memory and also supporting trimming based on the total size
(see the sketch below).
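
A very rough sketch of the dynamic shared memory variant, assuming
dshash/dsa as the backing store (all names hypothetical):

    /*
     * Hypothetical: query texts stored in a dshash table backed by a
     * DSA, instead of an external flat file.
     */
    typedef struct QueryTextEntry
    {
        uint64      queryid;    /* hash key */
        dsa_pointer text;       /* query text, allocated in the DSA */
        Size        text_len;
        TimestampTz last_used;  /* basis for trimming decisions */
    } QueryTextEntry;

    /*
     * Once the summed text_len exceeds a configured limit, walk the
     * table and free the least recently used texts until back under
     * the limit, rather than letting the storage grow without bound.
     */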

There were some other things, but I can't remember them right now.

Greetings,

Andres Freund
