Re: pg_buffercache: Add per-relation summary stats

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Khoa Nguyen <khoaduynguyen(at)gmail(dot)com>
Subject: Re: pg_buffercache: Add per-relation summary stats
Date: 2026-03-24 19:09:13
Message-ID: CAD21AoCKz0t06-qOOm_7kgMn=Niu0yYB3R5jhOY0oVb=5AWytA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Lukas,

On Sat, Feb 28, 2026 at 3:59 PM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
>
> Hi,
>
> See attached a patch that implements a new function,
> pg_buffercache_relation_stats(), which returns per-relfilenode
> statistics on the number of buffers, how many are dirtied/pinned, and
> their avg usage count.

Thank you for the proposal!

Paul A Jungwirth, Khoa Nguyen, and I reviewed this patch through the
Patch Review Workshop, and I'd like to share our comments.

>
> This can be used in monitoring scripts to know which relations are
> kept in shared buffers, to understand performance issues better that
> occur due to relations getting evicted from the cache. In our own
> monitoring tool (pganalyze) we've offered a functionality like this
> based on the existing pg_buffercache() function for a bit over a year
> now [0], and people have found this very valuable - but it doesn't
> work for larger database servers.
>
> Specifically, performing a query that gets this information can be
> prohibitively expensive when using large shared_buffers, and even on
> the default 128MB shared buffers there is a measurable difference:
>
> postgres=# WITH pg_buffercache_relation_stats AS (
> SELECT relfilenode, reltablespace, reldatabase, relforknumber,
> COUNT(*) AS buffers,
> COUNT(*) FILTER (WHERE isdirty) AS buffers_dirty,
> COUNT(*) FILTER (WHERE pinning_backends > 0) AS buffers_pinned,
> AVG(usagecount) AS usagecount_avg
> FROM pg_buffercache
> WHERE reldatabase IS NOT NULL
> GROUP BY 1, 2, 3, 4
>
> )
> SELECT * FROM pg_buffercache_relation_stats WHERE relfilenode = 2659;
>
> relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> buffers_dirty | buffers_pinned | usagecount_avg
> -------------+---------------+-------------+---------------+---------+---------------+----------------+--------------------
> 2659 | 1663 | 5 | 0 | 8 |
> 0 | 0 | 5.0000000000000000
> 2659 | 1663 | 1 | 0 | 7 |
> 0 | 0 | 5.0000000000000000
> 2659 | 1663 | 229553 | 0 | 7 |
> 0 | 0 | 5.0000000000000000
> (3 rows)
>
> Time: 20.991 ms
>
> postgres=# SELECT * FROM pg_buffercache_relation_stats() WHERE
> relfilenode = 2659;
> relfilenode | reltablespace | reldatabase | relforknumber | buffers |
> buffers_dirty | buffers_pinned | usagecount_avg
> -------------+---------------+-------------+---------------+---------+---------------+----------------+----------------
> 2659 | 1663 | 1 | 0 | 7 |
> 0 | 0 | 5
> 2659 | 1663 | 229553 | 0 | 7 |
> 0 | 0 | 5
> 2659 | 1663 | 5 | 0 | 8 |
> 0 | 0 | 5
> (3 rows)
>
> Time: 2.912 ms
>
> With the new function this gets done before putting the data in the
> tuplestore used for the set-returning function.

Overall, we find that the proposed feature is useful. The proposed way
is much cheaper, especially when the number of per-relation stats is
not large.

Here are review comments on the v1 patch:

---
- pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql
+ pg_buffercache--1.5--1.6.sql pg_buffercache--1.6--1.7.sql \
+ pg_buffercache--1.7--1.8.sql

Since commit 4b203d499c6 bumped the version from 1.6 to 1.7 last
November, we think we don't need to bump the version again for this new
feature.

---
+/*
+ * Hash key for pg_buffercache_relation_stats — groups by relation identity.
+ */
+typedef struct
+{
+ RelFileNumber relfilenumber;
+ Oid reltablespace;
+ Oid reldatabase;
+ ForkNumber forknum;
+} BufferRelStatsKey;
+
+/*
+ * Hash entry for pg_buffercache_relation_stats — accumulates per-relation
+ * buffer statistics.
+ */
+typedef struct
+{
+ BufferRelStatsKey key; /* must be first */
+ int32 buffers;
+ int32 buffers_dirty;
+ int32 buffers_pinned;
+ int64 usagecount_total;
+} BufferRelStatsEntry;

Can we move these typedefs above function prototypes as other typedefs
are defined there?

---
+ relstats_hash = hash_create("pg_buffercache relation stats",
+ 128,
+ &hash_ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

It might be worth considering simplehash.h for even better performance.

---
+ while ((entry = (BufferRelStatsEntry *) hash_seq_search(&hash_seq)) != NULL)
+ {
+ if (entry->buffers == 0)
+ continue;
+

We might want to put CHECK_FOR_INTERRUPTS() here too as the number of
entries can be as many as NBuffers in principle.

---
We've discussed there might be room for improvement in the function
name. For example, pg_buffercache_relations instead of
pg_buffercache_relation_stats might be a good name, since everything
in this module
is stats. if we drop "_stats" then "relation" should be plural, to
match other functions in the module ("pages", "os_pages",
"numa_pages", "usage_counts").

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2026-03-24 19:44:38 Re: Expanding HOT updates for expression and partial indexes
Previous Message Tom Lane 2026-03-24 18:58:10 Re: Fix bug with accessing to temporary tables of other sessions