Re: pg_buffercache: Add per-relation summary stats

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Khoa Nguyen <khoaduynguyen(at)gmail(dot)com>
Subject: Re: pg_buffercache: Add per-relation summary stats
Date: 2026-03-25 16:48:52
Message-ID: CAD21AoCzv5q89VDZ=Ot1wWguBO0MGdCc0-NXN83-Jg57uiJN1Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 24, 2026 at 11:47 PM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
>
> Hi Ashutosh,
>
> On Tue, Mar 24, 2026 at 11:24 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > I know we already have a couple of hand-aggregation functions but I am
> > hesitant to add more of these. Question is where do we stop? For
> > example, the current function is useless if someone wants to find the
> > parts of a relation which are hot since it doesn't include page
> > numbers. Do we write another function for the same? Or we add page
> > numbers to this function and then there's hardly any aggregation
> > happening. What if somebody wanted to perform an aggregation more
> > complex than just count() like average number of buffers per relation
> > or distribution of relation buffers in the cache, do they write
> > separate functions?
>
> I think the problem this solves for, which is a very common question I
> hear from end users, is "how much of this table/index is in cache" and
> "was our query slow because the cache contents changed?".
>
> It can't provide a perfect answer to all questions regarding what's in
> the cache (i.e. it won't tell you which part of the table is cached),
> but its in line with other statistics we do already provide in
> pg_stat_user_tables etc., which are all aggregate counts, not further
> breakdowns.
>
> Its also a reasonable compromise on providing something usable that
> can be shown on dashboards, as I've seen in collecting this
> information using the existing methods from small production systems
> in practice over the last ~1.5 years.

Regarding the proposed statistics, I find them reasonably useful for
many users. I'm not sure we need to draw a strict line on what belongs
in the module. If a proposed function does exactly what most
pg_buffercache users want or are already writing themselves, that is
good enough motivation to include it.

I think pg_visibility is a good precedent here. In that module, we
have both pg_visibility_map() and pg_visibility_map_summary(), even
though we can retrieve the exact same results as the latter by simply
using the former:

select sum(all_visible::int), sum(all_frozen::int) from
pg_visibility_map('test') ;

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2026-03-25 16:52:56 Re: Adding REPACK [concurrently]
Previous Message Masahiko Sawada 2026-03-25 16:33:03 Re: Add tab completion for SERVER and CONNECTION keywords in psql