| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Lukas Fittl <lukas(at)fittl(dot)com> |
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Khoa Nguyen <khoaduynguyen(at)gmail(dot)com> |
| Subject: | Re: pg_buffercache: Add per-relation summary stats |
| Date: | 2026-03-25 16:48:52 |
| Message-ID: | CAD21AoCzv5q89VDZ=Ot1wWguBO0MGdCc0-NXN83-Jg57uiJN1Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Mar 24, 2026 at 11:47 PM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
>
> Hi Ashutosh,
>
> On Tue, Mar 24, 2026 at 11:24 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > I know we already have a couple of hand-aggregation functions but I am
> > hesitant to add more of these. Question is where do we stop? For
> > example, the current function is useless if someone wants to find the
> > parts of a relation which are hot since it doesn't include page
> > numbers. Do we write another function for the same? Or we add page
> > numbers to this function and then there's hardly any aggregation
> > happening. What if somebody wanted to perform an aggregation more
> > complex than just count() like average number of buffers per relation
> > or distribution of relation buffers in the cache, do they write
> > separate functions?
>
> I think the problem this solves for, which is a very common question I
> hear from end users, is "how much of this table/index is in cache" and
> "was our query slow because the cache contents changed?".
>
> It can't provide a perfect answer to all questions regarding what's in
> the cache (i.e. it won't tell you which part of the table is cached),
> but its in line with other statistics we do already provide in
> pg_stat_user_tables etc., which are all aggregate counts, not further
> breakdowns.
>
> Its also a reasonable compromise on providing something usable that
> can be shown on dashboards, as I've seen in collecting this
> information using the existing methods from small production systems
> in practice over the last ~1.5 years.
Regarding the proposed statistics, I find them reasonably useful for
many users. I'm not sure we need to draw a strict line on what belongs
in the module. If a proposed function does exactly what most
pg_buffercache users want or are already writing themselves, that is
good enough motivation to include it.
I think pg_visibility is a good precedent here. In that module, we
have both pg_visibility_map() and pg_visibility_map_summary(), even
though we can retrieve the exact same results as the latter by simply
using the former:
select sum(all_visible::int), sum(all_frozen::int) from
pg_visibility_map('test') ;
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Antonin Houska | 2026-03-25 16:52:56 | Re: Adding REPACK [concurrently] |
| Previous Message | Masahiko Sawada | 2026-03-25 16:33:03 | Re: Add tab completion for SERVER and CONNECTION keywords in psql |