Re: monitoring usage count distribution

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, schneider(at)ardentperf(dot)com
Subject: Re: monitoring usage count distribution
Date: 2023-04-04 23:29:19
Message-ID: 20230404232919.uibzbhjdylk3mlvp@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-04-04 14:31:36 -0400, Robert Haas wrote:
> On Mon, Jan 30, 2023 at 6:30 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> > My colleague Jeremy Schneider (CC'd) was recently looking into usage count
> > distributions for various workloads, and he mentioned that it would be nice
> > to have an easy way to do $SUBJECT. I've attached a patch that adds a
> > pg_buffercache_usage_counts() function. This function returns a row per
> > possible usage count with some basic information about the corresponding
> > buffers.
> >
> > postgres=# SELECT * FROM pg_buffercache_usage_counts();
> >  usage_count | buffers | dirty | pinned
> > -------------+---------+-------+--------
> >            0 |       0 |     0 |      0
> >            1 |    1436 |   671 |      0
> >            2 |     102 |    88 |      0
> >            3 |      23 |    21 |      0
> >            4 |       9 |     7 |      0
> >            5 |     164 |   106 |      0
> > (6 rows)
> >
> > This new function provides essentially the same information as
> > pg_buffercache_summary(), but pg_buffercache_summary() only shows the
> > average usage count for the buffers in use. If there is interest in this
> > idea, another approach to consider could be to alter
> > pg_buffercache_summary() instead.
>
> I'm skeptical that pg_buffercache_summary() is a good idea at all

Why? It's about two orders of magnitude faster than querying the equivalent
data by aggregating in SQL. And knowing how many free and dirty buffers there
are over time is quite useful to monitor and to correlate with performance
issues.
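
For reference, the SQL-side aggregation being compared against might look
roughly like the sketch below. It assumes the pg_buffercache extension is
installed and relies on the view's documented columns (relfilenode, isdirty,
usagecount, pinning_backends), which are NULL for unused buffers. The output
column aliases are illustrative, not taken from any patch in this thread.

```sql
-- Roughly the data pg_buffercache_summary() reports, computed by
-- scanning the pg_buffercache view and aggregating in SQL.
SELECT count(*) FILTER (WHERE relfilenode IS NOT NULL) AS buffers_used,
       count(*) FILTER (WHERE relfilenode IS NULL)     AS buffers_unused,
       count(*) FILTER (WHERE isdirty)                 AS buffers_dirty,
       count(*) FILTER (WHERE pinning_backends > 0)    AS buffers_pinned,
       avg(usagecount)                                 AS usagecount_avg
FROM pg_buffercache;

-- Per-usage-count breakdown, similar in shape to the proposed
-- pg_buffercache_usage_counts() output. Unlike that function, this
-- returns no row for a usage count that currently has zero buffers.
SELECT usagecount                                   AS usage_count,
       count(*)                                     AS buffers,
       count(*) FILTER (WHERE isdirty)              AS dirty,
       count(*) FILTER (WHERE pinning_backends > 0) AS pinned
FROM pg_buffercache
WHERE usagecount IS NOT NULL
GROUP BY usagecount
ORDER BY usagecount;
```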

> but having it display the average usage count seems like a particularly poor
> idea. That information is almost meaningless.

I agree there are more meaningful ways to represent the data, but I don't
agree that it's almost meaningless. It can give you a rough estimate of
whether data in s_b is referenced or not.
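
As a rough illustration of what the average does and does not convey, the
weighted average for the sample distribution quoted above can be computed
directly. The literal values are copied from Nathan's example output; the
query is only a sketch for this message, not part of any patch.

```sql
-- Weighted average usage count for the quoted sample distribution.
SELECT round(sum(usage_count * buffers)::numeric / sum(buffers), 2)
           AS avg_usage_count
FROM (VALUES (0, 0), (1, 1436), (2, 102), (3, 23), (4, 9), (5, 164))
         AS t(usage_count, buffers);
```

The result is a little under 1.5, which reflects that most buffers sit at
usage count 1 but hides the smaller group of 164 buffers at usage count 5.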

> Replacing that with a six-element integer array would be a clear improvement
> and, IMHO, better than adding yet another function to the extension.

I'd have no issue with that.

Greetings,

Andres Freund
