Re: How to find the number of cached pages for a relation?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amin <amin(dot)fallahi(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to find the number of cached pages for a relation?
Date: 2023-01-14 02:27:24
Message-ID: 20230114022724.jpp5l3g32obl67tf@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-01-13 17:28:31 -0800, Amin wrote:
> Before scanning a relation, in the planner stage, I want to make a call to
> retrieve information about how many pages will be a hit for a specific
> relation. The module pg_buffercache seems to be doing a similar thing.
> Also, pg_statio_all_tables seems to be having that information, but it is
> updated after execution. However, I want the information before execution.
> Also not sure how pg_statio_all_tables is created and how I can access it
> in the code.

There's no cheap way to do that. Currently the only ways are to:

a) Do one probe of the buffer mapping table for each block of the
relation. Thus O(#relation blocks).

b) Scan all of buffer headers, check which are for the relation. Thus
O(#NBuffers)

Neither of which are a good idea during planning.

It might be a bit more realistic to get very rough estimates:

You could compute the table's historic cache hit ratio from pgstats (i.e. use
the data backing pg_statio_all_tables). Of course that's not going to be
specific to your query (for index scans etc), and might have changed more
recently. It'd also be completely wrong after a restart.

If we had information about *recent* cache hit patterns for the relation, it'd
be a lot better, but we don't have the infrastructure for that, and
introducing it would increase the size of the stats entries noticably.

Another way could be to probe the buffer mapping table for a small subset of
the locks and infer the likelihood of other blocks being in shared buffers
that way.

A third way could be to track the cache hit for relations in backend local
memory, likely in the relache entry. The big disadvantage would be that query
plans would differ between connections and that connections would need to
"warm up" to have good plans. But it'd handle restarts nicely.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jose Arthur Benetasso Villanova 2023-01-14 03:34:38 Re: Amcheck verification of GiST and GIN
Previous Message Thomas Munro 2023-01-14 02:23:11 Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?