Re: How to find the number of cached pages for a relation?

From: Amin <amin(dot)fallahi(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to find the number of cached pages for a relation?
Date: 2023-01-28 00:11:26
Message-ID: CAF-KA889ceqFa-t=7NF+4KBZmBax_gxBV=TRu7kLV7+sV5egSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you Andres.

If I want to do "a" ( Do one probe of the buffer mapping table for each
block of the relation. Thus O(#relation blocks)) what function calls can I
use, assuming I only have access to the relation id? How can I access and
scan the buffer mapping table?

On Fri, Jan 13, 2023 at 6:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2023-01-13 17:28:31 -0800, Amin wrote:
> > Before scanning a relation, in the planner stage, I want to make a call
> to
> > retrieve information about how many pages will be a hit for a specific
> > relation. The module pg_buffercache seems to be doing a similar thing.
> > Also, pg_statio_all_tables seems to be having that information, but it is
> > updated after execution. However, I want the information before
> execution.
> > Also not sure how pg_statio_all_tables is created and how I can access it
> > in the code.
>
> There's no cheap way to do that. Currently the only ways are to:
>
> a) Do one probe of the buffer mapping table for each block of the
> relation. Thus O(#relation blocks).
>
> b) Scan all of buffer headers, check which are for the relation. Thus
> O(#NBuffers)
>
> Neither of which are a good idea during planning.
>
>
> It might be a bit more realistic to get very rough estimates:
>
> You could compute the table's historic cache hit ratio from pgstats (i.e.
> use
> the data backing pg_statio_all_tables). Of course that's not going to be
> specific to your query (for index scans etc), and might have changed more
> recently. It'd also be completely wrong after a restart.
>
> If we had information about *recent* cache hit patterns for the relation,
> it'd
> be a lot better, but we don't have the infrastructure for that, and
> introducing it would increase the size of the stats entries noticably.
>
> Another way could be to probe the buffer mapping table for a small subset
> of
> the locks and infer the likelihood of other blocks being in shared buffers
> that way.
>
> A third way could be to track the cache hit for relations in backend local
> memory, likely in the relache entry. The big disadvantage would be that
> query
> plans would differ between connections and that connections would need to
> "warm up" to have good plans. But it'd handle restarts nicely.
>
> Greetings,
>
> Andres Freund
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-28 00:23:19 Re: recovery modules
Previous Message Andres Freund 2023-01-28 00:09:39 Re: recovery modules