Quick Links

Re: Add os_page_num to pg_buffercache

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Add os_page_num to pg_buffercache
Date:	2025-07-01 16:45:37
Message-ID:	d96c343b-959a-45ae-b0d8-e728bb1d0ed2@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 7/1/25 18:34, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote:
>> On 7/1/25 15:45, Bertrand Drouvot wrote:
>>
>> I took a quick look on this,
>
> Thanks for looking at it!
>
>> and I doubt we want to change the schema of
>> pg_buffercache like this. Adding columns is fine, but it seems rather
>> wrong to change the cardinality. The view is meant to be 1:1 mapping for
>> buffers, but now suddenly it's 1:1 with memory pages. Or rather (buffer,
>> page), to be precise.
>>
>> I think this will break a lot of monitoring queries, and possibly in a
>> very subtle way - especially on systems with huge pages, where most
>> buffers will have one row, but then a buffer that happens to be split on
>> two pages will have two rows. That seems not great.
>>
>> IMHO it'd be better to have a new view for this info, something like
>> pg_buffercache_pages, or something like that.
>
> That's a good point, fully agree!
>
>> But I'm also starting to question if the patch really is that useful.
>> Sure, people may not have NUMA support enabled (e.g. on non-linux
>> platforms), and even if they do the _numa view is quite expensive.
>>
>
> Yeah, it's not for day to day activities, more for configuration testing and
> also for development activity/testing.
>
> For example, If I set BLCKSZ to 8KB and enable huge pages (2MB), then I may
> expect to see buffers not spread across pages.
>
> But what I can see is:
>
> SELECT
> pages_per_buffer,
> COUNT(*) as buffer_count
> FROM (
> SELECT bufferid, COUNT(*) as pages_per_buffer
> FROM pg_buffercache
> GROUP BY bufferid
> ) subq
> GROUP BY pages_per_buffer
> ORDER BY pages_per_buffer;
>
> pages_per_buffer | buffer_count
> ------------------+--------------
> 1 | 261120
> 2 | 1024
>
> This is due to the shared buffers being aligned to PG_IO_ALIGN_SIZE.
>
> If I change it to:
>
> BufferManagerShmemInit(void)
>
> /* Align buffer pool on IO page size boundary. */
> BufferBlocks = (char *)
> - TYPEALIGN(PG_IO_ALIGN_SIZE,
> + TYPEALIGN(2 * 1024 * 1024,
> ShmemInitStruct("Buffer Blocks",
> - NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
> + NBuffers * (Size) BLCKSZ + (2 * 1024 * 1024),
> &foundBufs));
>
> Then I get:
>
> pages_per_buffer | buffer_count
> ------------------+--------------
> 1 | 262144
> (1 row)
>
>
> So we've been able to see that some buffers were spread across pages due to
> shared buffer alignment on PG_IO_ALIGN_SIZE. And that if we change the alignment
> to be set to 2MB then I don't see any buffers spread across pages anymore.
>
> I think that it helps "visualize" some configuration or code changes.
>
> What are your thoughts?
>

But isn't the _numa view good enough for this? Sure, you need NUMA
support for it, and it may take a fair amount of time, but how often you
need to do such queries? I don't plan to block improving this use case,
but I'm not sure it's worth the effort.

cheers

--
Tomas Vondra

In response to

Re: Add os_page_num to pg_buffercache at 2025-07-01 16:34:56 from Bertrand Drouvot

Responses

Re: Add os_page_num to pg_buffercache at 2025-07-01 17:20:06 from Bertrand Drouvot

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2025-07-01 16:49:43	Re: wrong comments in rewriteTargetListIU
Previous Message	Tom Lane	2025-07-01 16:41:50	Re: GNU/Hurd portability patches