Re: Add os_page_num to pg_buffercache

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add os_page_num to pg_buffercache
Date: 2025-07-01 16:34:56
Message-ID: aGQOMPEENZc/2fJm@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote:
> On 7/1/25 15:45, Bertrand Drouvot wrote:
>
> I took a quick look on this,

Thanks for looking at it!

> and I doubt we want to change the schema of
> pg_buffercache like this. Adding columns is fine, but it seems rather
> wrong to change the cardinality. The view is meant to be 1:1 mapping for
> buffers, but now suddenly it's 1:1 with memory pages. Or rather (buffer,
> page), to be precise.
>
> I think this will break a lot of monitoring queries, and possibly in a
> very subtle way - especially on systems with huge pages, where most
> buffers will have one row, but then a buffer that happens to be split on
> two pages will have two rows. That seems not great.
>
> IMHO it'd be better to have a new view for this info, something like
> pg_buffercache_pages, or something like that.

That's a good point, fully agree!

> But I'm also starting to question if the patch really is that useful.
> Sure, people may not have NUMA support enabled (e.g. on non-linux
> platforms), and even if they do the _numa view is quite expensive.
>

Yeah, it's not for day-to-day activity; it's more for configuration testing and
for development work and testing.

For example, if I set BLCKSZ to 8KB and enable huge pages (2MB), then I would
expect no buffer to be spread across two pages.

But what I actually see is:

SELECT
    pages_per_buffer,
    COUNT(*) AS buffer_count
FROM (
    SELECT bufferid, COUNT(*) AS pages_per_buffer
    FROM pg_buffercache
    GROUP BY bufferid
) subq
GROUP BY pages_per_buffer
ORDER BY pages_per_buffer;

 pages_per_buffer | buffer_count
------------------+--------------
                1 |       261120
                2 |         1024

This is due to the shared buffers being aligned to PG_IO_ALIGN_SIZE.
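
The 1024 figure can be reproduced with a small standalone sketch (not PostgreSQL code). It counts how many fixed-size buffers straddle an OS page boundary for a given start offset of the pool. The function name and the 4096-byte start offset used below are assumptions for illustration only; the actual start address of the buffer pool is not shown in this mail. The sketch only relies on the start being 4KB-aligned but not 8KB-aligned, which is one way an alignment of 4096 bytes can play out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count buffers that span more than one OS page.
 * "start" is the (hypothetical) byte offset of the first buffer.
 */
uint64_t
count_split_buffers(uint64_t start, uint64_t nbuffers,
                    uint64_t blcksz, uint64_t os_page_size)
{
    uint64_t split = 0;

    assert(blcksz > 0 && os_page_size > 0);

    for (uint64_t i = 0; i < nbuffers; i++)
    {
        uint64_t first_byte = start + i * blcksz;
        uint64_t last_byte = first_byte + blcksz - 1;

        /* Split when the first and last byte land on different pages. */
        if (first_byte / os_page_size != last_byte / os_page_size)
            split++;
    }
    return split;
}
```

With 262144 buffers of 8KB and 2MB pages, count_split_buffers(4096, 262144, 8192, 2 * 1024 * 1024) returns 1024 (one split buffer per 2MB boundary), while a start offset of 0 returns 0.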

If I change it to:

 BufferManagerShmemInit(void)

     /* Align buffer pool on IO page size boundary. */
     BufferBlocks = (char *)
-        TYPEALIGN(PG_IO_ALIGN_SIZE,
+        TYPEALIGN(2 * 1024 * 1024,
                   ShmemInitStruct("Buffer Blocks",
-                                  NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
+                                  NBuffers * (Size) BLCKSZ + (2 * 1024 * 1024),
                                   &foundBufs));

Then I get:

 pages_per_buffer | buffer_count
------------------+--------------
                1 |       262144
(1 row)

So the view let us see that some buffers were spread across two pages because
the shared buffers are aligned on PG_IO_ALIGN_SIZE, and that once the alignment
is changed to 2MB no buffer is spread across pages anymore.
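
As a rough illustration of why the alignment matters, here is a standalone sketch of a power-of-two round-up, which is what TYPEALIGN does as far as I understand the macro. The name align_up and the sample addresses are hypothetical and not taken from the PostgreSQL tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * align must be a non-zero power of two.
 */
uint64_t
align_up(uint64_t align, uint64_t addr)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    return (addr + (align - 1)) & ~(align - 1);
}
```

Rounding up moves the start by at most align - 1 bytes, which is why the requested size in the diff above is padded by the alignment value: the aligned pool still fits inside the allocation. With a 2MB alignment the pool starts on a huge page boundary, and since 2MB is a multiple of 8KB, no buffer can cross a page.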

I think it helps "visualize" the effect of some configuration or code changes.

What are your thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
