From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add os_page_num to pg_buffercache |
Date: | 2025-07-01 16:34:56 |
Message-ID: | aGQOMPEENZc/2fJm@ip-10-97-1-34.eu-west-3.compute.internal |
Lists: | pgsql-hackers |
Hi,
On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote:
> On 7/1/25 15:45, Bertrand Drouvot wrote:
>
> I took a quick look on this,
Thanks for looking at it!
> and I doubt we want to change the schema of
> pg_buffercache like this. Adding columns is fine, but it seems rather
> wrong to change the cardinality. The view is meant to be 1:1 mapping for
> buffers, but now suddenly it's 1:1 with memory pages. Or rather (buffer,
> page), to be precise.
>
> I think this will break a lot of monitoring queries, and possibly in a
> very subtle way - especially on systems with huge pages, where most
> buffers will have one row, but then a buffer that happens to be split on
> two pages will have two rows. That seems not great.
>
> IMHO it'd be better to have a new view for this info, something like
> pg_buffercache_pages, or something like that.
That's a good point, fully agree!
> But I'm also starting to question if the patch really is that useful.
> Sure, people may not have NUMA support enabled (e.g. on non-linux
> platforms), and even if they do the _numa view is quite expensive.
>
Yeah, it's not for day-to-day activities; it's more for configuration testing and
for development/testing work.
For example, if I set BLCKSZ to 8KB and enable huge pages (2MB), then I would
expect no buffer to be spread across pages.
But what I can see is:
SELECT
    pages_per_buffer,
    COUNT(*) AS buffer_count
FROM (
    SELECT bufferid, COUNT(*) AS pages_per_buffer
    FROM pg_buffercache
    GROUP BY bufferid
) subq
GROUP BY pages_per_buffer
ORDER BY pages_per_buffer;
 pages_per_buffer | buffer_count
------------------+--------------
                1 |       261120
                2 |         1024
This is due to the shared buffers being aligned to PG_IO_ALIGN_SIZE.
If I change it to:
 BufferManagerShmemInit(void)
     /* Align buffer pool on IO page size boundary. */
     BufferBlocks = (char *)
-        TYPEALIGN(PG_IO_ALIGN_SIZE,
+        TYPEALIGN(2 * 1024 * 1024,
                   ShmemInitStruct("Buffer Blocks",
-                                  NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
+                                  NBuffers * (Size) BLCKSZ + (2 * 1024 * 1024),
                                   &foundBufs));
Then I get:
 pages_per_buffer | buffer_count
------------------+--------------
                1 |       262144
(1 row)
So we've been able to see that some buffers were spread across pages because the
shared buffers are aligned on PG_IO_ALIGN_SIZE, and that once the alignment is
changed to 2MB, no buffer is spread across pages anymore.
I think that it helps "visualize" the effect of some configuration or code changes.
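To make the arithmetic behind those counts concrete, here is a minimal standalone sketch (not PostgreSQL code). The names align_up and count_split_buffers are hypothetical helpers; align_up only mirrors the rounding idea of TYPEALIGN. The start offset used in the note after the block is an assumption picked for illustration, since the actual address of the buffer pool is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align (align must be a power of
 * two). Hypothetical helper: same rounding idea as TYPEALIGN, written as
 * a plain function.
 */
uint64_t
align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Count the buffers whose first and last byte land on different OS pages.
 * "start" is the offset of the first buffer relative to an OS page
 * boundary (an assumed value, not something read from a running server).
 */
uint64_t
count_split_buffers(uint64_t start, uint64_t blcksz,
                    uint64_t page_size, uint64_t nbuffers)
{
    uint64_t split = 0;

    assert(blcksz > 0 && page_size > 0);

    for (uint64_t i = 0; i < nbuffers; i++)
    {
        uint64_t first = start + i * blcksz;   /* first byte of buffer i */
        uint64_t last = first + blcksz - 1;    /* last byte of buffer i */

        if (first / page_size != last / page_size)
            split++;
    }
    return split;
}
```

Under the assumption that the pool starts 4096 bytes past a 2MB boundary (which a 4096-byte alignment permits), count_split_buffers(4096, 8192, 2097152, 262144) gives 1024, leaving 261120 buffers on a single page, which is consistent with the first result above. With a start of 0, as a 2MB alignment would guarantee, the count is 0.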
What are your thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com