From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add os_page_num to pg_buffercache |
Date: | 2025-07-01 16:45:37 |
Message-ID: | d96c343b-959a-45ae-b0d8-e728bb1d0ed2@vondra.me |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 7/1/25 18:34, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote:
>> On 7/1/25 15:45, Bertrand Drouvot wrote:
>>
>> I took a quick look on this,
>
> Thanks for looking at it!
>
>> and I doubt we want to change the schema of
>> pg_buffercache like this. Adding columns is fine, but it seems rather
>> wrong to change the cardinality. The view is meant to be 1:1 mapping for
>> buffers, but now suddenly it's 1:1 with memory pages. Or rather (buffer,
>> page), to be precise.
>>
>> I think this will break a lot of monitoring queries, and possibly in a
>> very subtle way - especially on systems with huge pages, where most
>> buffers will have one row, but then a buffer that happens to be split on
>> two pages will have two rows. That seems not great.
>>
>> IMHO it'd be better to have a new view for this info, something like
>> pg_buffercache_pages, or something like that.
>
> That's a good point, fully agree!
>
>> But I'm also starting to question if the patch really is that useful.
>> Sure, people may not have NUMA support enabled (e.g. on non-linux
>> platforms), and even if they do the _numa view is quite expensive.
>>
>
> Yeah, it's not for day to day activities, more for configuration testing and
> also for development activity/testing.
>
> For example, If I set BLCKSZ to 8KB and enable huge pages (2MB), then I may
> expect to see buffers not spread across pages.
>
> But what I can see is:
>
> SELECT
> pages_per_buffer,
> COUNT(*) as buffer_count
> FROM (
> SELECT bufferid, COUNT(*) as pages_per_buffer
> FROM pg_buffercache
> GROUP BY bufferid
> ) subq
> GROUP BY pages_per_buffer
> ORDER BY pages_per_buffer;
>
> pages_per_buffer | buffer_count
> ------------------+--------------
> 1 | 261120
> 2 | 1024
>
> This is due to the shared buffers being aligned to PG_IO_ALIGN_SIZE.
>
> If I change it to:
>
> BufferManagerShmemInit(void)
>
> /* Align buffer pool on IO page size boundary. */
> BufferBlocks = (char *)
> - TYPEALIGN(PG_IO_ALIGN_SIZE,
> + TYPEALIGN(2 * 1024 * 1024,
> ShmemInitStruct("Buffer Blocks",
> - NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
> + NBuffers * (Size) BLCKSZ + (2 * 1024 * 1024),
> &foundBufs));
>
> Then I get:
>
> pages_per_buffer | buffer_count
> ------------------+--------------
> 1 | 262144
> (1 row)
>
>
> So we've been able to see that some buffers were spread across pages due to
> shared buffer alignment on PG_IO_ALIGN_SIZE. And that if we change the alignment
> to be set to 2MB then I don't see any buffers spread across pages anymore.
>
> I think that it helps "visualize" some configuration or code changes.
>
> What are your thoughts?
>
But isn't the _numa view good enough for this? Sure, you need NUMA
support for it, and it may take a fair amount of time, but how often you
need to do such queries? I don't plan to block improving this use case,
but I'm not sure it's worth the effort.
cheers
--
Tomas Vondra
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2025-07-01 16:49:43 | Re: wrong comments in rewriteTargetListIU |
Previous Message | Tom Lane | 2025-07-01 16:41:50 | Re: GNU/Hurd portability patches |