From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-07-24 23:52:22 |
Message-ID: | 306fc8c0-c882-4602-86f5-a106b9ace603@vondra.me |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 7/24/25 16:40, Peter Geoghegan wrote:
> On Thu, Jul 24, 2025 at 7:19 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I got a bit bored yesterday, so I gave this a try and whipped up a patch
>> that adds two pgstattuple functins that I think could be useful for
>> analyzing index metrics that matter for prefetching.
>
> This seems quite useful.
>
> I notice that you're not accounting for posting lists. That'll lead to
> miscounts of the number of heap blocks in many cases. I think that
> that's worth fixing, even given that this patch is experimental.
>
Yeah, I forgot about that. Should be fixed in the v2. Admittedly I don't
know that much about nbtree internals, so this is mostly copy pasting
from verify_nbtree.
>> It's trivial to summarize this into a per-index statistic (of course,
>> there may be some inaccuracies when the run spans multiple ranges), but
>> it also seems useful to be able to look at parts of the index.
>
> FWIW in my experience, the per-leaf-page "nhtids:nhblks" tends to be
> fairly consistent across all leaf pages from a given index. There are
> no doubt some exceptions, but they're probably pretty rare.
>
Yeah, probably. And we'll probably test on such uniform data sets, or at
least we we'll start with those. But at some point I'd like to test with
some of these "weird" indexes too, if only to test how well the prefetch
heuristics adjusts the distance.
>> Second, the index is walked sequentially in physical order, from block 0
>> to the last block. But that's not really what the index prefetch sees.
>> To make it "more accurate" it'd be better to just scan the leaf pages as
>> if during a "full index scan".
>
> Why not just do it that way to begin with? It wouldn't be complicated
> to make the function follow a chain of right sibling links.
>
I have a very good reason why I didn't do it that way. I was lazy. But
v2 should be doing that, I think.
> I suggest an interface that takes a block number, and an nblocks int8
> argument that must be >= 1. The function would start from the block
> number arg leaf page. If it's not a non-ignorable leaf page, throw an
> error. Otherwise, count the number of distinct heap blocks on the leaf
> page, and count the number of heap blocks on each additional leaf page
> to the right -- until we've counted the heap blocks from nblocks-many
> leaf pages (or until we reach the rightmost leaf page).
>
Yeah, this interface seems useful. I suppose it'll be handy when looking
at an index scan, to get stats from the currently loaded batches. In
principle you get that from v3 by filtering, but it might be slow on
large indexes. I'll try doing that in v3.
> I suggest that a P_IGNORE() page shouldn't have its heap blocks
> counted, and shouldn't count towards our nblocks tally of leaf pages
> whose heap blocks are to be counted. Upon encountering a P_IGNORE()
> page, just move to the right without doing anything. Note that the
> rightmost page cannot be P_IGNORE().
>
I think v2 does all of this.
> This scheme will always succeed, no matter the nblocks argument,
> provided the initial leaf page is a valid leaf page (and provided the
> nblocks arg is >= 1).
>
> I get that this is just a prototype that might not go anywhere, but
> the scheme I've described requires few changes.
>
Yep, thanks.
--
Tomas Vondra
Attachment | Content-Type | Size |
---|---|---|
v2-0001-pgstattuple-analyze-TIDs-on-btree-leaf-pages.patch | text/x-patch | 16.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-07-25 00:07:58 | Re: Regression with large XML data input |
Previous Message | Jim Jones | 2025-07-24 23:25:48 | Re: Regression with large XML data input |