From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-07-24 11:19:07 |
Message-ID: | 38b865bd-2ae9-4a94-a788-6e3dc99ccd70@vondra.me |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 7/23/25 02:37, Tomas Vondra wrote:
> ...
>
>>> Thanks. I wonder how difficult would it be to add something like this to
>>> pgstattuple. I mean, it shouldn't be difficult to look at leaf pages and
>>> count distinct blocks, right? Seems quite useful.
>>
>> I agree that that would be quite useful.
>>
>
> Good first patch for someone ;-)
>
I got a bit bored yesterday, so I gave this a try and whipped up a patch
that adds two pgstattuple functins that I think could be useful for
analyzing index metrics that matter for prefetching.
The patch adds two functions, that are meant to provide data for
additional analysis rather than computing something final. Each function
splits the index into a sequence of block ranges (of given length), and
calculates some metrics on that.
pgstatindex_nheap
- number of leafs in the range
- number of block numbers
- number of distinct block numbers
- number of runs (of the same block)
pgstatindex_runs
- number of leafs in the range
- run length
- number of runs with the length
It's trivial to summarize this into a per-index statistic (of course,
there may be some inaccuracies when the run spans multiple ranges), but
it also seems useful to be able to look at parts of the index.
This is meant as a quick experimental patch, to help with generating
better datasets for the evaluation. And I think it works for that, and I
don't have immediate plans to work on this outside that context.
There are a couple things we'd need to address before actually merging
this, I think. Two that I can think of right now:
First, the "range length" determines memory usage. Right now it's a bit
naive, and just extracts all blocks (for the range) into an array. That
might be an issue for larger ranges, I'm sure there are strategies to
mitigate that - doing some of the processing when reading block numbers,
using hyperloglog to estimate distincts, etc.
Second, the index is walked sequentially in physical order, from block 0
to the last block. But that's not really what the index prefetch sees.
To make it "more accurate" it'd be better to just scan the leaf pages as
if during a "full index scan".
Also, I haven't updated the docs. That'd also need to be done.
regards
--
Tomas Vondra
Attachment | Content-Type | Size |
---|---|---|
v1-0001-pgstattuple-analyze-TIDs-on-btree-leaf-pages.patch | text/x-patch | 15.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bertrand Drouvot | 2025-07-24 11:47:23 | Re: Adding wait events statistics |
Previous Message | Hayato Kuroda (Fujitsu) | 2025-07-24 11:19:02 | RE: recoveryStopsAfter is not usable when recovery_target_inclusive is false |