| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2025-12-28 18:08:12 |
| Message-ID: | c2357c01-bf15-44d6-a4cb-591675457032@vondra.me |
| Lists: | pgsql-hackers |
On 12/25/25 16:39, Konstantin Knizhnik wrote:
>
> On 21/12/2025 7:55 PM, Peter Geoghegan wrote:
>> On Wed, Dec 10, 2025 at 9:21 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> Attached is v4.
>> Attached is v5. Changes from v4:
>>
>> * Simplified and optimized index-only scans, with a particular
>> emphasis on avoiding regressions with nested loop joins with an inner
>> index-only scan.
>>
>> There were quite a number of small problems/dead code related to
>> index-only scans fixed by this new v5. Overall, I'm quite a bit
>> happier with the state of index-only scans, which I'd not paid too
>> much attention to before now.
>>
>> * Added Valgrind instrumentation to the hash index patch, which was
>> required to fix some false positives.
>>
>> The generic indexam_util_batch_unlock routine had Valgrind
>> instrumentation in earlier versions, just to keep nbtree's buffer
>> locking checks from generating similar false positives. Some time
>> later, when I added the hashgetbatch patch, there were new Valgrind
>> false positives during hash index scans -- which I missed at first.
>> This new v5 revision adds similar Valgrind checks to hash itself
>> (changes that add code that is more or less a direct port of the stuff
>> added to nbtree by commit 4a70f829), which fixes the false positives,
>> and is independently useful.
>>
>> The rule for amgetbatch-based index AMs is that they must have similar
>> buffer locking instrumentation. That seems like a good thing.
>>
>> --
>> Peter Geoghegan
>
> In the previous mail I shared the results of my experiments with
> different prefetch distances.
> I think that we should start prefetching heap tuples not from the
> second batch, but after some number of processed TIDs.
>
> Attached please find a patch which implements this approach.
> And below are updated results:
>
> limit \ prefetch      on      off   always      inc   threshold
>     1              12074    12765     3146     3282       12394
>     2               5912     6198     2463     2438        6124
>     4               2919     3047     1334     1964        2910
>     8               1554     1496     1166     1409        1588
>    16                815      775      947      940         600
>    32                424      403      687      695         478
>    64                223      208      446      453         358
>   128                115      106      258      270         232
>   256                 68       53      138      149         131
>   512                 43       27       72       78          71
>  1024                 28       13       38       40          38
>
> The last column is the result of prefetching with read_stream_threshold=10.
>
That's great, but it only works for cases that can (and do) benefit from
the prefetching. Try running the benchmark with a data set that fits
into shared buffers (or RAM), which makes prefetching useless.

I tried that with your test, comparing master, v5, and v5 + your
read_stream_threshold patch. See the attached run.sh script and the PDF
summarizing the results. The last two column groups are comparisons to
master, with green=improvement, red=regression. There are no actual
improvements (the 1% delta is just noise), but the read_stream_threshold
results show a clear pattern of pretty massive (20-30%) regressions.
The difference between v5 and v5-threshold is quite clear.

IIRC cases like this are *exactly* why we ended up with the current
heuristics, enabling prefetching only from the second batch. This
removes the risk of expensive read_stream init for very fast queries
that don't benefit at all. Of course, prefetching may be useless for
later batches too (e.g. if all the data is cached), but such a query
will be expensive enough for the read_stream init cost to be negligible.

To put this differently, the more aggressive the heuristic is (enabling
prefetching in more cases), the more likely it is to cause regressions.
We've chosen to be more defensive, i.e. to sacrifice some possible gains
in order not to regress plausible workloads. I hope we agree queries on
fully cached "hot" data are pretty common / important.

We can probably do better in the future. But we'll never know for sure
whether a given scan benefits from prefetching. It's not just about the
number of items in the batch, but also about how many heap pages that
translates to, what the I/O pattern is (random vs. sequential), and how
many of those pages are already cached. For some queries we don't even
know how many items we'll actually need. We can't check all of that at
the very beginning, because it's simply prohibitively expensive.

regards
--
Tomas Vondra
| Attachment | Content-Type | Size |
|---|---|---|
| results-prefetch-v5.pdf | application/pdf | 98.6 KB |
| run.sh | application/x-shellscript | 2.4 KB |