| From: | Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(at)vondra(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2026-02-27 08:51:19 |
| Message-ID: | CAE8JnxOn4+xUAnce+M7LfZWOqfrMMxasMaEmSKwiKbQtZr65uA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Feb 27, 2026 at 4:18 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> I'm planning to do some reviewing in the next days. In preparation I just
> retried a benchmark and saw some odd results. After a while I was able to
> reproduce even with a simpler setup:
>
> I'm planning to do some reviewing in the next days. In preparation I just
> retried a benchmark and saw some odd results.
Since we are talking about results I will share mine too :)
The bottomline is: Prefetch is working, but it might make some things
slower.
It is obvious that this should better exploit IO for one single heavy
query, in
one single table.
It is not so obvious, to me, how this would behave when there are multiple
concurrent queries. It is not so obvious how this will impact when multiple
tables are queried at the same time. My feeling is that it should greatly
improve on a disk with a mechanical head, if it performs the same reads
reducing the number of times it has to jump from one to another. Is there
much
interest in special optimisations for those or is the focus more on SSDs?
On my previous review I wasted way to much time trying to improve
read_stream,
to end up getting just some mixed results. This time I tried to step back
and
try to look at various functions that could have changed. Initially I tried
compiler function instrumentation, but then the profiling overhead of 33k
functions dominated.
This time what I did (1) added a indexscan_prefetch_distance, maybe a better
name would be just prefetch_distance, it limits the growth of distance in
read_stream (distance-limit.diff). (2) captured execution statistics for 15
functions (profiling-instrumentation.diff). At exit each process will
create a
log with its configuration and call statistics.
The benchmark was with full index scan on a sequential column, executed
repeatedly and no cache eviction: buffer hit path.
BENCHMARK RESULTS
MacOS in normal (for me) use
Prefetch Avg Time Min Time Max Time
------------------------------------------------
off 6.03s 5.12s 11.70s
1 59.44s 25.33s 257.60s
4 19.74s 12.66s 44.36s
16 11.87s 7.49s 19.13s
64 8.77s 6.05s 13.97s
128 6.40s 4.33s 11.74s
MacOS idle, after reboot
Prefetch Avg Time Min Time Max Time
------------------------------------------------
off 2.17s 2.12s 2.26s
1 5.53s 5.44s 5.57s
4 3.17s 3.04s 3.39s
16 3.13s 3.04s 3.29s
64 2.82s 2.66s 2.88s
128 2.83s 2.69s 2.90s
Docker on MacOS, idle, after reboot
Prefetch Avg Time Min Time Max Time
------------------------------------------------
off 1.38s 1.36s 1.46s
1 3.65s 3.56s 3.70s
4 2.00s 1.98s 2.09s
16 1.56s 1.53s 1.59s
64 1.29s 1.25s 1.33s
128 1.28s 1.26s 1.32s
Docker on Linux
Prefetch Avg Time Min Time Max Time
------------------------------------------------
off 6.07s 5.92s 6.29s
1 6.85s 6.67s 7.04s
4 6.26s 6.10s 6.41s
16 6.14s 5.95s 6.30s
64 5.74s 5.62s 5.91s
128 5.72s 5.63s 5.86s
The linux execution presented very little degradation. On MacOS host the
degradation was more noticeable than on MacOS docker running a debian,
suggesting that software ecosystem contributes, docker on MacOS (arm), was
slower than docker on a native linux (x86_64), here I could be it is CPU
architecture or OS kernel differences.
WHAT CHANGED
The benchmark will produced, 195 autovac_worker, and 3293 backend and one
bgworker log. For prefetch off the number of calls is constant. For
prefetch on
they vary widely, but I am looking at the total time per function, assuming
that the differences in the number of calls changes only how the work was
partitioned but the final work was the same.
With Docker version 28.3.0, build 38b7060, Python 3.10.18
$ docker compose up --build benchmark
$ docker cp docker-postgres-1:/tmp/profiling ./docker-profiling
$ python compare_profiles.py docker-profiling
Function off,d=0 on,d=128 Diff % z-statistic
------------------------------------------------------------------------
read_stream_next_buffer 0.0 3944.9 +3944.9 NEW% +654.88
read_stream_look_ahead 0.0 2999.3 +2999.3 NEW% +624.00
WaitReadBuffers 98.3 754.6 +656.3 +667.7% +414.56
_bt_next 748.7 1072.8 +324.1 +43.3% +20.35
heapam_batch_getnext 788.4 1114.6 +326.2 +41.4% +20.18
btgetbatch 777.0 1096.7 +319.7 +41.2% +20.14
IndexNext 17031.7 10400.3 -6631.5 -38.9% -249.51
_bt_first 17.2 13.0 -4.2 -24.6% 10.56
Function off,d=0 on,d=1 Diff % z-statistic
------------------------------------------------------------------------
read_stream_look_ahead 0.0 28135.9 +28135.9 NEW N/A
read_stream_next_buffer 0.0 199245.7 +199245.7 NEW N/A
IndexNext 17031.7 211861.0 +194829.2 12.4x +283.00
WaitReadBuffers 98.3 169641.9 +169543.6 1724x +275.63
heapam_index_fetch_tuple 13564.5 205828.2 +192263.7 15x +172.56
heapam_batch_getnext 788.4 1944.0 +1155.6 +146.6% +25.39
_bt_next 748.7 1833.9 +1085.2 +144.9% +24.85
btgetbatch 777.0 1881.2 +1104.2 +142.1% +24.74
_bt_first 17.2 19.3 +2.1 +12.0% +10.33
PS.: The docker environment cache eviction requires adjustments.
| Attachment | Content-Type | Size |
|---|---|---|
| v11-docker-instrumentation.zip | application/zip | 26.9 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-02-27 09:43:20 | launch_backend: Remove duplicate include under EXEC_BACKEND |
| Previous Message | Andreas Karlsson | 2026-02-27 08:49:13 | Re: Partial Mode in Aggregate Functions |