Re: index prefetching

From: Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2026-02-16 05:30:00
Message-ID: CAE8JnxNOV9kOgmU1-WUWts9Q-Jj_Nf0K480wyEwJXUQYMnYu3g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for your comments Tomas and Andres,

I will try to be brief.

## Large table MacOS test

I did a 1000x larger test in MacOS, with the sql script . prefetch had
negligible effect for random and periodic, and made sequential 16% slower
[a].

## Small scale linux test.

I have repeated the tests with a 10k table row on a linux system, this time
I tried using either a SSD or a HDD, with shared_buffers of either 128kB or
128MB, and either psql or python with psycopg. I used a slightly different
table [b].

Prefetch makes...

SDD with low available buffers is significantly slower, except for
sequential reads.
SDD with cold OS reads is significantly faster for random access.
HDD faster, but not reliably (high variance).

I am not questioning the usefulness of the patch, and I know that there is
a log of work already put into it. The reason why I decided to review it is
because I believe this can be one important step forward. But I hope that
it is not hard to agree that these (counter)examples suggest that there are
some edges to be pruned. Where I work, most of the queries will access
at most a few hundred lines and are expected to be perceived as
instantaneous.

If the tests I am doing are pointless, should we consider having something
in the planner to prevent these scans from using prefetch?

Should we introduce centralized coordination for IO? As far as I know this
is an area where we just let each query request what they need and hope for
the best. What happens if we have two sequential scans in different tables?
the disk access could interleave pages of the two scans, falling into a
random access pattern, right?

### Cache control

This is a way to make the script run without sudo in linux, you give
ownership to root, and then you pin this program to the owner
% gcc drop_cache.c -o drop_cache;
% sudo chown root:root drop_cache;
% sudo chmod 4755 drop_cache;

In MacOS purge in the sudoers[2] temporarily, similar to [3]
user ALL=(ALL) NOPASSWD: /usr/bin/purge
So that I don't need to give sudo privileges to the script (that imports
code that I am not even aware of).

Notes:

[a] I did some profiling with sample [4], and tried to spot functions with
the highest increase or decrease in run time, but I was too confused, no
point in dumping raw logs here.

[b] This time I used a (SELECT string_agg((i*j)::text, '+') FROM
generate_series(1, 50)) instead of repeat('x', 100), just to prevent it
from compressing to nothing when I try larger payloads, and hit the TOAST
thresholds. I removed the primary key `id` because it was annoying to take
20 minutes to insert the data in the large scale test.

References:

[1]
https://www.reddit.com/r/sysadmin/comments/pwiboy/whats_the_purpose_of_chmod_4755_binsh_command/
[2] https://man7.org/linux/man-pages/man5/sudoers.5.html
[3] https://unix.stackexchange.com/a/13058
[4] https://www.unix.com/man_page/osx/1/sample/

Regards,
Alexandre

Attachment Content-Type Size
image/png 53.6 KB
image/png 60.2 KB
image/png 58.7 KB
hdd-linux-psycopg-128kB.png image/png 85.7 KB
hdd-linux-psycopg-128MB.png image/png 87.2 KB
sdd-linux-psycopg-128kB.png image/png 87.4 KB
image/png 57.6 KB
sdd-linux-psycopg-128MB.png image/png 83.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2026-02-16 05:35:41 Re: pgstat include expansion
Previous Message Fujii Masao 2026-02-16 05:11:47 Re: recovery.signal not cleaned up when both signal files are present