Re: index prefetching

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2026-02-22 19:28:00
Message-ID: CAH2-WzkfGN3EBqdiLt=aAGJ36a1dD2s4HNKzYViXaEv9pQ-z1g@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 22, 2026 at 11:23 AM Alexandre Felipe
<o(dot)alexandre(dot)felipe(at)gmail(dot)com> wrote:
> DISTANCE CONTROL
>
> I tested different strategies for increasing the distance: 2*d, 2*d+1, d+2, d+4, and so on. In my head, what would make sense is d + io_combine_limit, but in the end 2*d gives the best results across different patterns, e.g. (h{200}m{200}), which seems to be a more reasonable pattern, as previous scans would have loaded data in blocks. But these are fundamentally the same: as I posted, this can be treated as a Markov model, and the limit will be something like max_distance * sigmoid(h * (p - p0)). What changes is the transient when we go in and out of a cached region.

I don't understand. Why, in general, would a Markov model be useful
for determining prefetch distance?

> LIMITING PREFETCH
>
> To avoid prefetch waste with a limit node, wouldn't it make sense for the executor to send an estimate of how many rows will be required?

There's a patch that does that. Have you looked at the patch series at all?

> I/O REORDERING
>
> I did an experiment reordering the heap accesses, following a zig-zag pattern

There's no question that reordering heap accesses is an interesting
direction to eventually take this infrastructure in. I've experimented
with that myself. But this is the worst possible time to be increasing
the scope of the patch for an uncertain benefit.

We're in crunch mode right now, ahead of feature freeze, which is less
than 6 weeks away. Tomas has been working on this project for about 3
years, and I've been working on it for about 1. Long digressions about
the asymptotic complexity of priority queues add less than zero value.

--
Peter Geoghegan
