Re: index prefetching

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-08-14 21:55:53
Message-ID: CAH2-WzkgkvbN_GqR+pfE7uKwhWxQ6h4jst7Rpjgrt68Vc1=FDA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> If this same mechanism remembered (say) the last 2 heap blocks it
> requested, that might be enough to totally fix this particular
> problem. This isn't a serious proposal, but it'll be simple enough to
> implement. Hopefully when I do that (which I plan to soon) it'll fully
> validate your theory.

I spoke too soon. It isn't going to be that easy, since
heapam_index_fetch_tuple wants to consume buffers as a simple stream.
There's no way that index_scan_stream_read_next can suppress
duplicate block number requests (in any way more sophisticated than
the current trivial approach, which stores only the very last block
number in IndexScanBatchState.lastBlock) without breaking the whole
concept of a stream of buffers.

> > We can optimize that by deferring the StartBufferIO() if we're encountering a
> > buffer that is undergoing IO, at the cost of some complexity. I'm not sure
> > real-world queries will often encounter the pattern of the same block being
> > read in by a read stream multiple times in close proximity sufficiently often
> > to make that worth it.
>
> We definitely need to be prepared for duplicate prefetch requests in
> the context of index scans.

Can you (or anybody else) think of a quick and dirty way of working
around the problem on the read stream side? I would like to prioritize
getting the patch into a state where its overall performance profile
"feels right". From there we can iterate on fixing the underlying
issues in more principled ways.

FWIW it wouldn't be that hard to require the callback (in our case
index_scan_stream_read_next) to explicitly flag any block number that
it knows is a duplicate. It might make sense to place at least that
much of the burden on the callback/client side.

--
Peter Geoghegan
