Re: index prefetching

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-09-04 00:16:06
Message-ID: v2kby5im7fwhbzakiybydruu652cagr6723dporckccejashh4@ngbzy45pcv6f
Lists: pgsql-hackers

Hi,

On 2025-09-03 16:25:56 -0400, Peter Geoghegan wrote:
> On Wed, Sep 3, 2025 at 4:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > The issue to me is that this kind of query actually *can* substantially
> > benefit from prefetching, no?
>
> As far as I can tell, not really, no.

It does seem to benefit here - I see small wins even with kernel readahead, fwiw.

> > Afaict the performance without prefetching is
> > rather atrocious as soon as a) storage has a tad higher latency or b) DIO is
> > used.
>
> I don't know that storage latency matters, when (without DIO) we're
> doing so well from readahead.

The readahead Linux does is actually not aggressive enough once you have
higher IO latency. You can tune it up, but then it often does too much IO.

> > Indeed: With DIO, readahead provides a ~2.6x improvement for the query at hand.
>
> I don't see that level of improvement with DIO. For me it's 6054.921
> ms with prefetching, 8766.287 ms without it.

I guess your SSD has lower latency than mine...

> I can kind of accept the idea that in some sense readahead shouldn't
> count too much, since the future is DIO. But it's not like aggressive
> prefetching matches the performance of buffered I/O + readahead. Not
> for me, at any rate. I don't know why.

It does here, just about. The reason for not matching is fairly simple: kernel
readahead issues large reads, but with DIO we don't issue them for this query.
The adversarial pattern here rarely requests two neighboring blocks back to
back, so nearly all reads are 8kB reads.
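To make the 8kB point concrete, here is a minimal C sketch, not the actual read stream code, of a combiner that only merges a request into the pending read when it is the block immediately following that read. `BlockNo`, `count_consecutive_only_reads()` and the `max_blocks` cap are hypothetical names for illustration; the cap loosely plays the role of a limit like `io_combine_limit`. Fed an interspersed order such as 10, 20, 11, 21, 12, 22, every request breaks the run, so it issues six single-block reads even though the blocks form just two contiguous ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNo;       /* hypothetical stand-in for a block number */

/*
 * Count the reads issued when requests are combined the simple way: the
 * pending read is extended only if the next requested block immediately
 * follows it; otherwise the pending read is issued and a new one started.
 * max_blocks caps the number of blocks in one combined read.
 */
size_t
count_consecutive_only_reads(const BlockNo *blocks, size_t n, size_t max_blocks)
{
    size_t  nreads = 0;
    BlockNo pending_start = 0;
    size_t  pending_len = 0;    /* 0 means "no read pending" */

    assert(max_blocks > 0);

    for (size_t i = 0; i < n; i++)
    {
        if (pending_len > 0 &&
            pending_len < max_blocks &&
            blocks[i] == pending_start + pending_len)
        {
            pending_len++;      /* neighboring block: grow the pending read */
            continue;
        }
        if (pending_len > 0)
            nreads++;           /* not a neighbor: issue the pending read */
        pending_start = blocks[i];
        pending_len = 1;
    }
    if (pending_len > 0)
        nreads++;               /* flush the last pending read */
    return nreads;
}
```

For example, `count_consecutive_only_reads((BlockNo[]){10, 20, 11, 21, 12, 22}, 6, 16)` returns 6, while the same call on 10, 11, 12, 13 returns 1.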

This might actually be the thing to tackle to avoid this and other similar
regressions: If we were able to issue combined IOs for interspersed patterns
like we have in this query, we'd easily win back the overhead. And it'd make
DIO much, much better.
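One way to picture combined IOs for interspersed patterns is to keep several pending reads open at once, rather than one, and let each requested block extend whichever open read it directly follows. This is a hypothetical illustration under that assumption, not a design from the thread: `count_interleaved_reads()`, `npending` and `MAX_PENDING` are made-up names, and the sketch only counts reads. It ignores that holding a read open delays the consumer's first block, and that buffers would still have to be handed back in request order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNo;       /* hypothetical stand-in for a block number */

#define MAX_PENDING 8           /* hypothetical bound on concurrently open reads */

typedef struct PendingRead
{
    BlockNo start;              /* first block of the open read */
    size_t  len;                /* number of blocks collected so far */
} PendingRead;

/*
 * Up to npending reads may be open at once (slot 0 is the oldest). A
 * requested block extends whichever open read it directly follows; if none
 * matches and all slots are in use, the oldest open read is issued.
 */
size_t
count_interleaved_reads(const BlockNo *blocks, size_t n,
                        size_t npending, size_t max_blocks)
{
    PendingRead pending[MAX_PENDING];
    size_t  nopen = 0;
    size_t  nreads = 0;

    assert(npending > 0 && npending <= MAX_PENDING);
    assert(max_blocks > 0);

    for (size_t i = 0; i < n; i++)
    {
        size_t  j;

        /* Look for an open read that this block directly continues. */
        for (j = 0; j < nopen; j++)
        {
            if (pending[j].len < max_blocks &&
                blocks[i] == pending[j].start + pending[j].len)
                break;
        }
        if (j < nopen)
        {
            pending[j].len++;
            continue;
        }

        /* No match: if every slot is busy, issue the oldest open read. */
        if (nopen == npending)
        {
            nreads++;
            for (j = 1; j < nopen; j++)
                pending[j - 1] = pending[j];
            nopen--;
        }
        pending[nopen].start = blocks[i];
        pending[nopen].len = 1;
        nopen++;
    }
    return nreads + nopen;      /* reads still open are issued at the end */
}
```

With `npending` = 2, the order 10, 20, 11, 21, 12, 22 collapses into 2 reads of three blocks each; with `npending` = 1 it degenerates to 6 single-block reads. For a pattern cycling through three ranges, like 1, 100, 200, 2, 101, 201, 3, 102, 202, `npending` = 2 merges nothing under this eviction policy, while `npending` = 3 yields one read per range.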

We don't want to try to find more complicated merges for things like
seqscans and bitmap heap scans: there can never be anything other than merges
of consecutive blocks there, and the CPU overhead of the more complicated
search would likely be noticeable. But for something like index scans, that's
different.
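The CPU-cost argument can be illustrated with a different, equally hypothetical technique: look ahead a fixed window of requested blocks, sort a copy, and coalesce runs of consecutive block numbers. For a seqscan the requests already arrive in order, so a single comparison per block finds every merge; here each window costs a sort (O(w log w) comparisons for a window of w blocks), which only pays off when the order is scrambled, as in an index scan. `count_windowed_reads()`, `MAX_WINDOW` and the parameters are illustrative names, and the sketch only counts reads; a real implementation would issue IO out of order but still have to return buffers in the order the scan asked for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t BlockNo;       /* hypothetical stand-in for a block number */

#define MAX_WINDOW 64           /* hypothetical lookahead window size */

static int
cmp_blockno(const void *a, const void *b)
{
    BlockNo x = *(const BlockNo *) a;
    BlockNo y = *(const BlockNo *) b;

    return (x > y) - (x < y);
}

/*
 * Windowed sort-and-coalesce: take the next `window` requested blocks, sort
 * a copy, and count one read per run of consecutive block numbers (capped at
 * max_blocks). A block requested twice in one window needs only one read.
 */
size_t
count_windowed_reads(const BlockNo *blocks, size_t n,
                     size_t window, size_t max_blocks)
{
    BlockNo sorted[MAX_WINDOW];
    size_t  nreads = 0;

    assert(window > 0 && window <= MAX_WINDOW);
    assert(max_blocks > 0);

    for (size_t base = 0; base < n; base += window)
    {
        size_t  w = (n - base < window) ? n - base : window;
        size_t  run_len = 0;

        memcpy(sorted, blocks + base, w * sizeof(BlockNo));
        qsort(sorted, w, sizeof(BlockNo), cmp_blockno);     /* the extra CPU cost */

        for (size_t i = 0; i < w; i++)
        {
            if (i > 0 && sorted[i] == sorted[i - 1])
                continue;       /* duplicate block: already covered */
            if (run_len > 0 && run_len < max_blocks &&
                sorted[i] == sorted[i - 1] + 1)
            {
                run_len++;      /* extends the current run */
                continue;
            }
            nreads++;           /* gap (or cap reached): start a new read */
            run_len = 1;
        }
    }
    return nreads;
}
```

On 10, 20, 11, 21, 12, 22 with a window of 6 this finds 2 reads; shrink the window to 2 and it is back to 6, so the window has to be large enough to see the neighbors.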

I don't quite know if this is best done as an optional feature for read
streams, a layer atop read stream or something dedicated.

For now I'll go back to working on read stream test infrastructure. That's the
prerequisite for testing the "don't synchronously wait for in-progress IO"
improvement. And if we want to have more complicated merging, that also seems
like something much easier to develop with some testing infra.

Greetings,

Andres Freund
