Re: index prefetching

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-08-13 21:37:32
Message-ID: adl4xbynhs72cuiqbebqkgmtyb6s3zdqhoazwjo63pjrdjupwy@mxcgidbduthw
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote:
> On 8/13/25 16:44, Andres Freund wrote:
> > On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> >> In fact, I believe this is about io_method. I initially didn't see the
> >> difference you described, and then I realized I set io_method=sync to
> >> make it easier to track the block access. And if I change io_method to
> >> worker, I get different stats, that also change between runs.
> >>
> >> With "sync" I always get this (after a restart):
> >>
> >> Buffers: shared hit=7435 read=52801
> >>
> >> while with "worker" I get this:
> >>
> >> Buffers: shared hit=4879 read=52801
> >> Buffers: shared hit=5151 read=52801
> >> Buffers: shared hit=4978 read=52801
> >>
> >> So not only it changes run to tun, it also does not add up to 60236.
> >
> > This is reproducible on master? If so, how?
> >
> >
> >> I vaguely recall I ran into this some time ago during AIO benchmarking,
> >> and IIRC it's due to how StartReadBuffersImpl() may behave differently
> >> depending on I/O started earlier. It only calls PinBufferForBlock() in
> >> some cases, and PinBufferForBlock() is what updates the hits.
> >
> > Hm, I don't immediately see an issue there. The only case we don't call
> > PinBufferForBlock() is if we already have pinned the relevant buffer in a
> > prior call to StartReadBuffersImpl().
> >
> >
> > If this happens only with the prefetching patch applied, is is possible that
> > what happens here is that we occasionally re-request buffers that already in
> > the process of being read in? That would only happen with a read stream and
> > io_method != sync (since with sync we won't read ahead). If we have to start
> > reading in a buffer that's already undergoing IO we wait for the IO to
> > complete and count that access as a hit:
> >
> > /*
> > * Check if we can start IO on the first to-be-read buffer.
> > *
> > * If an I/O is already in progress in another backend, we want to wait
> > * for the outcome: either done, or something went wrong and we will
> > * retry.
> > */
> > if (!ReadBuffersCanStartIO(buffers[nblocks_done], false))
> > {
> > ...
> > /*
> > * Report and track this as a 'hit' for this backend, even though it
> > * must have started out as a miss in PinBufferForBlock(). The other
> > * backend will track this as a 'read'.
> > */
> > ...
> > if (persistence == RELPERSISTENCE_TEMP)
> > pgBufferUsage.local_blks_hit += 1;
> > else
> > pgBufferUsage.shared_blks_hit += 1;
> > ...
> >
> >
>
> I think it has to be this. It only happens with io_method != sync, and
> only with effective_io_concurrency > 1. At first I was wondering why I
> can't reproduce this for seqscan/bitmapscan, but then I realized those
> plans never visit the same block repeatedly - indexscans do that. It's
> also not surprising it's timing-sensitive, as it likely depends on how
> fast the worker happens to start/complete requests.
>
> What would be a good way to "prove" it really is this?

I'd just comment out those stats increments and then check if the stats are
stable afterwards.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2025-08-13 21:57:00 Re: index prefetching
Previous Message Peter Geoghegan 2025-08-13 21:36:24 Re: index prefetching