Re: index prefetching

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-08-13 14:44:33
Message-ID: c7a77pcyc5eynme376wvyojryijtlieyxsu3bvxp4eiy6au6uf@caniulyi4jr5
Lists: pgsql-hackers

Hi,

On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> In fact, I believe this is about io_method. I initially didn't see the
> difference you described, and then I realized I set io_method=sync to
> make it easier to track the block access. And if I change io_method to
> worker, I get different stats, that also change between runs.
>
> With "sync" I always get this (after a restart):
>
> Buffers: shared hit=7435 read=52801
>
> while with "worker" I get this:
>
> Buffers: shared hit=4879 read=52801
> Buffers: shared hit=5151 read=52801
> Buffers: shared hit=4978 read=52801
>
> So not only does it change from run to run, it also does not add up to 60236.

This is reproducible on master? If so, how?

> I vaguely recall I ran into this some time ago during AIO benchmarking,
> and IIRC it's due to how StartReadBuffersImpl() may behave differently
> depending on I/O started earlier. It only calls PinBufferForBlock() in
> some cases, and PinBufferForBlock() is what updates the hits.

Hm, I don't immediately see an issue there. The only case we don't call
PinBufferForBlock() is if we already have pinned the relevant buffer in a
prior call to StartReadBuffersImpl().

If this happens only with the prefetching patch applied, is it possible that
what happens here is that we occasionally re-request buffers that are already in
the process of being read in? That would only happen with a read stream and
io_method != sync (since with sync we won't read ahead). If we have to start
reading in a buffer that's already undergoing IO, we wait for the IO to
complete and count that access as a hit:

/*
 * Check if we can start IO on the first to-be-read buffer.
 *
 * If an I/O is already in progress in another backend, we want to wait
 * for the outcome: either done, or something went wrong and we will
 * retry.
 */
if (!ReadBuffersCanStartIO(buffers[nblocks_done], false))
{
    ...
    /*
     * Report and track this as a 'hit' for this backend, even though it
     * must have started out as a miss in PinBufferForBlock(). The other
     * backend will track this as a 'read'.
     */
    ...
    if (persistence == RELPERSISTENCE_TEMP)
        pgBufferUsage.local_blks_hit += 1;
    else
        pgBufferUsage.shared_blks_hit += 1;
    ...
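
To make that accounting rule concrete, here is a toy model of it. This is
not PostgreSQL code: the Toy* types and toy_request_block() are made up for
illustration, and the model assumes the in-flight IO always succeeds (the
real code retries on failure). It only shows which counter is bumped for a
single block request, depending on the state the buffer is found in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-backend counters in pgBufferUsage. */
typedef struct ToyBufferUsage
{
    long blks_hit;
    long blks_read;
} ToyBufferUsage;

/* Hypothetical buffer states, for this model only. */
typedef enum ToyBufferState
{
    TOY_NOT_CACHED,     /* no valid copy and no IO started yet */
    TOY_IO_IN_PROGRESS, /* a read was started earlier, not yet complete */
    TOY_VALID           /* contents are valid */
} ToyBufferState;

/*
 * Account for one request of a block:
 *  - valid buffer: counted as a hit
 *  - IO already in flight: wait for it, then count a hit
 *  - otherwise: start the IO ourselves and count a read
 */
static void
toy_request_block(ToyBufferState *state, ToyBufferUsage *usage)
{
    assert(state != NULL && usage != NULL);

    switch (*state)
    {
        case TOY_VALID:
            usage->blks_hit += 1;
            break;
        case TOY_IO_IN_PROGRESS:
            /* Model "wait for the IO to complete" as an instant transition. */
            *state = TOY_VALID;
            usage->blks_hit += 1;
            break;
        case TOY_NOT_CACHED:
            /* We are the one starting the IO, so this is our read. */
            *state = TOY_IO_IN_PROGRESS;
            usage->blks_read += 1;
            break;
    }
}
```

In this model, a first requester that finds the block in TOY_NOT_CACHED
ends up with blks_read = 1, and a second requester that arrives while the
state is still TOY_IO_IN_PROGRESS ends up with blks_hit = 1, even though
the block was not valid when it asked.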

Greetings,

Andres Freund
