Re: index prefetching

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-08-13 22:23:49
Message-ID: 80de9927-539a-448f-a299-013edaede283@vondra.me
Lists: pgsql-hackers

On 8/13/25 23:37, Andres Freund wrote:
> Hi,
>
> On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote:
>> On 8/13/25 16:44, Andres Freund wrote:
>>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
>>>> In fact, I believe this is about io_method. I initially didn't see the
>>>> difference you described, and then I realized I set io_method=sync to
>>>> make it easier to track the block access. And if I change io_method to
>>>> worker, I get different stats, that also change between runs.
>>>>
>>>> With "sync" I always get this (after a restart):
>>>>
>>>> Buffers: shared hit=7435 read=52801
>>>>
>>>> while with "worker" I get this:
>>>>
>>>> Buffers: shared hit=4879 read=52801
>>>> Buffers: shared hit=5151 read=52801
>>>> Buffers: shared hit=4978 read=52801
>>>>
>>>> So not only does it change from run to run, it also does not add up to 60236.
>>>
>>> This is reproducible on master? If so, how?
>>>
>>>
>>>> I vaguely recall I ran into this some time ago during AIO benchmarking,
>>>> and IIRC it's due to how StartReadBuffersImpl() may behave differently
>>>> depending on I/O started earlier. It only calls PinBufferForBlock() in
>>>> some cases, and PinBufferForBlock() is what updates the hits.
>>>
>>> Hm, I don't immediately see an issue there. The only case we don't call
>>> PinBufferForBlock() is if we already have pinned the relevant buffer in a
>>> prior call to StartReadBuffersImpl().
>>>
>>>
>>> If this happens only with the prefetching patch applied, is it possible that
>>> what happens here is that we occasionally re-request buffers that are already in
>>> the process of being read in? That would only happen with a read stream and
>>> io_method != sync (since with sync we won't read ahead). If we have to start
>>> reading in a buffer that's already undergoing IO we wait for the IO to
>>> complete and count that access as a hit:
>>>
>>>     /*
>>>      * Check if we can start IO on the first to-be-read buffer.
>>>      *
>>>      * If an I/O is already in progress in another backend, we want to wait
>>>      * for the outcome: either done, or something went wrong and we will
>>>      * retry.
>>>      */
>>>     if (!ReadBuffersCanStartIO(buffers[nblocks_done], false))
>>>     {
>>>         ...
>>>         /*
>>>          * Report and track this as a 'hit' for this backend, even though it
>>>          * must have started out as a miss in PinBufferForBlock(). The other
>>>          * backend will track this as a 'read'.
>>>          */
>>>         ...
>>>         if (persistence == RELPERSISTENCE_TEMP)
>>>             pgBufferUsage.local_blks_hit += 1;
>>>         else
>>>             pgBufferUsage.shared_blks_hit += 1;
>>>         ...
>>>
>>>
>>
>> I think it has to be this. It only happens with io_method != sync, and
>> only with effective_io_concurrency > 1. At first I was wondering why I
>> can't reproduce this for seqscan/bitmapscan, but then I realized those
>> plans never visit the same block repeatedly - indexscans do that. It's
>> also not surprising it's timing-sensitive, as it likely depends on how
>> fast the worker happens to start/complete requests.
>>
>> What would be a good way to "prove" it really is this?
>
> I'd just comment out those stats increments and then check if the stats are
> stable afterwards.
>

I tried that, but it's not enough - the buffer hit count gets lower, but
remains variable. It stabilizes only if I also comment out the increment
in PinBufferForBlock(). At which point it gets to 0, of course ...
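
To put numbers on the shortfall, here is a minimal standalone sketch that
just redoes the arithmetic on the "Buffers:" counters quoted above, taking
the io_method=sync total (60236) as the baseline. This is not PostgreSQL
code; the helper names buffers_total() and buffers_shortfall() are made up
for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: total accesses reported by one "Buffers:" line. */
long
buffers_total(long hit, long read)
{
    assert(hit >= 0 && read >= 0);
    return hit + read;
}

/*
 * Hypothetical helper: how many accesses are missing relative to a
 * baseline total (here, the total seen with io_method=sync).
 */
long
buffers_shortfall(long baseline_total, long hit, long read)
{
    return baseline_total - buffers_total(hit, read);
}

/* Print one run's counters next to the baseline. */
void
report_run(const char *label, long baseline_total, long hit, long read)
{
    printf("%s: hit=%ld read=%ld total=%ld shortfall=%ld\n",
           label, hit, read,
           buffers_total(hit, read),
           buffers_shortfall(baseline_total, hit, read));
}
```

Calling report_run() with the three io_method=worker runs gives shortfalls
of 2556, 2284 and 2457 buffer accesses. The read count is identical in all
runs, so the whole difference is in the hit counter.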

regards

--
Tomas Vondra
