Re: index prefetching

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>
Subject: Re: index prefetching
Date: 2023-12-21 15:20:45
Message-ID: afd17f3c-d496-5583-e93a-5a162d37235f@enterprisedb.com
Lists: pgsql-hackers

On 12/21/23 14:43, Andres Freund wrote:
> Hi,
>
> On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote:
>> You're right a lot of this is a guesswork. I don't think we can do much
>> better, because it depends on stuff that's out of our control - each OS
>> may do things differently, or perhaps it's just configured differently.
>>
>> But I don't think this is really a serious issue - all the read-ahead
>> implementations need to work about the same, because they are meant to
>> work in a transparent way.
>>
>> So it's about deciding at which point we think this is a sequential
>> pattern. Yes, the OS may use a slightly different threshold, but the
>> exact value does not really matter - in the worst case we prefetch a
>> couple more/fewer blocks.
>>
>> The OS read-ahead can't really prefetch anything except sequential
>> cases, so the whole question is "When does the access pattern get
>> sequential enough?". I don't think there's a perfect answer, and I don't
>> think we need a perfect one - we just need to be reasonably close.
>
> For the streaming read interface (initially backed by fadvise, to then be
> replaced by AIO) we found that it's clearly necessary to avoid fadvises in
> cases of actual sequential IO - the overhead otherwise leads to easily
> reproducible regressions. So I don't think we have much choice.
>

Yeah, the regressions are pretty easy to demonstrate. In fact, the first
version of the patch had no such detection, but after the first round of
benchmarks it became obvious it's needed.

>
>> Also, while I don't want to lazily dismiss valid cases that might be
>> affected by this, I think that sequential access for index paths is not
>> that common (with the exception of clustered indexes).
>
> I think sequential access is common in other cases as well. There's lots of
> indexes where heap tids are almost perfectly correlated with index entries,
> consider insert-only tables with serial PKs or inserted_at
> timestamp columns. Even leaving those aside, for indexes with many entries
> for the same key, we sort by tid these days, which will also result in
> "runs" of sequential access.
>

True. I should have thought about those cases.

>
>> Obviously, the latter case has much more severe impact, but it depends
>> on the exact workload / access pattern etc. The only "perfect" solution
>> would be to actually check the page cache, but well - that seems to be
>> fairly expensive.
>
>> What I was envisioning was something self-tuning, based on the I/O we
>> may do later. If the prefetcher decides to prefetch something, but finds
>> it's already in cache, we'd increase the distance, to remember more
>> blocks. Likewise, if a block is not prefetched but then requires I/O
>> later, decrease the distance. That'd make it adaptive, but I don't think
>> we actually have the info about I/O.
>
> How would the prefetcher know that the data wasn't in cache?
>

I don't think there's a good way to do that, unfortunately, or at least
I'm not aware of it. That's what I meant by "we don't have the info" at
the end. Which is why I haven't tried implementing it.

The only "solution" I could come up with was some sort of "timing" for
the I/O requests and deducing what was cached. Not great, of course.

>
>> Alternatively, I was thinking about moving the prefetches into a
>> separate worker process (or multiple workers), so we'd just queue the
>> request and all the overhead would be done by the worker. The main
>> problem is the overhead of calling posix_fadvise() for blocks that are
>> already in memory, and this would just move it to a separate backend. I
>> wonder if that might even make the custom cache unnecessary / optional.
>
> The AIO patchset provides this.
>

OK, I guess it's time for me to take a look at the patch again.

>
>> AFAICS this seems similar to some of the AIO patch, I wonder what that
>> plans to do. I need to check.
>
> Yes, most of this exists there. The difference that with the AIO you don't
> need to prefetch, as you can just initiate the IO for real, and wait for it to
> complete.
>

Right, although the line where things stop being "prefetch" and become
"async I/O" seems a bit blurry to me - perhaps it's more a matter of
perspective.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
