Re: index prefetching

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>
Subject: Re: index prefetching
Date: 2023-12-18 21:00:30
Message-ID: CA+Tgmoaf0EHzcfWRydntzqhfc1Nv96C-60fEBbdf1pYZmT0SLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 9, 2023 at 1:08 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> But there's a layering problem that I don't know how to solve - I don't
> see how we could make indexam.c entirely oblivious to the prefetching,
> and move it entirely to the executor. Because how else would you know
> what to prefetch?

Yeah, that seems impossible.

Some thoughts:

* I think perhaps the subject line of this thread is misleading. It
doesn't seem like there is any index prefetching going on here at all,
and there couldn't be, unless you extended the index AM API with new
methods. What you're actually doing is prefetching heap pages that
will be needed by a scan of the index. I think this confusing naming
has propagated itself into some parts of the patch, e.g.
index_prefetch() reads *from the heap* which is not at all clear from
the comment saying "Prefetch the TID, unless it's sequential or
recently prefetched." You're not prefetching the TID: you're
prefetching the heap tuple to which the TID points. That's not an
academic distinction IMHO -- the TID would be stored in the index, so
if we were prefetching the TID, we'd have to be reading index pages,
not heap pages.

* Regarding layering, my first thought was that the changes to
index_getnext_tid() and index_getnext_slot() are sensible: read ahead
by some number of TIDs, keep the TIDs you've fetched in an array
someplace, use that to drive prefetching of blocks on disk, and return
the previously-read TIDs from the queue without letting the caller
know that the queue exists. I think that's the obvious design for a
feature of this type, to the point where I don't really see that
there's a viable alternative design. Driving something down into the
individual index AMs would make sense if you wanted to prefetch *from
the indexes*, but it's unnecessary otherwise, and best avoided.

* But that said, the skip_all_visible flag passed down to
index_prefetch() looks like a VERY strong sign that the layering here
is not what it should be. Right now, when some code calls
index_getnext_tid(), that function does not need to know or care
whether the caller is going to fetch the heap tuple or not. But with
this patch, the code does need to care. So knowledge of the executor
concept of an index-only scan trickles down into indexam.c, which now
has to be able to make decisions that are consistent with the ones
that the executor will make. That doesn't seem good at all.

* I think it might make sense to have two different prefetching
schemes. Ideally they could share some structure. If a caller is using
index_getnext_slot(), then it's easy for prefetching to be fully
transparent. The caller can just ask for TIDs and the prefetching
distance and TID queue can be fully under the control of something
that is hidden from the caller. But when using index_getnext_tid(),
the caller needs to have an opportunity to evaluate each TID and
decide whether we even want the heap tuple. If yes, then we feed that
TID to the prefetcher; if no, we don't. That way, we're not
replicating executor logic in lower-level code. However, that also
means that the IOS logic needs to be aware that this TID queue exists
and interact with whatever controls the prefetch distance. Perhaps
after calling index_getnext_tid() you call
index_prefetcher_put_tid(prefetcher, tid, bool fetch_heap_tuple) and
then you call index_prefetcher_get_tid() to drain the queue. Perhaps
also the prefetcher has a "fill" callback that gets invoked when the
TID queue isn't as full as the prefetcher wants it to be. Then
index_getnext_slot() can just install a trivial fill callback that
says index_prefetecher_put_tid(prefetcher, index_getnext_tid(...),
true), but IOS can use a more sophisticated callback that checks the
VM to determine what to pass for the third argument.

* I realize that I'm being a little inconsistent in what I just said,
because in the first bullet point I said that this wasn't really index
prefetching, and now I'm proposing function names that still start
with index_prefetch. It's not entirely clear to me what the best thing
to do about the terminology is here -- could it be a heap prefetcher,
or a TID prefetcher, or an index scan prefetcher? I don't really know,
but whatever we can do to make the naming more clear seems like a
really good idea. Maybe there should be a clearer separation between
the queue of TIDs that we're going to return from the index and the
queue of blocks that we want to prefetch to get the corresponding heap
tuples -- making that separation crisper might ease some of the naming
issues.

* Not that I want to be critical because I think this is a great start
on an important project, but it does look like there's an awful lot of
stuff here that still needs to be sorted out before it would be
reasonable to think of committing this, both in terms of design
decisions and just general polish. There's a lot of stuff marked with
XXX and I think that's great because most of those seem to be good
questions but that does leave the, err, small problem of figuring out
the answers. index_prefetch_is_sequential() makes me really nervous
because it seems to depend an awful lot on whether the OS is doing
prefetching, and how the OS is doing prefetching, and I think those
might not be consistent across all systems and kernel versions.
Similarly with index_prefetch(). There's a lot of "magical"
assumptions here. Even index_prefetch_add_cache() has this problem --
the function assumes that it's OK if we sometimes fail to detect a
duplicate prefetch request, which makes sense, but under what
circumstances is it necessary to detect duplicates and in what cases
is it optional? The function comments are silent about that, which
makes it hard to assess whether the algorithm is good enough.

* In terms of polish, one thing I noticed is that index_getnext_slot()
calls index_prefetch_tids() even when scan->xs_heap_continue is set,
which seems like it must be a waste, since we can't really need to
kick off more prefetch requests halfway through a HOT chain referenced
by a single index tuple, can we? Also, blks_prefetch_rounds doesn't
seem to be used anywhere, and neither that nor blks_prefetches are
documented. In fact there's no new documentation at all, which seems
probably not right. That's partly because there are no new GUCs, which
I feel like typically for a feature like this would be the place where
the feature behavior would be mentioned in the documentation. I don't
think it's a good idea to tie the behavior of this feature to
effective_io_concurrency partly because it's usually a bad idea to
make one setting control multiple different things, but perhaps even
more because effective_io_concurrency doesn't actually work in a
useful way AFAICT and people typically have to set it to some very
artificially large value compared to how much real I/O parallelism
they have. So probably there should be new GUCs with hopefully-better
semantics, but at least the documentation for any existing ones would
need updating, I would think.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tristan Partin 2023-12-18 21:18:47 Re: Add --check option to pgindent
Previous Message Jeff Davis 2023-12-18 20:39:05 Re: encoding affects ICU regex character classification