Re: index prefetching

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>
Subject: Re: index prefetching
Date: 2024-01-09 20:31:39
Message-ID: CA+TgmoacmxNQBZsgMPSn9b7oE6QQLOqr_EP=YZTzdfHrqa5dqw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 4, 2024 at 9:55 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> Here's a somewhat reworked version of the patch. My initial goal was to
> see if it could adopt the StreamingRead API proposed in [1], but that
> turned out to be less straight-forward than I hoped, for two reasons:

I guess we need Thomas or Andres or maybe Melanie to comment on this.

> Perhaps a bigger change is that I decided to move this into a separate
> API on top of indexam.c. The original idea was to integrate this into
> index_getnext_tid/index_getnext_slot, so that all callers benefit from
> the prefetching automatically. Which would be nice, but it also meant
> it'd need to happen in the indexam.c code, which seemed dirty.

This patch is hard to review right now because there's a bunch of
comment updating that doesn't seem to have been done for the new
design. For instance:

+ * XXX This does not support prefetching of heap pages. When such prefetching is
+ * desirable, use index_getnext_tid().

But not any more.

+ * XXX The prefetching may interfere with the patch allowing us to evaluate
+ * conditions on the index tuple, in which case we may not need the heap
+ * tuple. Maybe if there's such filter, we should prefetch only pages that
+ * are not all-visible (and the same idea would also work for IOS), but
+ * it also makes the indexing a bit "aware" of the visibility stuff (which
+ * seems a somewhat wrong). Also, maybe we should consider the filter selectivity

I'm not sure whether all the problems in this area are solved, but I
think you've solved enough of them that this at least needs rewording,
if not removing.

+ * XXX Comment/check seems obsolete.

This occurs in two places. I'm not sure if it's accurate or not.

+ * XXX Could this be an issue for the prefetching? What if we prefetch something
+ * but the direction changes before we get to the read? If that could happen,
+ * maybe we should discard the prefetched data and go back? But can we even
+ * do that, if we already fetched some TIDs from the index? I don't think
+ * indexorderdir can't change, but es_direction maybe can?

But your email claims that "The patch simply disables prefetching for
such queries, using the same logic that we do for parallelism." FWIW,
I think that's a fine way to handle that case.

+ * XXX Maybe we should enable prefetching, but prefetch only pages that
+ * are not all-visible (but checking that from the index code seems like
+ * a violation of layering etc).

Isn't this fixed now? Note this comment occurs twice.

+ * XXX We need to disable this in some cases (e.g. when using index-only
+ * scans, we don't want to prefetch pages). Or maybe we should prefetch
+ * only pages that are not all-visible, that'd be even better.

Here again.

And now for some comments on other parts of the patch, mostly other
XXX comments:

+ * XXX This does not support prefetching of heap pages. When such prefetching is
+ * desirable, use index_getnext_tid().

There's probably no reason to write XXX here. The comment is fine.

+ * XXX Notice we haven't added the block to the block queue yet, and there
+ * is a preceding block (i.e. blockIndex-1 is valid).

Same here, possibly? If this XXX indicates a defect in the code, I
don't know what the defect is, so I guess it needs to be more clear.
If it is just explaining the code, then there's no reason for the
comment to say XXX.

+ * XXX Could it be harmful that we read the queue backwards? Maybe memory
+ * prefetching works better for the forward direction?

It does. But I don't know whether that matters here or not.

+ * XXX We do add the cache size to the request in order not to
+ * have issues with uint64 underflows.

I don't know what this means.

+ * XXX not sure this correctly handles xs_heap_continue - see index_getnext_slot,
+ * maybe nodeIndexscan needs to do something more to handle this? Although, that
+ * should be in the indexscan next_cb callback, probably.
+ *
+ * XXX If xs_heap_continue=true, we need to return the last TID.

You've got a bunch of comments about xs_heap_continue here -- and I
don't fully understand what the issues are here with respect to this
particular patch, but I think that the general purpose of
xs_heap_continue is to handle the case where we need to return more
than one tuple from the same HOT chain. With an MVCC snapshot that
doesn't happen, but with say SnapshotAny or SnapshotDirty, it could.
As far as possible, the prefetcher shouldn't be involved at all when
xs_heap_continue is set, I believe, because in that case we're just
returning a bunch of tuples from the same page, and the extra fetches
from that heap page shouldn't trigger or require any further
prefetching.

+ * XXX Should this also look at plan.plan_rows and maybe cap the target
+ * to that? Pointless to prefetch more than we expect to use. Or maybe
+ * just reset to that value during prefetching, after reading the next
+ * index page (or rather after rescan)?

It seems questionable to use plan_rows here because (1) I don't think
we have existing cases where we use the estimated row count in the
executor for anything, we just carry it through so EXPLAIN can print
it, and (2) row count estimates can be really far off, especially if
we're on the inner side of a nested loop; we might like to figure that
out eventually instead of just DTWT forever. But on the other hand
this does feel like an important case where we have a clue that
prefetching might need to be done less aggressively or not at all, and
it doesn't seem right to ignore that signal either. I wonder if we
want this shaped in some other way, like a Boolean that says
are-we-under-a-potentially-row-limiting-construct, e.g. a LIMIT or the
inner side of a semi-join or anti-join.

+ * We reach here if the index only scan is not parallel, or if we're
+ * serially executing an index only scan that was planned to be
+ * parallel.

Well, this seems sad.

+ * XXX This might lead to IOS being slower than plain index scan, if the
+ * table has a lot of pages that need recheck.

How?

+ /*
+ * XXX Only allow index prefetching when parallelModeOK=true. This is a bit
+ * of a misuse of the flag, but we need to disable prefetching for cursors
+ * (which might change direction), and parallelModeOK does that. But maybe
+ * we might (or should) have a separate flag.
+ */

I think the correct flag to be using here is execute_once, which
captures whether the executor could potentially be invoked a second
time for the same portal. Changes in the fetch direction are possible
if and only if !execute_once.

> Note 1: The IndexPrefetch name is a bit misleading, because it's used
> even with prefetching disabled - all index reads from the index scan
> happen through it. Maybe it should be called IndexReader or something
> like that.

My biggest gripe here is the capitalization. This version adds, inter
alia, IndexPrefetchAlloc, PREFETCH_QUEUE_INDEX, and
index_heap_prefetch_target, which seems like one or two too many
conventions. But maybe the PREFETCH_* macros don't even belong in a
public header.

I do like the index_heap_prefetch_* naming. Possibly that's too
verbose to use for everything, but calling this index-heap-prefetch
rather than index-prefetch seems clearer.

--
Robert Haas
EDB: http://www.enterprisedb.com
