Re: index prefetching

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: index prefetching
Date: 2024-01-16 08:13:43
Message-ID: 8969e2cf-6b23-4682-ab56-b3c0aca877ab@garret.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 12/01/2024 6:42 pm, Tomas Vondra wrote:
> Hi,
>
> Here's an improved version of this patch, finishing a lot of the stuff
> that I alluded to earlier - moving the code from indexam.c, renaming a
> bunch of stuff, etc. I've also squashed it into a single patch, to make
> it easier to review.

I am thinking about testing you patch with Neon (cloud Postgres). As far
as Neon seaprates compute and storage, prefetch is much more critical
for Neon
architecture than for vanilla Postgres.

I have few complaints:

1. It disables prefetch for sequential access pattern (i.e. INDEX
MERGE), motivating it that in this case OS read-ahead will be more
efficient than prefetch. It may be true for normal storage devices, bit
not for Neon storage and may be also for Postgres on top of DFS (i.e.
Amazon RDS). I wonder if we can delegate decision whether to perform
prefetch in this case or not to some other level. I do not know
precisely where is should be handled. The best candidate IMHO is
storager manager. But it most likely requires extension of SMGR API. Not
sure if you want to do it... Straightforward solution is to move this
logic to some callback, which can be overwritten by user.

2. It disables prefetch for direct_io. It seems to be even more obvious
than 1), because prefetching using `posix_fadvise` definitely not
possible in case of using direct_io. But in theory if SMGR provides some
alternative prefetch implementation (as in case of Neon), this also may
be not true. Still unclear why we can want to use direct_io in Neon...
But still I prefer to mo.ve this decision outside executor.

3. It doesn't perform prefetch of leave pages for IOS, only referenced
heap pages which are not marked as all-visible. It seems to me that if
optimized has chosen IOS (and not bitmap heap scan for example), then
there should be large enough fraction for all-visible pages. Also index
prefetch is most efficient for OLAp queries and them are used to be
performance for historical data which is all-visible. But IOS can be
really handled separately in some other PR. Frankly speaking combining
prefetch of leave B-Tree pages and referenced heap pages seems to be
very challenged task.

4. I think that performing prefetch at executor level is really great
idea and so prefetch can be used by all indexes, including custom
indexes. But prefetch will be efficient only if index can provide fast
access to next TID (located at the same page). I am not sure that it is
true for all builtin indexes (GIN, GIST, BRIN,...) and especially for
custom AM. I wonder if we should extend AM API to make index make a
decision weather to perform prefetch of TIDs or not.

5. Minor notice: there are few places where index_getnext_slot is called
with last NULL parameter (disabled prefetch) with the following comment
"XXX Would be nice to also benefit from prefetching here." But all this
places corresponds to "point loopkup", i.e. unique constraint check,
find replication tuple by index... Prefetch seems to be unlikely useful
here, unlkess there is index bloating and and we have to skip a lot of
tuples before locating right one. But should we try to optimize case of
bloated indexes?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-01-16 08:42:11 Re: Synchronizing slots from primary to standby
Previous Message Anthonin Bonnefoy 2024-01-16 08:13:17 Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx