Re: index prefetching

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-07-16 21:41:06
Message-ID: srxpicevtse2tk2i6tqkldpx3qyf7utwqsmsupeyhqlpdmh2ng@whriijumyye3
Lists: pgsql-hackers

Hi,

On 2025-07-16 17:27:23 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 4:46 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Maybe I'm missing something, but the current interface doesn't seem to work
> > for AMs that don't have a 1:1 mapping between the block number portion of the
> > tid and the actual block number?
>
> I'm not completely sure what you mean here.
>
> Even within nbtree, posting list tuples work by setting the
> INDEX_ALT_TID_MASK index tuple header bit. That makes nbtree interpret
> IndexTupleData.t_tid as metadata (in this case describing a posting
> list). Obviously, that isn't "a standard IndexTuple", but that won't
> break either patch/approach.
>
> The index AM is obligated to pass back heap TIDs, without any external
> code needing to understand these sorts of implementation details. The
> on-disk representation of TIDs remains an implementation detail known
> only to index AMs.

I don't mean the index TIDs, but how the read stream is fed block numbers. In
the "complex" patch that's done by index_scan_stream_read_next(), and the
block number it returns is simply

    return ItemPointerGetBlockNumber(tid);

without the table AM having any way of influencing that. That means that if
your table AM does not use the block number portion of the TID 1:1 as the real
block number, the fetched block will be completely bogus.

It's similar in the simple patch: bt_stream_read_next() etc. also just use
ItemPointerGetBlockNumber().
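
To make the concern concrete, here is a minimal standalone sketch. It is not
code from either patch and does not use PostgreSQL headers: DemoTid,
DemoStreamState, demo_stream_read_next() and the tid_to_block hook are all
hypothetical names invented for illustration, and the "4 physical blocks per
logical block" layout is an assumed example of a table AM without a 1:1
mapping. With no hook set, the callback behaves like the patches do and feeds
the TID's block number straight to the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for BlockNumber / ItemPointerData (hypothetical). */
typedef uint32_t DemoBlockNumber;
#define DEMO_INVALID_BLOCK ((DemoBlockNumber) 0xFFFFFFFF)

typedef struct DemoTid
{
    DemoBlockNumber block;      /* block number portion of the TID */
    uint16_t        offset;     /* offset portion, unused by the stream */
} DemoTid;

/* Hypothetical table AM hook: which physical block does this TID live in? */
typedef DemoBlockNumber (*demo_tid_to_block_fn) (const DemoTid *tid);

/* Heap-like AM: the TID's block number is the physical block number. */
static DemoBlockNumber
demo_heap_tid_to_block(const DemoTid *tid)
{
    assert(tid != NULL);
    return tid->block;
}

/* Assumed non-1:1 AM: each logical block starts 4 physical blocks apart. */
static DemoBlockNumber
demo_grouped_tid_to_block(const DemoTid *tid)
{
    assert(tid != NULL);
    return tid->block * 4;
}

typedef struct DemoStreamState
{
    const DemoTid *tids;        /* TIDs handed back by the index AM */
    size_t      ntids;
    size_t      next;           /* index of the next TID to consume */
    demo_tid_to_block_fn tid_to_block;  /* NULL: use the TID block as-is */
} DemoStreamState;

/* Returns the next block to prefetch, or DEMO_INVALID_BLOCK when done. */
static DemoBlockNumber
demo_stream_read_next(DemoStreamState *st)
{
    const DemoTid *tid;

    if (st->next >= st->ntids)
        return DEMO_INVALID_BLOCK;

    tid = &st->tids[st->next++];

    /* Without a hook, the table AM has no say in the block that is read. */
    if (st->tid_to_block == NULL)
        return tid->block;

    return st->tid_to_block(tid);
}
```

For a TID with block number 3, the hook-less callback asks for block 3, while
the assumed grouped AM would need block 12. The difference between those two
answers is the bogus prefetch described above.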

Greetings,

Andres Freund
