| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2026-04-04 05:21:53 |
| Message-ID: | s25f6opuqwmz7oad467twvgxc36zmgzguhph43z4sbfkflsjnq@r6cuj7cl36lg |
| Lists: | pgsql-hackers |
Hi,
On 2026-04-03 01:17:14 -0400, Peter Geoghegan wrote:
> On Wed, Apr 1, 2026 at 6:50 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Attached is v20, which does things this way. Now it's completely clear
> > that the scan's batch ring buffer is fully controlled by the table AM.
>
> Attached is v21, which advances things in the direction established in
> v20. It also fixes bitrot -- we no longer carry certain read stream
> related patches from Andres that were committed since I posted v20 a
> couple of days ago.
There's already bitrot again :(
commit b7b27eb41a5
Author: Amit Langote <amitlan(at)postgresql(dot)org>
Date: 2026-04-03 14:33:53 +0900
Optimize fast-path FK checks with batched index probes
Added new calls that need to be updated.
> From 0d2fb87ab64539b64fce948d7829d78a0ceee8f8 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg(at)bowt(dot)ie>
> Date: Wed, 1 Apr 2026 17:44:54 -0400
> Subject: [PATCH v21 01/15] Rename heapam_index_fetch_tuple argument for
> clarity.
>
> Rename heapam_index_fetch_tuple's call_again argument to heap_continue,
> for consistency with the pointed-to variable name (IndexScanDescData's
> xs_heap_continue field).
>
> Preparation for an upcoming commit that will move index scan related
> heapam functions into their own file.
>
> Author: Peter Geoghegan <pg(at)bowt(dot)ie>
> Reviewed-By: Andres Freund <andres(at)anarazel(dot)de>
> Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh
> From c883bcff62612c538eae9d8d950321f5d9876d3c Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg(at)bowt(dot)ie>
> Date: Wed, 25 Mar 2026 00:22:17 -0400
> Subject: [PATCH v21 02/15] Move heapam_handler.c index scan code to new file.
>
> Move the heapam index fetch callbacks (index_fetch_begin,
> index_fetch_reset, index_fetch_end, and index_fetch_tuple) into a new
> dedicated file. Also move heap_hot_search_buffer over. This is a
> purely mechanical move with no functional impact.
>
> Upcoming work to add a slot-based table AM interface for index scans
> will substantially expand this code. Keeping it in heapam_handler.c
> would clutter a file whose primary role is to wire up the TableAmRoutine
> callbacks. Bitmap heap scans and sequential scans would benefit from
> similar separation in the future.
>
> Author: Peter Geoghegan <pg(at)bowt(dot)ie>
> Reviewed-By: Andres Freund <andres(at)anarazel(dot)de>
> Discussion: https://postgr.es/m/bmbrkiyjxoal6o5xadzv5bveoynrt3x37wqch7w3jnwumkq2yo@b4zmtnrfs4mh
Any reason to hold off pushing these?
> From b21d3af88aa5af6c038e860465d644a47425f131 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg(at)bowt(dot)ie>
> Date: Sun, 22 Mar 2026 02:36:57 -0400
> Subject: [PATCH v21 03/15] Add slot-based table AM index scan interface.
>
> Add table_index_getnext_slot, a new table AM callback that wraps both
> plain and index-only index scans that use amgettuple. Two new
> TableAmRoutine callbacks are introduced -- one for plain scans and one
> for index-only scans -- which an upcoming commit that adds the
> amgetbatch interface will expand to four. The appropriate callback is
> resolved once in index_beginscan, and called through a function pointer
> (xs_getnext_slot) on the IndexScanDesc when the table_index_getnext_slot
> shim function is called from executor nodes.
>
> This moves VM checks for index-only scans out of the executor and into
> heapam, enabling batching of visibility map lookups (though for now we
> continue to just perform retail lookups). Using the new higher level
> slot-based interface greatly simplifies nodeIndexonlyscan.c, which no
> longer has to deal with the visibility map directly. More importantly,
> this is a significant architectural improvement: table AMs can now
> implement index-only scans that are not tied to heapam's visibility map.
Nice.
> A small minority of callers (2 callers in total) fundamentally need to
> pass a TID to the table AM (both perform constraint enforcement). These
> callers don't actually perform index scans (even if their TIDs are taken
> from an index), and have no need for most of the index scan machinery.
> Switch these callers over to the new fetch_tid interface (which replaces
> the previous TID-based index_fetch_tuple interface). All index scan
> callers now use the new slot-based interface (table_index_getnext_slot).
Nice.
> Index-only scan callers pass table_index_getnext_slot a TupleTableSlot
> (which the table AM needs internally for heap fetches),
FWIW, that parenthetical does not convince me. Storing tuples in a slot is
more expensive than just keeping them internally in a non-generic data structure.
> @@ -181,6 +184,14 @@ typedef struct IndexScanDescData
> * further results */
> IndexFetchTableData *xs_heapfetch;
>
> + /*
> + * Resolved table_index_getnext_slot callback, which is set by
> + * table_index_fetch_begin at the start of amgettuple scans
> + */
> + bool (*xs_getnext_slot) (struct IndexScanDescData *scan,
> + ScanDirection direction,
> + struct TupleTableSlot *slot);
> +
> bool xs_recheck; /* T means scan keys must be rechecked */
>
> /*
At first I didn't much like that the tableam is now responsible for
determining this, but thinking about it more, it makes sense - it could very
well be that a tableam benefits from choosing an xs_getnext_slot
implementation depending on things only it itself knows.
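E.g. I'd guess heapam's ->index_fetch_begin() ends up doing something roughly
like this (untested sketch, callback names taken from later in the patch; the
condition could just as well be something only heapam knows about):

/* pick the specialized callback for this scan (sketch) */
scan->xs_getnext_slot = scan->xs_want_itup ?
    heapam_index_only_amgettuple_next :
    heapam_index_plain_amgettuple_next;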
> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 4647785fd..e05c09fdb 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -17,6 +17,7 @@
> #ifndef TABLEAM_H
> #define TABLEAM_H
>
> +#include "access/genam.h"
> #include "access/relscan.h"
> #include "access/sdir.h"
> #include "access/xact.h"
I'd strongly prefer forward declaring
typedef struct IndexScanDescData *IndexScanDesc;
over including genam.h into tableam.h.
genam.h has a decent footprint with its own includes; I don't think we should
unnecessarily expose that to everything that includes tableam.h.
> @@ -442,60 +443,66 @@ typedef struct TableAmRoutine
> */
>
> /*
> - * Prepare to fetch tuples from the relation, as needed when fetching
> - * tuples for an index scan. The callback has to return an
> - * IndexFetchTableData, which the AM will typically embed in a larger
> - * structure with additional information.
> + * Prepare to fetch tuples, as needed when fetching tuples for an index
> + * scan. The callback has to return an IndexFetchTableData, which the AM
> + * will typically embed in a larger structure with additional information.
> + * A pointer to this structure will be stored in passed index scan
> + * descriptor's xs_heapfetch field by the caller (core executor code).
> *
> * flags is a bitmask of ScanOptions affecting underlying table scan
> * behavior. See scan_begin() for more information on passing these.
> *
> - * Tuples for an index scan can then be fetched via index_fetch_tuple.
> + * Callback is responsible for setting IndexScanDesc.xs_getnext_slot to
> + * the appropriate slot-based callback. Tuples can then be fetched via
> + * table_index_getnext_slot(). No separate slot-based callback exists in
> + * this struct!
> + *
> + * In principle a single general-purpose callback (stored here) would
> + * suffice, but using specialized variants allows the table AM to provide
> + * minimal code based on conditions that are fixed for the whole scan as
> + * an optimization (e.g., variants for plain index scans and index-only
> + * scans, each with fewer branches).
> + *
> + * Note that AMs that do not necessarily update indexes when indexed
> + * columns do not change, need to return the current/correct version of
> + * the tuple that is visible to the snapshot, even if the tid points to an
> + * older version of the tuple.
> */
> - struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
> + struct IndexFetchTableData *(*index_fetch_begin) (IndexScanDesc scan,
> + uint32 flags);
I kinda wonder if this should look roughly like this instead:
/* size of the tableam's IndexFetchTableData */
size_t index_fetch_state_size;
void *(*index_fetch_begin) (Relation rel, IndexScanDesc scan, IndexFetchTableData *table_fetch, uint32 flags);
Moving the allocation of the IndexFetchTableData to the caller. The advantage
would be that this would allow us to later combine the allocation of the
IndexScanDesc with the IndexFetchTableData, without again needing to change
the tableam interface.
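I.e. on the caller side, very roughly (untested sketch, assuming the
index_fetch_state_size field proposed above; where heapRelation and flags
come from is hand-waved):

static IndexScanDesc
index_beginscan_alloc(Relation heapRelation, uint32 flags)
{
    Size        fetchsz = heapRelation->rd_tableam->index_fetch_state_size;
    IndexScanDesc scan;

    /* one palloc covers the scan descriptor and the AM's fetch state */
    scan = (IndexScanDesc) palloc0(MAXALIGN(sizeof(IndexScanDescData)) + fetchsz);
    scan->xs_heapfetch = (IndexFetchTableData *)
        ((char *) scan + MAXALIGN(sizeof(IndexScanDescData)));

    heapRelation->rd_tableam->index_fetch_begin(heapRelation, scan,
                                                scan->xs_heapfetch, flags);
    return scan;
}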
>
> /*
> * Prepare to fetch tuples from the relation, as needed when fetching tuples
> - * for an index scan.
> + * for an index scan. Caller puts this in the passed index scan descriptor's
> + * xs_heapfetch field.
s/, as needed when fetching tuples for/as part of/
The old formulation sounds a bit odd now that it's no longer something
standing on its own.
> /*
> - * Release resources and deallocate index fetch.
> + * Release resources and deallocate index fetch held in the scan's underlying
> + * IndexFetchTableData.
> */
The comment was weird before (mea culpa probably), but "index fetch held in
the scan's underlying IndexFetchTableData" seems even odder. A fetch held in
a fetch :)
I'd just say "Release resources and deallocate the IndexFetchTableData in the
scan.".
If you were to go for the suggestion earlier about making it the caller's
responsibility to allocate the IndexFetchTableData, we'd obviously not make it
the tableam's responsibility anymore.
> +/*
> + * Convenience wrapper around table_fetch_tid() for callers that don't want to
> + * provide their own slot.
s/that don't want to provide their own slot/that just need to check if a tuple is visible/
>
> +static bool heapam_index_plain_amgettuple_next(IndexScanDesc scan,
> + ScanDirection direction,
> + TupleTableSlot *slot);
> +static bool heapam_index_only_amgettuple_next(IndexScanDesc scan,
> + ScanDirection direction,
> + TupleTableSlot *slot);
> +static pg_attribute_always_inline bool heapam_index_fetch_tuple_impl(Relation rel,
> + IndexFetchHeapData *hscan,
> + ItemPointer tid,
> + Snapshot snapshot,
> + TupleTableSlot *slot,
> + bool *heap_continue,
> + bool *all_dead);
> +static pg_attribute_always_inline bool heapam_index_getnext_slot(IndexScanDesc scan,
> + ScanDirection direction,
> + TupleTableSlot *slot,
> + bool index_only);
> +static pg_attribute_always_inline bool heapam_index_fetch_heap(IndexScanDesc scan,
> + IndexFetchHeapData *hscan,
> + TupleTableSlot *slot,
> + bool *heap_continue);
> +
This indentation is quite spectacularly ugly. Not your fault. Still can't
stand it :)
I'd probably just reorder the function definitions so you don't need the
forward declaration for the pg_attribute_always_inline functions.
> +/*
> + * Simple, single-shot TID lookup for constraint enforcement code (unique
> + * checks and similar). This is essentially just a heap_hot_search_buffer
> + * wrapper.
> + *
> + * This isn't actually related to index scans, but keeping it near
> + * heap_hot_search_buffer may help the compiler generate better code.
> + */
> bool
> -heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
> - ItemPointer tid,
> - Snapshot snapshot,
> - TupleTableSlot *slot,
> - bool *heap_continue, bool *all_dead)
> +heapam_fetch_tid(Relation rel, ItemPointer tid, Snapshot snapshot,
> + TupleTableSlot *slot, bool *all_dead)
> +{
> + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> + Buffer buf;
> + bool found;
> +
> + Assert(TTS_IS_BUFFERTUPLE(slot));
> +
> + buf = ReadBuffer(rel, ItemPointerGetBlockNumber(tid));
> +
> + LockBuffer(buf, BUFFER_LOCK_SHARE);
> + found = heap_hot_search_buffer(tid, rel, buf, snapshot,
> + &bslot->base.tupdata, all_dead, true);
> + bslot->base.tupdata.t_self = *tid;
> + LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> +
> + if (found)
> + {
> + slot->tts_tableOid = RelationGetRelid(rel);
> + ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot,
> + buf);
> + }
> + else
> + ReleaseBuffer(buf);
> +
> + return found;
> +}
> +
> +/* table_index_getnext_slot callback: amgettuple, plain index scan */
> +static pg_attribute_hot bool
> +heapam_index_plain_amgettuple_next(IndexScanDesc scan,
> + ScanDirection direction,
> + TupleTableSlot *slot)
> +{
> + Assert(!scan->xs_want_itup);
> + Assert(scan->indexRelation->rd_indam->amgettuple != NULL);
> +
> + return heapam_index_getnext_slot(scan, direction, slot, false);
> +}
> +
> +/* table_index_getnext_slot callback: amgettuple, index-only scan */
> +static pg_attribute_hot bool
> +heapam_index_only_amgettuple_next(IndexScanDesc scan,
> + ScanDirection direction,
> + TupleTableSlot *slot)
> +{
> + Assert(scan->xs_want_itup);
> + Assert(scan->indexRelation->rd_indam->amgettuple != NULL);
> +
> + return heapam_index_getnext_slot(scan, direction, slot, true);
> +}
> +
> +static pg_attribute_always_inline bool
> +heapam_index_fetch_tuple_impl(Relation rel,
> + IndexFetchHeapData *hscan,
> + ItemPointer tid,
> + Snapshot snapshot,
> + TupleTableSlot *slot,
> + bool *heap_continue, bool *all_dead)
I'd probably put all of these just after heapam_index_fetch_end(), given they
are basically part of the implementation of heapam_index_*.
> diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
> index 44496ae09..0bc8bb535 100644
> --- a/src/backend/access/index/indexam.c
> +++ b/src/backend/access/index/indexam.c
> @@ -24,9 +24,7 @@
> * index_parallelscan_initialize - initialize parallel scan
> * index_parallelrescan - (re)start a parallel scan of an index
> * index_beginscan_parallel - join parallel index scan
> - * index_getnext_tid - get the next TID from a scan
> - * index_fetch_heap - get the scan's next heap tuple
> - * index_getnext_slot - get the next tuple from a scan
> + * index_getnext_tid - amgettuple table AM helper routine
> * index_getbitmap - get all tuples from a scan
> * index_bulk_delete - bulk deletion of index tuples
> * index_vacuum_cleanup - post-deletion cleanup of an index
> @@ -105,9 +103,16 @@ do { \
> CppAsString(pname), RelationGetRelationName(scan->indexRelation)); \
> } while(0)
>
> -static IndexScanDesc index_beginscan_internal(Relation indexRelation,
> - int nkeys, int norderbys, Snapshot snapshot,
> - ParallelIndexScanDesc pscan, bool temp_snap);
> +static pg_attribute_always_inline IndexScanDesc index_beginscan_internal(Relation indexRelation,
> + Relation heapRelation,
> + int nkeys,
> + int norderbys,
> + Snapshot snapshot,
> + ParallelIndexScanDesc pscan,
> + IndexScanInstrumentation *instrument,
> + bool index_only_scan,
> + bool temp_snap,
> + uint32 flags);
> static inline void validate_relation_as_index(Relation r);
Pretty sure it'd be enough to put the pg_attribute_always_inline in the
function definition. Or you could just move the definition of
index_beginscan_internal() to before index_beginscan(), so you don't need the
prototype anymore. Which imo is nicer, because you don't need to update as
many places when changing the signature.
> - /*
> - * Only MVCC snapshots are supported here, so there should be no
> - * need to keep following the HOT chain once a visible entry has
> - * been found. If we did want to allow that, we'd need to keep
> - * more state to remember not to call index_getnext_tid next time.
> - */
> - if (scandesc->xs_heap_continue)
> - elog(ERROR, "non-MVCC snapshots are not supported in index-only scans");
So that's now only an assertion, right? Probably ok...
> @@ -792,15 +702,14 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
>
> node->ioss_ScanDesc =
> index_beginscan_parallel(node->ss.ss_currentRelation,
> - node->ioss_RelationDesc,
> + node->ioss_RelationDesc, true,
> node->ioss_Instrument,
> node->ioss_NumScanKeys,
> node->ioss_NumOrderByKeys,
> piscan,
> ScanRelIsReadOnly(&node->ss) ?
> SO_HINT_REL_READ_ONLY : SO_NONE);
> - node->ioss_ScanDesc->xs_want_itup = true;
> - node->ioss_VMBuffer = InvalidBuffer;
> + Assert(node->ioss_ScanDesc->xs_want_itup);
>
> /*
> * If no run-time keys to calculate or they are ready, go ahead and pass
> @@ -860,14 +769,14 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
>
> node->ioss_ScanDesc =
> index_beginscan_parallel(node->ss.ss_currentRelation,
> - node->ioss_RelationDesc,
> + node->ioss_RelationDesc, true,
> node->ioss_Instrument,
> node->ioss_NumScanKeys,
> node->ioss_NumOrderByKeys,
> piscan,
> ScanRelIsReadOnly(&node->ss) ?
> SO_HINT_REL_READ_ONLY : SO_NONE);
> - node->ioss_ScanDesc->xs_want_itup = true;
> + Assert(node->ioss_ScanDesc->xs_want_itup);
>
> /*
> * If no run-time keys to calculate or they are ready, go ahead and pass
> diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
> index 1620d1460..55fa2d421 100644
> --- a/src/backend/executor/nodeIndexscan.c
> +++ b/src/backend/executor/nodeIndexscan.c
> @@ -109,7 +109,7 @@ IndexNext(IndexScanState *node)
> * serially executing an index scan that was planned to be parallel.
> */
> scandesc = index_beginscan(node->ss.ss_currentRelation,
> - node->iss_RelationDesc,
> + node->iss_RelationDesc, false,
> estate->es_snapshot,
> node->iss_Instrument,
> node->iss_NumScanKeys,
> @@ -132,7 +132,7 @@ IndexNext(IndexScanState *node)
> /*
> * ok, now that we have what we need, fetch the next tuple.
> */
> - while (index_getnext_slot(scandesc, direction, slot))
> + while (table_index_getnext_slot(scandesc, direction, slot))
> {
> CHECK_FOR_INTERRUPTS();
>
> @@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
> * serially executing an index scan that was planned to be parallel.
> */
> scandesc = index_beginscan(node->ss.ss_currentRelation,
> - node->iss_RelationDesc,
> + node->iss_RelationDesc, false,
> estate->es_snapshot,
> node->iss_Instrument,
> node->iss_NumScanKeys,
> @@ -266,7 +266,7 @@ IndexNextWithReorder(IndexScanState *node)
> * Fetch next tuple from the index.
> */
> next_indextuple:
> - if (!index_getnext_slot(scandesc, ForwardScanDirection, slot))
> + if (!table_index_getnext_slot(scandesc, ForwardScanDirection, slot))
> {
> /*
> * No more tuples from the index. But we still need to drain any
> @@ -818,6 +818,7 @@ ExecEndIndexScan(IndexScanState *node)
> * which will have a new IndexOnlyScanState and zeroed stats.
> */
> winstrument->nsearches += node->iss_Instrument->nsearches;
> + Assert(node->iss_Instrument->ntabletuplefetches == 0);
> }
>
> /*
> @@ -1730,7 +1731,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
>
> node->iss_ScanDesc =
> index_beginscan_parallel(node->ss.ss_currentRelation,
> - node->iss_RelationDesc,
> + node->iss_RelationDesc, false,
> node->iss_Instrument,
> node->iss_NumScanKeys,
> node->iss_NumOrderByKeys,
> @@ -1796,7 +1797,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
>
> node->iss_ScanDesc =
> index_beginscan_parallel(node->ss.ss_currentRelation,
> - node->iss_RelationDesc,
> + node->iss_RelationDesc, false,
> node->iss_Instrument,
> node->iss_NumScanKeys,
> node->iss_NumOrderByKeys,
Given that all the index_beginscan* callers used only one argument per line,
I'd also do that for the ios=false argument.
> From 491b65ca7ddf3829ae217aa5eac7137e76d9a6eb Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg(at)bowt(dot)ie>
> Date: Tue, 10 Mar 2026 14:40:35 -0400
> Subject: [PATCH v21 04/15] heapam: Track heap block in IndexFetchHeapData.
>
> Add an explicit BlockNumber field (xs_blk) to IndexFetchHeapData that
> tracks which heap block is currently pinned in xs_cbuf.
>
> heapam_index_fetch_tuple now uses xs_blk to determine when buffer
> switching is needed, replacing the previous approach that compared
> buffer identities via ReleaseAndReadBuffer on every non-HOT-chain call.
>
> This is preparatory work for an upcoming commit that will add index
> prefetching using a read stream. Delegating the release of a currently
> pinned buffer to ReleaseAndReadBuffer won't work anymore -- at least not
> when the next buffer that the scan needs to pin is one returned by
> read_stream_next_buffer (not a buffer returned by ReadBuffer).
>
> Author: Peter Geoghegan <pg(at)bowt(dot)ie>
> Reviewed-By: Andres Freund <andres(at)anarazel(dot)de>
> Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
LGTM.
> From 00244c1729f697ae72d77acfc2fcbc21d0c7236d Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg(at)bowt(dot)ie>
> Date: Thu, 26 Mar 2026 18:45:27 -0400
> Subject: [PATCH v21 05/15] heapam: Keep buffer pins across index rescans.
>
> Avoid dropping the heap page pin (xs_cbuf) and visibility map pin
> (xs_vmbuffer) during heapam_index_fetch_reset. Retaining these pins
> saves cycles during tight nested loop joins and merge joins that
> frequently restore a saved mark, since the next tuple fetched after a
> rescan often falls on the same heap page. It can also avoid repeated
> pinning and unpinning of the same buffer when rescans happen to revisit
> the same page.
>
> Note that not dropping xs_vmbuffer on a rescan isn't a new behavior
> (it's always worked this way). Recent commit XXX, which added a new
> slot-based interface, changed that behavior when it moved VM pin
> management out of the core executor. This commit restores that behavior
> (and has heapam treat heap page pins in the same way, which _is_ a new
> behavior).
>
> Preparation for an upcoming patch that will add the amgetbatch
> interface to enable optimizations such as I/O prefetching.
>
> Author: Peter Geoghegan <pg(at)bowt(dot)ie>
> Reviewed-By: Andres Freund <andres(at)anarazel(dot)de>
> Discussion: https://postgr.es/m/CAH2-Wz=g=JTSyDB4UtB5su2ZcvsS7VbP+ZMvvaG6ABoCb+s8Lw@mail.gmail.com
LGTM.
> diff --git a/src/backend/access/heap/heapam_indexscan.c b/src/backend/access/heap/heapam_indexscan.c
> index b7305f822..fd6f4fa9e 100644
> --- a/src/backend/access/heap/heapam_indexscan.c
> +++ b/src/backend/access/heap/heapam_indexscan.c
> @@ -73,18 +73,14 @@ heapam_index_fetch_reset(IndexScanDesc scan)
> {
> IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan->xs_heapfetch;
>
> - if (BufferIsValid(hscan->xs_cbuf))
> - {
> - ReleaseBuffer(hscan->xs_cbuf);
> - hscan->xs_cbuf = InvalidBuffer;
> - hscan->xs_blk = InvalidBlockNumber;
> - }
> + /* Resets are a no-op */
> + (void) hscan;
>
> - if (BufferIsValid(hscan->xs_vmbuffer))
> - {
> - ReleaseBuffer(hscan->xs_vmbuffer);
> - hscan->xs_vmbuffer = InvalidBuffer;
> - }
> + /*
> + * Deliberately avoid dropping pins now held in xs_cbuf and xs_vmbuffer.
> + * This saves cycles during certain tight nested loop joins (it can avoid
> + * repeated pinning and unpinning of the same buffer across rescans).
> + */
> }
>
> void
> @@ -92,7 +88,13 @@ heapam_index_fetch_end(IndexScanDesc scan)
> {
> IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan->xs_heapfetch;
>
> - heapam_index_fetch_reset(scan);
> + /* drop pin if there's a pinned heap page */
> + if (BufferIsValid(hscan->xs_cbuf))
> + ReleaseBuffer(hscan->xs_cbuf);
> +
> + /* drop pin if there's a pinned visibility map page */
> + if (BufferIsValid(hscan->xs_vmbuffer))
> + ReleaseBuffer(hscan->xs_vmbuffer);
>
> pfree(hscan);
> }
Personally I'd keep the clearing of the Buffer fields after the release, but I
do admit that's rank paranoia.
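I.e. in heapam_index_fetch_end(), mirroring what heapam_index_fetch_reset()
used to do:

/* drop pin if there's a pinned heap page */
if (BufferIsValid(hscan->xs_cbuf))
{
    ReleaseBuffer(hscan->xs_cbuf);
    hscan->xs_cbuf = InvalidBuffer;
    hscan->xs_blk = InvalidBlockNumber;
}

/* drop pin if there's a pinned visibility map page */
if (BufferIsValid(hscan->xs_vmbuffer))
{
    ReleaseBuffer(hscan->xs_vmbuffer);
    hscan->xs_vmbuffer = InvalidBuffer;
}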
> From aaffcbba0ec9a52c050ab6ae332e31e069541531 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg(at)bowt(dot)ie>
> Date: Wed, 25 Mar 2026 16:48:43 -0400
> Subject: [PATCH v21 06/15] Add interfaces that enable index prefetching.
>
> Add a new amgetbatch index AM interface that allows index access methods
> to implement plain index scans and index-only scans that return index
> entries in batches comprising all matching items from an index page,
> rather than one match at a time. Also switch nbtree over from
> amgettuple to the new amgetbatch interface.
>
> This expands the slot-based table AM interface added by commit FIXME
> from two callbacks (for amgettuple plain and index-only scans) to four,
> adding amgetbatch variants for both scan types. The amgetbatch
> interface is tightly coupled with this approach to index scans: the
> table AM can apply knowledge of which TIDs will be returned to the scan
> in the near future to perform I/O prefetching. Prefetching will be
> added by an upcoming commit.
I think this paragraph is a bit out of date.
> With amgetbatch, a scan-level policy determines whether each batch's
> index page buffer pin is dropped eagerly by the index AM (for plain
> scans with an MVCC snapshot, where the snapshot itself prevents TID
> recycling problems) or retained as an interlock against concurrent TID
> recycling by VACUUM. The interlock is retained for non-MVCC scans and
s/non-MVCC/plain non-MVCC/
> for index-only scans, and is dropped by the table AM via the new
> amunguardbatch callback when it is safe to do so. (Actually, index AMs
> are usually able to drop the pin at the same time that they release the
> lock. In practice, the amunguardbatch callback is only really needed
> during index-only scans, where dropping the pin interlock might need to
> be delayed ever so slightly, as explained below.)
>
> This extends the dropPin mechanism added to nbtree by commit 2ed5b87f,
> and generalizes it to work with all index AMs that support the new
> amgetbatch interface (LP_DEAD marking of index entries must be performed
It's a bit funny to say that it expands it to all index AMs that support the
new amgetbatch, given that that's just nbtree as of this commit :)
> @@ -129,9 +129,39 @@ typedef struct IndexFetchHeapData
> BlockNumber xs_blk;
>
> /* For visibility map checks (index-only scans and on-access pruning) */
> - Buffer xs_vmbuffer;
> + Buffer xs_vmbuffer; /* visibility map buffer */
> + int xs_vm_items; /* # items to resolve visibility info for */
> +
> } IndexFetchHeapData;
Probably unintentional trailing empty line in the struct.
> +/*
> + * Access the heap-private fixed-size data from the beginning of an allocated
> + * IndexScanBatch, using caller's IndexScanBatch pointer
> + */
> +static inline HeapBatchData *
> +heap_batch_data(IndexScanDesc scan, IndexScanBatch batch)
> +{
> + /* heapam's fixed-size space is at the start of the palloc'd area */
> + return (HeapBatchData *) index_scan_batch_base(scan, batch);
> +}
I don't think the table AM implementation should need to know about the
allocation base of a batch and similar implementation details.
And index_scan_batch_base()'s comment is now a bit too limited:
/*
* Return the true allocation base of a batch (used to pfree batches)
*/
static inline void *
index_scan_batch_base(IndexScanDescData *scan, IndexScanBatch batch)
{
return (char *) batch - scan->batch_table_offset;
}
heap_batch_data() and HeapBatchData also seem like overly generic names;
there are batches in other parts of heapam.
Hm. Does all of this actually need to live in heapam.h? Seems like it could
just be in heapam_indexscan.c.
> + if (batch)
> + {
> + /* We got the batch from the index AM */
> + Assert(batch->dir == direction);
> + Assert(batch->firstItem <= batch->lastItem);
> +
> + /* Append batch to the end of ring buffer/write it to buffer index */
> + index_scan_batch_append(scan, batch);
> +
> + /*
> + * Theoretically we should set knownEndForward/knownEndBackward to
> + * false (whichever is used when moving in the opposite direction)
> + * when this is the scan's first returned batch. We don't bother
> + * because the index AM should always record that fact in its own
> + * opaque area. (These fields only exist because we don't want index
> + * AMs setting _any_ field from any priorbatch that we pass to them.
> + * Besides, it would be cumbersome for index AMs to keep track of
> + * which batch is the current amgetbatch call's original priorbatch.)
> + */
Why aren't we setting them to false?
> + }
> + else
> + {
> + /* amgetbatch returned NULL */
> + if (priorBatch)
> + {
> + /*
> + * There are no further matches to be found in the current scan
> + * direction, following priorBatch. Remember that priorBatch is
> + * the last batch with matching items.
> + */
> + if (ScanDirectionIsForward(direction))
> + priorBatch->knownEndForward = true;
> + else
> + priorBatch->knownEndBackward = true;
> + }
> + }
> +
> + /* xs_hitup isn't currently supported by amgetbatch scans */
> + Assert(!scan->xs_hitup);
> +
> + return batch;
> +}
> +
> +/*
> + * utilities called by index AMs
> + */
I'd make these section comments a bit more prominent (like e.g. in tableam.h);
otherwise they're quite hard to distinguish from function comments.
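E.g. the dashed style used elsewhere (if memory serves, tableam.h does it this
way):

/* ----------------------------------------------------------------------------
 * Utilities called by index AMs
 * ----------------------------------------------------------------------------
 */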
> +/*
> + * Data about one batch of items returned by (and passed to) amgetbatch during
> + * index scans.
> + *
> + * Each batch allocation has the following memory layout:
> + *
> + * [table AM opaque area] <- fixed-size, -(batch_table_offset) from base
> + * [index AM opaque area] <- at -(batch_index_opaque_size) from base
> + * [IndexScanBatchData] <- base pointer, returned by amgetbatch
This conflicts a bit with how e.g. index_scan_batch_base() uses base pointer.
> +StaticAssertDecl(INDEX_SCAN_MAX_BATCHES <= PG_UINT8_MAX + 1,
> + "INDEX_SCAN_MAX_BATCHES must fit in uint8 ring buffer indexes");
I'd just use a < without the + 1 on the other side :)
> @@ -2555,8 +2555,11 @@ static const TableAmRoutine heapam_methods = {
> .parallelscan_reinitialize = table_block_parallelscan_reinitialize,
>
> .index_fetch_begin = heapam_index_fetch_begin,
> + .index_fetch_batch_init = heapam_index_fetch_batch_init,
> .index_fetch_reset = heapam_index_fetch_reset,
> .index_fetch_end = heapam_index_fetch_end,
> + .index_fetch_markpos = heapam_index_fetch_markpos,
> + .index_fetch_restrpos = heapam_index_fetch_restrpos,
Given that ->index_fetch_begin() is now responsible for configuring
IndexScanDesc->xs_getnext_slot, it's not clear to me that it really makes
sense to have stuff like ->index_fetch_batch_init() be set in TableAmRoutine.
Wonder if we should just have a TableIndexFetchRoutine struct embedded in the
IndexScanDesc that ->index_fetch_begin needs to fully initialize.
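Something very roughly like this (untested, all names invented):

typedef struct TableIndexFetchRoutine
{
    bool        (*getnext_slot) (struct IndexScanDescData *scan,
                                 ScanDirection direction,
                                 struct TupleTableSlot *slot);
    void        (*batch_init) (struct IndexScanDescData *scan);
    void        (*reset) (struct IndexScanDescData *scan);
    void        (*markpos) (struct IndexScanDescData *scan);
    void        (*restrpos) (struct IndexScanDescData *scan);
    void        (*end) (struct IndexScanDescData *scan);
} TableIndexFetchRoutine;

with a TableIndexFetchRoutine member embedded in IndexScanDescData that
->index_fetch_begin() has to fill in completely, instead of the
batch_init/markpos/restrpos callbacks living in TableAmRoutine.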
> @@ -38,11 +45,28 @@ static pg_attribute_always_inline bool heapam_index_fetch_tuple_impl(Relation re
> static pg_attribute_always_inline bool heapam_index_getnext_slot(IndexScanDesc scan,
> ScanDirection direction,
> TupleTableSlot *slot,
> - bool index_only);
> + bool index_only,
> + bool amgetbatch);
> static pg_attribute_always_inline bool heapam_index_fetch_heap(IndexScanDesc scan,
> IndexFetchHeapData *hscan,
> TupleTableSlot *slot,
> - bool *heap_continue);
> + bool *heap_continue,
> + bool amgetbatch);
> +static pg_attribute_always_inline ItemPointer heapam_index_getnext_scanbatch_pos(IndexScanDesc scan,
> + IndexFetchHeapData *hscan,
> + ScanDirection direction,
> + bool *all_visible);
> +static inline ItemPointer heapam_index_return_scanpos_tid(IndexScanDesc scan,
> + IndexFetchHeapData *hscan,
> + ScanDirection direction,
> + IndexScanBatch scanBatch,
> + BatchRingItemPos *scanPos,
> + bool *all_visible);
> +static void heapam_index_batch_pos_visibility(IndexScanDesc scan,
> + ScanDirection direction,
> + IndexScanBatch batch,
> + HeapBatchData *hbatch,
> + BatchRingItemPos *pos);
See earlier comments about ugliness of the super deep indentation due to
pg_attribute_always_inline and that getting rid of several of the prototypes
might be the best answer.
> +/*
> + * Restore batch ring buffer's markPos into its scanPos
> + */
> +void
> +heapam_index_fetch_restrpos(IndexScanDesc scan)
> +{
> + IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan->xs_heapfetch;
> +
> + Assert(scan->usebatchring);
> +
> + (void) hscan;
> +
> + tableam_util_batchscan_restore_pos(scan);
> +}
What's the point of fetching hscan and then using a (void) hscan to avoid a
warning?
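I.e. this could just be:

void
heapam_index_fetch_restrpos(IndexScanDesc scan)
{
    Assert(scan->usebatchring);

    tableam_util_batchscan_restore_pos(scan);
}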
> +/*
> + * Get next TID from batch ring buffer, moving in the given scan direction.
> + * Also sets *all_visible for item when caller passes a non-NULL arg.
> + */
> +static pg_attribute_always_inline ItemPointer
> +heapam_index_getnext_scanbatch_pos(IndexScanDesc scan, IndexFetchHeapData *hscan,
> + ScanDirection direction, bool *all_visible)
> +{
> + BatchRingBuffer *batchringbuf = &scan->batchringbuf;
> + BatchRingItemPos *scanPos = &batchringbuf->scanPos;
> + IndexScanBatch scanBatch = NULL;
> + bool hadExistingScanBatch;
> +
> + Assert(!scanPos->valid || batchringbuf->headBatch == scanPos->batch);
> + Assert(scanPos->valid || index_scan_batch_count(scan) == 0);
> + Assert(all_visible == NULL || scan->xs_want_itup);
> +
> + /*
> + * Check if there's an existing loaded scanBatch for us to return the next
> + * matching item's TID/index tuple from
> + */
> + hadExistingScanBatch = scanPos->valid;
> + if (scanPos->valid)
> + {
> + /*
> + * scanPos is valid, so scanBatch must already be loaded in batch ring
> + * buffer. We rely on that here.
> + */
> + pg_assume(batchringbuf->headBatch == scanPos->batch);
> +
> + scanBatch = index_scan_batch(scan, scanPos->batch);
> +
> + if (index_scan_pos_advance(direction, scanBatch, scanPos))
> + return heapam_index_return_scanpos_tid(scan, hscan, direction,
> + scanBatch, scanPos,
> + all_visible);
> + }
> +
> + /*
> + * Either ran out of items from our existing scanBatch, or it hasn't been
> + * loaded yet (because this is the first call here for the entire scan).
> + * Try to advance scanBatch to the next batch (or get the first batch).
> + */
> + scanBatch = tableam_util_fetch_next_batch(scan, direction,
> + scanBatch, scanPos);
> +
> + if (!scanBatch)
> + {
> + /*
> + * We're done; no more batches in the current scan direction.
> + *
> + * Note: scanPos is generally still valid at this point. The scan
> + * might still back up in the other direction.
> + */
> + return NULL;
> + }
> +
> + /*
> + * Advanced scanBatch. Now position scanPos to the start of new
> + * scanBatch.
> + */
> + index_scan_pos_nextbatch(direction, scanBatch, scanPos);
> + Assert(index_scan_batch(scan, scanPos->batch) == scanBatch);
> +
> + /*
> + * Remove the head batch from the batch ring buffer (except when this new
> + * scanBatch is our only one)
> + */
> + if (hadExistingScanBatch)
> + {
> + IndexScanBatch headBatch = index_scan_batch(scan,
> + batchringbuf->headBatch);
> +
> + Assert(headBatch != scanBatch);
> + Assert(batchringbuf->headBatch != scanPos->batch);
> +
> + /* free obsolescent head batch (unless it is scan's markBatch) */
> + tableam_util_free_batch(scan, headBatch);
> +
> + /* Remove the batch from the ring buffer (even if it's markBatch) */
> + batchringbuf->headBatch++;
> + }
> +
> + /* In practice scanBatch will always be the ring buffer's headBatch */
> + Assert(batchringbuf->headBatch == scanPos->batch);
> +
> + return heapam_index_return_scanpos_tid(scan, hscan, direction,
> + scanBatch, scanPos, all_visible);
> +}
Still looks like most of this ought to be in a tableam_util_...
> +/*
> + * Save the current scanPos/scanBatch item's TID in scan's xs_heaptid, and
> + * return a pointer to that TID. When all_visible isn't NULL (during an
> + * index-only scan), also sets item's visibility status in *all_visible.
> + *
> + * heapam_index_getnext_scanbatch_pos helper function.
> + */
> +static inline ItemPointer
> +heapam_index_return_scanpos_tid(IndexScanDesc scan, IndexFetchHeapData *hscan,
> + ScanDirection direction,
> + IndexScanBatch scanBatch,
> + BatchRingItemPos *scanPos,
> + bool *all_visible)
> +{
> + HeapBatchData *hbatch;
> +
> + pgstat_count_index_tuples(scan->indexRelation, 1);
The asymmetry of where pgstat_count_index_tuples() is called (i.e. here for
amgetbatch(), in index_getnext_tid() for amgettuple) still seems weird to me.
> + /*
> + * Set visibility info for a range of items, in scan order.
> + *
> + * Note: visibilitymap_get_status does not lock the visibility map buffer,
> + * so the result could be slightly stale. See the "Memory ordering
> + * effects" discussion above visibilitymap_get_status for an explanation
> + * of why this is okay.
> + */
> + if (ScanDirectionIsForward(direction))
> + {
> + int lastSetItem = Min(batch->lastItem,
> + posItem + hscan->xs_vm_items - 1);
> +
> + for (int setItem = posItem; setItem <= lastSetItem; setItem++)
> + {
> + ItemPointer tid = &batch->items[setItem].tableTid;
> + BlockNumber heapblkno = ItemPointerGetBlockNumber(tid);
> + uint8 flags;
> +
> + if (heapblkno == curvmheapblkno)
> + {
> + hbatch->visInfo[setItem] = curvmheapblkflags;
> + continue;
> + }
> +
> + flags = HEAP_BATCH_VIS_CHECKED;
> + if (VM_ALL_VISIBLE(scan->heapRelation, heapblkno, &hscan->xs_vmbuffer))
> + flags |= HEAP_BATCH_VIS_ALL_VISIBLE;
> +
> + hbatch->visInfo[setItem] = curvmheapblkflags = flags;
> + curvmheapblkno = heapblkno;
> + }
> +
> + allbatchitemsvisible = lastSetItem >= batch->lastItem &&
> + (posItem == batch->firstItem ||
> + (hbatch->visInfo[batch->firstItem] & HEAP_BATCH_VIS_CHECKED));
> + }
> + else
> + {
> + int lastSetItem = Max(batch->firstItem,
> + posItem - hscan->xs_vm_items + 1);
> +
> + for (int setItem = posItem; setItem >= lastSetItem; setItem--)
> + {
> + ItemPointer tid = &batch->items[setItem].tableTid;
> + BlockNumber heapblkno = ItemPointerGetBlockNumber(tid);
> + uint8 flags;
> +
> + if (heapblkno == curvmheapblkno)
> + {
> + hbatch->visInfo[setItem] = curvmheapblkflags;
> + continue;
> + }
> +
> + flags = HEAP_BATCH_VIS_CHECKED;
> + if (VM_ALL_VISIBLE(scan->heapRelation, heapblkno, &hscan->xs_vmbuffer))
> + flags |= HEAP_BATCH_VIS_ALL_VISIBLE;
> +
> + hbatch->visInfo[setItem] = curvmheapblkflags = flags;
> + curvmheapblkno = heapblkno;
> + }
> +
> + allbatchitemsvisible = lastSetItem <= batch->firstItem &&
> + (posItem == batch->lastItem ||
> + (hbatch->visInfo[batch->lastItem] & HEAP_BATCH_VIS_CHECKED));
> + }
Might be worth putting into a static inline.
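Untested sketch of what I mean; the two loops only differ in direction, so
something like this could cover both (curvmheapblkno/curvmheapblkflags kept
local here for simplicity, they might need to stay in the caller):

static inline void
heapam_index_batch_set_visinfo(IndexScanDesc scan, IndexFetchHeapData *hscan,
                               IndexScanBatch batch, HeapBatchData *hbatch,
                               int posItem, int lastSetItem, int step)
{
    BlockNumber curvmheapblkno = InvalidBlockNumber;
    uint8       curvmheapblkflags = 0;

    /* walk items from posItem to lastSetItem in scan order (step is +1/-1) */
    for (int setItem = posItem; setItem != lastSetItem + step; setItem += step)
    {
        ItemPointer tid = &batch->items[setItem].tableTid;
        BlockNumber heapblkno = ItemPointerGetBlockNumber(tid);
        uint8       flags;

        if (heapblkno == curvmheapblkno)
        {
            hbatch->visInfo[setItem] = curvmheapblkflags;
            continue;
        }

        flags = HEAP_BATCH_VIS_CHECKED;
        if (VM_ALL_VISIBLE(scan->heapRelation, heapblkno, &hscan->xs_vmbuffer))
            flags |= HEAP_BATCH_VIS_ALL_VISIBLE;

        hbatch->visInfo[setItem] = curvmheapblkflags = flags;
        curvmheapblkno = heapblkno;
    }
}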
> + /*
> + * It's safe to unguard the batch (via amunguardbatch) as soon as we've
> + * resolved the visibility status of all of its items (unless this is a
> + * non-MVCC scan)
> + */
> + if (allbatchitemsvisible)
> + {
> + Assert(hbatch->visInfo[batch->firstItem] & HEAP_BATCH_VIS_CHECKED);
> + Assert(hbatch->visInfo[batch->lastItem] & HEAP_BATCH_VIS_CHECKED);
> +
> + /*
> + * Note: nodeIndexonlyscan.c only supports MVCC snapshots, but we
> + * still cope with index-only scan callers with other snapshot types.
> + * This is certainly not unexpected; selfuncs.c performs index-only
> + * scans that use SnapshotNonVacuumable.
> + */
> + if (scan->MVCCScan)
> + tableam_util_unguard_batch(scan, batch);
> + }
> +
> + /*
> + * Else check visibility for twice as many items next time, or all items.
> + * We check all items in one go once we're passed the scan's first batch.
> + */
> + else if (hscan->xs_vm_items < (batch->lastItem - batch->firstItem))
> + hscan->xs_vm_items *= 2;
> + else
> + hscan->xs_vm_items = scan->maxitemsbatch;
> +}
Can't the *2 lead to computing an xs_vm_items bigger than scan->maxitemsbatch?
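If it can, maybe just clamp it, e.g.:

else if (hscan->xs_vm_items < (batch->lastItem - batch->firstItem))
    hscan->xs_vm_items = Min(hscan->xs_vm_items * 2, scan->maxitemsbatch);
else
    hscan->xs_vm_items = scan->maxitemsbatch;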
> +
> +/*
> + * Free resources at end of a batch index scan.
> + *
> + * Called by table AM when an index scan is ending, right before the owning
> + * scan descriptor goes away. Cleans up all batch related resources.
> + */
> +void
> +tableam_util_batchscan_end(IndexScanDesc scan)
> +{
> + /* Free all remaining loaded batches (even markBatch), bypassing cache */
> + tableam_util_batchscan_reset(scan, true);
> +
> + for (int i = 0; i < INDEX_SCAN_CACHE_BATCHES; i++)
> + {
> + IndexScanBatch cached = scan->batchcache[i];
> +
> + if (cached == NULL)
> + continue;
> +
> + if (cached->deadItems)
> + pfree(cached->deadItems);
> + pfree(index_scan_batch_base(scan, cached));
I'd set scan->batchcache[i] to NULL after freeing.
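I.e.:

if (cached->deadItems)
    pfree(cached->deadItems);
pfree(index_scan_batch_base(scan, cached));
scan->batchcache[i] = NULL;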
> +tableam_util_batchscan_mark_pos(IndexScanDesc scan)
> +{
> + BatchRingBuffer *batchringbuf = &scan->batchringbuf;
> + BatchRingItemPos *scanPos = &scan->batchringbuf.scanPos;
> + BatchRingItemPos *markPos = &batchringbuf->markPos;
> + IndexScanBatch scanBatch = index_scan_batch(scan, scanPos->batch);
> + IndexScanBatch markBatch = batchringbuf->markBatch;
> + bool freeMarkBatch;
> +
> + Assert(scan->MVCCScan);
> +
> + /*
> + * Free the previous mark batch (if any) -- but only if it isn't our
> + * scanBatch (defensively make sure that markBatch isn't some later
> + * still-needed batch, too)
> + */
> + if (!markBatch || markBatch == scanBatch)
> + {
> + /* Definitely no markBatch that we should free now */
> + freeMarkBatch = false;
> + }
> + else if (likely(!index_scan_batch_loaded(scan, markPos->batch)))
> + {
> + /* Definitely have a no-longer-loaded markBatch to free */
> + freeMarkBatch = true;
> + }
> + else
> + {
> + /*
> + * index_scan_batch_loaded indicates that markPos->batch is loaded,
> + * but after uint8 overflow a stale batch offset can alias a
Maybe s/overflow/wraparound/?
> + * currently-loaded range (false positive). Confirm by checking
> + * whether the batch pointer in markPos->batch's slot still matches.
> + */
> + freeMarkBatch = (index_scan_batch(scan, markPos->batch) != markBatch);
Perhaps this should be integrated into index_scan_batch_loaded()?
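E.g. by passing the expected batch pointer in (signature guessed):

static inline bool
index_scan_batch_loaded_and_is(IndexScanDesc scan, uint8 batchidx,
                               IndexScanBatch expected)
{
    /* aliasing after uint8 wraparound is resolved by the pointer check */
    return index_scan_batch_loaded(scan, batchidx) &&
           index_scan_batch(scan, batchidx) == expected;
}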
> +/*
> + * Restore mark to scanPos position
This sounds a bit like the mark is restored into the position of the scan pos,
which obviously makes no sense.
> +/*
> + * Free a batch, optionally caching it for reuse.
> + *
> + * tableam_util_free_batch implementation function. We split out the
> + * implementation like this because we don't want to give external table AM
> + * callers the option of passing allow_cache=false.
> + *
> + * When allow_cache is true, we try to store the batch in the scan's batch
> + * cache for later reuse. When allow_cache is false (typically because the
> + * scan is shutting down), we pfree the caller's batch unconditionally.
> + */
> +static void
> +batch_free(IndexScanDesc scan, IndexScanBatch batch, bool allow_cache)
> +{
> + Assert(!(scan->batchImmediateUnguard && batch->isGuarded));
> + Assert(batch->isGuarded || scan->MVCCScan);
> +
> + /* don't free caller's batch if it is scan's current markBatch */
> + if (batch == scan->batchringbuf.markBatch)
> + return;
> +
> + /* Drop TID recycling interlock via amunguardbatch as needed */
> + if (!scan->batchImmediateUnguard && batch->isGuarded)
> + tableam_util_unguard_batch(scan, batch);
> +
> + /*
> + * Let the index AM set LP_DEAD bits in the index page, if applicable.
> + *
> + * batch.deadItems[] is now in whatever order the scan returned items in.
> + * We might have even saved the same item/TID twice.
> + *
> + * Sort and unique-ify deadItems[]. That way the index AM can safely
> + * assume that items will always be in their original index page order.
> + */
> + if (batch->numDead > 0 &&
> + scan->indexRelation->rd_indam->amkillitemsbatch != NULL)
> + {
> + if (batch->numDead > 1)
> + {
> + qsort(batch->deadItems, batch->numDead, sizeof(int),
> + batch_compare_int);
> + batch->numDead = qunique(batch->deadItems, batch->numDead,
> + sizeof(int), batch_compare_int);
> + }
> +
> + scan->indexRelation->rd_indam->amkillitemsbatch(scan, batch);
> + }
Based on the name or function header one sure wouldn't think this will call
indexam callbacks...
> /* ----------------
> * index_restrpos - restore a scan position
> *
> - * NOTE: this only restores the internal scan state of the index AM. See
> + * NOTE: this only restores the batch positional state of the table AM. See
> * comments for ExecRestrPos().
> *
> * NOTE: For heap, in the presence of HOT chains, mark/restore only works
> * correctly if the scan's snapshot is MVCC-safe; that ensures that there's at
> * most one returnable tuple in each HOT chain, and so restoring the prior
> - * state at the granularity of the index AM is sufficient. Since the only
> - * current user of mark/restore functionality is nodeMergejoin.c, this
> - * effectively means that merge-join plans only work for MVCC snapshots. This
> - * could be fixed if necessary, but for now it seems unimportant.
> + * state at the scan item granularity is sufficient. Since the only current
> + * user of mark/restore functionality is nodeMergejoin.c, this effectively
> + * means that merge-join plans only work for MVCC snapshots. This could be
> + * fixed if necessary, but for now it seems unimportant.
> * ----------------
> */
> void
> @@ -459,16 +480,11 @@ index_restrpos(IndexScanDesc scan)
> Assert(IsMVCCLikeSnapshot(scan->xs_snapshot));
>
> SCAN_CHECKS;
> - CHECK_SCAN_PROCEDURE(amrestrpos);
> + CHECK_SCAN_PROCEDURE(amgetbatch);
>
> - /* reset table AM state for restoring the marked position */
> - if (scan->xs_heapfetch)
> - table_index_fetch_reset(scan);
> -
> - scan->kill_prior_tuple = false; /* for safety */
> scan->xs_heap_continue = false;
>
> - scan->indexRelation->rd_indam->amrestrpos(scan);
> + table_index_fetch_restrpos(scan);
> }
Did we lose that kill_prior_tuple? Or is there a reason it's not necessary
anymore? Too tired to think through this.
Greetings,
Andres Freund