Re: Batching in executor

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Daniil Davydov <3danissimo(at)gmail(dot)com>
Cc: cca5507 <2624345507(at)qq(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Batching in executor
Date: 2026-01-27 03:00:13
Message-ID: CA+HiwqGufj46_4kT6z5nQrCOxp3rcMDR8fn_t33n2pDKokNRQg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Jan 26, 2026 at 6:34 PM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Mon, Dec 22, 2025 at 6:46 PM cca5507 <2624345507(at)qq(dot)com> wrote:
> >
> > Some comments for v4:
> >
>
> Agree with your (1)-(4) comments.
>
> > 5) heapgettup_pagemode_batch()
> > If the scan key filters out all tuples on a page, we may return 0 before reaching the end of scan, right?
> >
>
> Yes. I think that we should advance to the next page if "nout == 0"
> at the end of walking through the rs_vistuples.

Next version (v5) does it like that.

> > 6) heap_begin_batch()
> > ```
> > hb = palloc(sizeof(HeapBatch));
> > hb->tupdata = palloc(sizeof(HeapTupleData) * maxitems);
> > ```
> > Can we just use one palloc() for cache-friendly?
> >
>
> Actually, we are using memory context when calling the palloc function.
> I.e. in the general case it will not cause memory allocation. But of course
> there is no guarantee for it. I saw a lot of places in the code where we
> are calling the palloc function several times in a row, so I guess that
> this is OK.
>
> If you will decide to leave these palloc calls, I suggest using the
> palloc_object/palloc_array functions.

I think combining those individual pallocs into one is a good idea, so
v5 does it like that.

> A few other comments on 0001 patch:
>
> 1)
> + void *(*scan_begin_batch)(TableScanDesc sscan, int maxitems);
> Is it syntactically correct?

Yes, it compiles fine. Though I'm considering changing the return type
to a struct with common fields (like nitems) so callers can access
them directly without callback indirection. Maybe call it TAMBatch or
something.

> 2)
> /* Initialize static fields of HeapTupleData. Row bodies remain on page. */
> relid = RelationGetRelid(sscan->rs_rd);
> for (int i = 0; i < maxitems; i++)
> hb->tupdata[i].t_tableOid = relid;
>
> Is it really necessary? I see that we are setting this field inside the
> heapgettup_pagemode_batch function.

It's intentional -- by initializing t_tableOid once in
heap_begin_batch, we can avoid setting it repeatedly for every tuple
in heapgettup_pagemode_batch(). Though you are correct to point out
the redundant assignment in heapgettup_pagemode_batch(); I'll change
it to an Assert instead. The relid doesn't change during the scan.

> A few comment on 0002 patch:
>
> 1)
> I guess that you should rebase your patches on the current master, because
> the second patch doesn't apply.

Yep, will do.

> 2)
> Maybe we can use tuplestore for tuples stored in TupleBatch? It is just a
> proposal - I didn't check this idea carefully.

TupleBatch is designed to be lightweight -- it holds an array of
TupleTableSlot pointers, not the tuple data itself. The slots
reference tuples that remain in the AM's buffer (no copy). Using
tuplestore would require materializing tuples, adding overhead we're
trying to avoid.

--
Thanks, Amit Langote

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Fujii Masao 2026-01-27 02:58:01 Re: Is abort() still needed in WalSndShutdown()?