Re: BitmapHeapScan streaming read user and prelim refactoring

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-19 12:33:35
Message-ID: 5a172d1e-d69c-409a-b1fa-6521214c81c2@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18/03/2024 17:19, Melanie Plageman wrote:
> I've attached v7 rebased over this commit.

Thanks!

> v7-0001-BitmapHeapScan-begin-scan-after-bitmap-creation.patch

If we delayed table_beginscan_bm() call further, after starting the TBM
iterator, we could skip it altogether when the iterator is empty.

That's a further improvement, doesn't need to be part of this patch set.
Just caught my eye while reading this.

> v7-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch

I suggest to avoid the double negative with SO_CAN_SKIP_FETCH, and call
the flag e.g. SO_NEED_TUPLE.

As yet another preliminary patch before the streaming read API, it would
be nice to move the prefetching code to heapam.c too.

What's the point of having separate table_scan_bitmap_next_block() and
table_scan_bitmap_next_tuple() functions anymore? The AM owns the TBM
iterator now. The executor node updates the lossy/exact page counts, but
that's the only per-page thing it does now.

> /*
> * If this is the first scan of the underlying table, create the table
> * scan descriptor and begin the scan.
> */
> if (!scan)
> {
> uint32 extra_flags = 0;
>
> /*
> * We can potentially skip fetching heap pages if we do not need
> * any columns of the table, either for checking non-indexable
> * quals or for returning data. This test is a bit simplistic, as
> * it checks the stronger condition that there's no qual or return
> * tlist at all. But in most cases it's probably not worth working
> * harder than that.
> */
> if (node->ss.ps.plan->qual == NIL && node->ss.ps.plan->targetlist == NIL)
> extra_flags |= SO_CAN_SKIP_FETCH;
>
> scan = node->ss.ss_currentScanDesc = table_beginscan_bm(
> node->ss.ss_currentRelation,
> node->ss.ps.state->es_snapshot,
> 0,
> NULL,
> extra_flags);
> }
>
> scan->tbmiterator = tbmiterator;
> scan->shared_tbmiterator = shared_tbmiterator;

How about passing the iterator as an argument to table_beginscan_bm()?
You'd then need some other function to change the iterator on rescan,
though. Not sure what exactly to do here, but feels that this part of
the API is not fully thought-out. Needs comments at least, to explain
who sets tbmiterator / shared_tbmiterator and when. For comparison, for
a TID scan there's a separate scan_set_tidrange() table AM function.
Maybe follow that example and introduce scan_set_tbm_iterator().

It's bit awkward to have separate tbmiterator and shared_tbmiterator
fields. Could regular and shared iterators be merged, or wrapped under a
common interface?

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-03-19 12:41:35 Re: Built-in CTYPE provider
Previous Message Alexander Korotkov 2024-03-19 11:58:36 Re: [HACKERS] make async slave to wait for lsn to be replayed