Re: BitmapHeapScan streaming read user and prelim refactoring

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-13 13:34:15
Message-ID: 5f3b9d59-0f43-419d-80ca-6d04c07cf61a@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(Adding Dilip, the original author of the parallel bitmap heap scan
patch all those years ago, in case you remember anything about the
snapshot stuff below.)

On 27/02/2024 16:22, Melanie Plageman wrote:
> On Mon, Feb 26, 2024 at 08:50:28PM -0500, Melanie Plageman wrote:
>> On Fri, Feb 16, 2024 at 12:35:59PM -0500, Melanie Plageman wrote:
>>> In the attached v3, I've reordered the commits, updated some errant
>>> comments, and improved the commit messages.
>>>
>>> I've also made some updates to the TIDBitmap API that seem like a
>>> clarity improvement to the API in general. These also reduce the diff
>>> for GIN when separating the TBMIterateResult from the
>>> TBM[Shared]Iterator. And these TIDBitmap API changes are now all in
>>> their own commits (previously those were in the same commit as adding
>>> the BitmapHeapScan streaming read user).
>>>
>>> The three outstanding issues I see in the patch set are:
>>> 1) the lossy and exact page counters issue described in my previous
>>> email
>>
>> I've resolved this. I added a new patch to the set which starts counting
>> even pages with no visible tuples toward lossy and exact pages. After an
>> off-list conversation with Andres, it seems that this omission in master
>> may not have been intentional.
>>
>> Once we have only two types of pages to differentiate between (lossy and
>> exact [no longer have to care about "has no visible tuples"]), it is
>> easy enough to pass a "lossy" boolean paramater to
>> table_scan_bitmap_next_block(). I've done this in the attached v4.
>
> Thomas posted a new version of the Streaming Read API [1], so here is a
> rebased v5. This should make it easier to review as it can be applied on
> top of master.

Lots of discussion happening on the performance results but it seems
that there is no performance impact with the preliminary patches up to
v5-0013-Streaming-Read-API.patch. I'm focusing purely on those
preliminary patches now, because I think they're worthwhile cleanups
independent of the streaming read API.

Andres already commented on the snapshot stuff on an earlier patch
version, and that's much nicer with this version. However, I don't
understand why a parallel bitmap heap scan needs to do anything at all
with the snapshot, even before these patches. The parallel worker
infrastructure already passes the active snapshot from the leader to the
parallel worker. Why does bitmap heap scan code need to do that too?

I disabled that with:

> --- a/src/backend/executor/nodeBitmapHeapscan.c
> +++ b/src/backend/executor/nodeBitmapHeapscan.c
> @@ -874,7 +874,9 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
> pstate = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
> node->pstate = pstate;
>
> +#if 0
> node->worker_snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
> Assert(IsMVCCSnapshot(node->worker_snapshot));
> RegisterSnapshot(node->worker_snapshot);
> +#endif
> }

and ran "make check-world". All the tests passed. To be even more sure,
I added some code there to assert that the serialized version of
node->ss.ps.state->es_snapshot is equal to pstate->phs_snapshot_data,
and all the tests passed with that too.

I propose that we just remove the code in BitmapHeapScan to serialize
the snapshot, per attached patch.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
0001-Remove-redundant-snapshot-copying-from-parallel-lead.patch text/x-patch 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2024-03-13 13:40:38 Re: type cache cleanup improvements
Previous Message Peter Eisentraut 2024-03-13 13:33:52 Re: Using the %m printf format more