Re: Parallel bitmap index scan

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap index scan
Date: 2020-08-17 14:11:57
Message-ID: CALj2ACUEHw32XzcGO77XwOmhit_Q4Xv+cLmjk2jTjmT2LtTdnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 26, 2020 at 6:43 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> I would like to propose a patch for enabling the parallelism for the
> bitmap index scan path.
>
> Background:
> Currently, we support only a parallel bitmap heap scan path. Therein,
> the underlying bitmap index scan is done by a single worker called the
> leader. The leader creates a bitmap in shared memory and once the
> bitmap is ready it creates a shared iterator and after that, all the
> workers process the shared iterator and scan the heap in parallel.
> While analyzing the TPCH plan we have observed that some of the
> queries are spending significant time in preparing the bitmap. So the
> idea of this patch is to use the parallel index scan for preparing the
> underlying bitmap in parallel.
>
> Design:
> If underlying index AM supports the parallel path (currently only
> BTREE support it), then we will create a parallel bitmap heap scan
> path on top of the parallel bitmap index scan path. So the idea of
> this patch is that each worker will do the parallel index scan and
> generate their part of the bitmap. And, we will create a barrier so
> that we can not start preparing the shared iterator until all the
> worker is ready with their bitmap. The first worker, which is ready
> with the bitmap will keep a copy of its TBM and the page table in the
> shared memory. And, all the subsequent workers will merge their TBM
> with the shared TBM. Once all the TBM are merged we will get one
> common shared TBM and after that stage, the worker can continue. The
> remaining part is the same, basically, again one worker will scan the
> shared TBM and prepare the shared iterator and once it is ready all
> the workers will jointly scan the heap in parallel using shared
> iterator.
>

Though I have not looked at the patch or code for the existing
parallel bitmap heap scan, one point keeps bugging in my mind. I may
be utterly wrong or my question may be so silly, anyways I would like
to ask here:

From the above design: each parallel worker creates partial bitmaps
for the index data that they looked at. Why should they merge these
bitmaps to a single bitmap in shared memory? Why can't each parallel
worker do a bitmap heap scan using the partial bitmaps they built
during it's bitmap index scan and emit qualified tuples/rows so that
the gather node can collect them? There may not be even lock
contention as bitmap heap scan takes read locks for the heap
pages/tuples.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-08-17 14:12:39 Re: Improve planner cost estimations for alternative subplans
Previous Message Bharath Rupireddy 2020-08-17 14:02:04 Re: jsonb, collection & postgres_fdw