Re: Parallel bitmap heap scan

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap heap scan
Date: 2017-02-11 06:41:30
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Wed, Feb 8, 2017 at 8:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

Thanks for the detailed review and your valuable feedback.

> I think it would be useful to break the remaining patch into two
> parts, one that enhances the tidbitmap.c API and another that uses
> that to implement Parallel Bitmap Heap Scan.

I have done that.

> There are several cosmetic problems here. You may have noticed that
> all function prototypes in PostgreSQL header files are explicitly
> declared extern; yours should be, too. Also, there is extra
> whitespace before some of the variable names here, like
> "ParallelTBMInfo * tbminfo" instead of "ParallelTBMInfo *tbminfo". If
> that is due to pgindent, the solution is to add the typedef names to
> your local typedef list. Also, tbm_restore_shared_members doesn't
> actually exist.


> 1. Add an additional argument to tbm_create(), dsa_area *dsa. If it's
> NULL, we allocate a backend-private memory for the hash entries as
> normal. If it's true, we use the dsa_area to store the hash entries,
> using the infrastructure added by your 0002 and revised in
> c3c4f6e1740be256178cd7551d5b8a7677159b74. (You can use a flag in the
> BitmapIndexScan and BitmapOr to decide whether to pass NULL or
> es_query_dsa.)

> 2. Add a new function dsa_pointer tbm_prepare_shared_iterate(TIDBitmap
> *tbm) which allocates shared iteration arrays using the DSA passed to
> tbm_create() and returns a dsa_pointer to the result. Arrange this so
> that if it's called more than once, each call returns a separate
> iterator, so that you can call it once to get the main iterator and a
> second time for the prefetch iterator, but have both of those point to
> the same underlying iteration arrays.
> 3. Add a new function TBMSharedIterator
> *tbm_attach_shared_iterate(dsa_area *dsa, dsa_pointer) which is called
> once per backend and gets passed the dsa_pointer from the previous
> step.
> 4. Add a new function TBMIterateResult
> *tbm_shared_iterate(TBMSharedIterator *) to fetch the next result.

I have tried to get these three API's as you explained with one
change. In tbm_attach_shared_iterate I need to pass TBM also because
each worker will create their own copy of TBM. Those workers need to
get the TBM-related information from the shared location. Even though
I try to access some of the information like chunk, npages from
directly shared location, but some other information like base pointer
to dsa element, RelptrPagetableEntry etc. should be local to each
worker (after conversion from dsa pointer to local pointer).

> As compared with your proposal, this involves only 3 functions instead
> of 7 (plus one new argument to another function), and I think it's a
> lot clearer what those functions are actually doing. You don't need
> tbm_estimate_parallel_tbminfo() any more because the data being passed
> from one backend to another is precisely a dsa_pointer, and the bitmap
> scan can just leave space for that in its own estimator function. You
> don't need tbm_update_shared_members() separately from
> tbm_begin_shared_iterate() separately from tbm_init_shared_iterator()
> because tbm_prepare_shared_iterate() can do all that stuff. You don't
> need tbm_set_parallel() because the additional argument to
> tbm_create() takes care of that need.


> The way you've got ParallelTBMInfo set up right now doesn't respect
> the separation of concerns between nodeBitmapHeapscan.c and
> tidbitmap.c properly. tidbitmap.c shouldn't know that the caller
> intends on creating two iterators, one of which is for prefetching.
> The design above hopefully addresses that: each call to
> tbm_prepare_shared_iterate returns a dsa_pointer to a separate chunk
> of shared iterator state, but tidbitmap.c does not need to know how
> many times that will be called.

> Apart from the above, this patch will need a rebase over today's
> commits,

and please make sure all functions you add have high-quality
> header comments.

I tried my best, please check the latest patch (0001).

Apart from those, there are some more changes.
1. I have moved the dsa_pointer and dsa_area declaration from dsa.h to
postgres .h, an independent patch is attached for the same. Because we
need to declare function headers with dsa_pointer in tidbitmap.h, but
tidbitmap.h also used in FRONTEND, therefore, this can not include
dsa.h (as it contains atomics.h).

2. I noticed there was one defect in my code related to handling the
TBM_ONE_PAGE, In initial version, there was no problem, but it got
introduced somewhere in intermediate versions.

I have fixed the same. There were two option to do that

1. Don’t switch to TBM_ONE_PAGE in case of parallel mode (ideally
this can not happen only worst estimation can get us there) and
directly got to TBM_HASH
2. In shared information keep space for sharing a PagetableEntry.

I have implemented 2nd (in the initial versions I implemented with 1st).

Note: Patch is also rebased on top of guc_parallel_index_scan_v1.patch
from Parallel Index Scan thread[1]


Dilip Kumar

Attachment Content-Type Size
move-dsa-to-postgres-h.patch application/octet-stream 3.2 KB
0001-tidbitmap-support-shared-v1.patch application/octet-stream 19.1 KB
0002-parallel-bitmap-heapscan-v1.patch application/octet-stream 36.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-02-11 07:43:12 Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Previous Message Pavel Stehule 2017-02-11 06:03:27 Re: possibility to specify template database for pg_regress